Alrope123 / rethinking-demonstrations

Unable to pull metaicl main #11

Open Tizzzzy opened 2 months ago

Tizzzzy commented 2 months ago

Hi, I am at the Preparation stage of this repo. I cloned the repo, changed into its directory, and ran git remote add metaicl https://github.com/facebookresearch/MetaICL.git. When I then ran git pull metaicl main, I got this:

[conda] (base) [lh599@corfu:rethinking-demonstrations]$ git pull metaicl main
warning: no common commits
remote: Enumerating objects: 480, done.
remote: Counting objects: 100% (233/233), done.
remote: Compressing objects: 100% (109/109), done.
remote: Total 480 (delta 219), reused 124 (delta 124), pack-reused 247
Receiving objects: 100% (480/480), 485.62 KiB | 39.00 KiB/s, done.
Resolving deltas: 100% (308/308), done.
From https://github.com/facebookresearch/MetaICL
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> metaicl/main
fatal: refusing to merge unrelated histories

Can you please take a look?

Alrope123 commented 2 months ago

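Hi there. This is caused by a change in Git's default behavior: since version 2.9, git merge (and therefore git pull) refuses to merge histories that share no common commit. Running git pull metaicl main --allow-unrelated-histories -X ours instead should solve the problem. Please let us know if you have more questions. Thanks!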

Tizzzzy commented 2 months ago

Hi, now I am trying to run python test.py --dataset glue-mrpc --gpt2 channel-metaicl --method channel --out_dir out/channel-metaicl --do_zeroshot --use_demonstrations --k 4 --seed 100,13,21,42,87. However, it gives me this error:

07/25/2024 11:09:42 - INFO - __main__ - out/channel-metaicl/glue-mrpc-test-channel-k=4-s=100.pkl
07/25/2024 11:09:42 - INFO - __main__ - Reusing checkpoint at checkpoints/channel-metaicl/hr_to_lr
[Already exists] Skipping checkpoints/channel-metaicl/hr_to_lr
If you want to download the file in another location, please specify a different path
07/25/2024 11:09:42 - INFO - __main__ - Loading the model from checkpoints/channel-metaicl/hr_to_lr
/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py:116: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(checkpoint)
07/25/2024 11:09:48 - INFO - __main__ - torch.Size([816, 1024])
Traceback (most recent call last):
  File "test.py", line 274, in <module>
    main(logger, args)
  File "test.py", line 138, in main
    result = run(logger, test_task, metaicl_data, metaicl_model,
  File "test.py", line 202, in run
    losses = metaicl_model.do_inference(metaicl_data, args.test_batch_size)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py", line 256, in do_inference
    loss = self.run_model(input_ids, attention_mask, token_type_ids, labels=labels)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py", line 274, in run_model
    outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 949, in forward
    transformer_outputs = self.transformer(
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 793, in forward
    outputs = block(
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 318, in forward
    attn_outputs = self.attn(
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 259, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 182, in _attn
    attn_weights = attn_weights / (float(value.size(-1)) ** 0.5)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.00 GiB. GPU 0 has a total capacity of 44.55 GiB of which 4.92 GiB is free. Including non-PyTorch memory, this process has 39.62 GiB memory in use. Of the allocated memory 34.25 GiB is allocated by PyTorch, and 5.06 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Here is my GPU usage; the card should have enough memory:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:01:00.0 Off |                    0 |
| 30%   28C    P8              18W / 300W |      1MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

I followed every step in the README, except that I installed PyTorch with conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia, which gave me torch 2.4.0. I did this because installing with pip install torch==1.9.0 produced this error:

07/24/2024 17:24:36 - INFO - __main__ - Loading the model from checkpoints/channel-metaicl/hr_to_lr
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████| 666/666 [00:00<00:00, 3.75MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████| 3.02G/3.02G [03:59<00:00, 13.5MB/s]
/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA RTX A6000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A6000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
07/24/2024 17:28:45 - INFO - __main__ - torch.Size([816, 1024])
Traceback (most recent call last):
  File "test.py", line 274, in <module>
    main(logger, args)
  File "test.py", line 138, in main
    result = run(logger, test_task, metaicl_data, metaicl_model,
  File "test.py", line 202, in run
    losses = metaicl_model.do_inference(metaicl_data, args.test_batch_size)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py", line 256, in do_inference
    loss = self.run_model(input_ids, attention_mask, token_type_ids, labels=labels)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py", line 274, in run_model
    outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 949, in forward
    transformer_outputs = self.transformer(
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 694, in forward
    position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
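
The warning in this log says the torch 1.9.0 pip wheel ships kernels only for sm_37, sm_50, sm_60, and sm_70, while the RTX A6000 reports sm_86. As a minimal sketch of the compatibility rule involved (a kernel compiled for sm_XY generally runs only on GPUs with the same major version X and a minor version of at least Y), the hypothetical helper below checks a device capability against an architecture list. It ignores PTX entries (compute_XY), which can be JIT-compiled for newer GPUs, so treat it as an approximation rather than PyTorch's own check.

```python
import re


def has_compatible_kernel(device_capability, arch_list):
    """Return True if any sm_XY entry can run on the given (major, minor) device.

    Hypothetical helper for illustration only; PTX entries such as
    "compute_70" and suffixed names such as "sm_90a" are skipped.
    """
    dev_major, dev_minor = device_capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)(\d)", arch)
        if match is None:
            continue  # not a plain sm_XY binary architecture
        major, minor = int(match.group(1)), int(match.group(2))
        # Same major version, and compiled for an equal or older minor version.
        if major == dev_major and minor <= dev_minor:
            return True
    return False


# Values copied from the warning in the log above.
a6000_capability = (8, 6)
torch_1_9_pip_archs = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(has_compatible_kernel(a6000_capability, torch_1_9_pip_archs))  # False

# On a real install, the same inputs are available from PyTorch via
# torch.cuda.get_device_capability(0) and torch.cuda.get_arch_list().
```

A build whose architecture list includes sm_80 or sm_86 would pass this check, which is consistent with the torch 2.4.0 run above getting as far as inference before failing for a different reason.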

Alrope123 commented 2 months ago

Try reducing the batch size as recommended here.
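
For a rough sense of why a smaller batch helps, the sketch below estimates the size of a single attention-score tensor, which is what the traceback was allocating inside _attn. The 1024-token sequence length comes from the torch.Size([816, 1024]) log line. The 20 attention heads (GPT-2 Large), the batch size of 64, and float32 scores are assumptions rather than values confirmed in this thread, although together they reproduce the 5.00 GiB figure in the error message.

```python
# Rough size of one attention-score tensor of shape (batch, heads, seq, seq).
# ASSUMPTIONS (not confirmed in this thread): 20 heads as in GPT-2 Large,
# float32 scores, and a batch size of 64.

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def attention_scores_gib(batch_size, num_heads=20, seq_len=1024,
                         bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in GiB of a (batch, heads, seq_len, seq_len) tensor."""
    elements = batch_size * num_heads * seq_len * seq_len
    return elements * bytes_per_element / GIB


# The estimate scales linearly with the batch size.
for batch_size in (64, 32, 16, 8):
    size = attention_scores_gib(batch_size)
    print(f"batch_size={batch_size}: {size:.3f} GiB per attention-score tensor")
```

Under these assumptions a batch of 64 needs 5 GiB for each such tensor, and several of them can be alive at once during a forward pass. The traceback reads args.test_batch_size, so the script presumably exposes a --test_batch_size option; appending a smaller value such as --test_batch_size 16 to the command above would be the natural thing to try, though the exact flag name is inferred here.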

Tizzzzy commented 2 months ago

Hi, I am now trying to run the code on vast.ai. However, when I pull metaicl main with git pull metaicl main --allow-unrelated-histories -X ours, I get the following error:

root@C.11695639:/rethinking-demonstrations$ git remote add metaicl https://github.com/facebookresearch/MetaICL.git
root@C.11695639:/rethinking-demonstrations$ git pull metaicl main --allow-unrelated-histories -X ours
remote: Enumerating objects: 480, done.
remote: Counting objects: 100% (233/233), done.
remote: Compressing objects: 100% (109/109), done.
remote: Total 480 (delta 219), reused 124 (delta 124), pack-reused 247
Receiving objects: 100% (480/480), 485.62 KiB | 2.10 MiB/s, done.
Resolving deltas: 100% (308/308), done.
From https://github.com/facebookresearch/MetaICL
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> metaicl/main
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint: 
hint:   git config pull.rebase false  # merge (the default strategy)
hint:   git config pull.rebase true   # rebase
hint:   git config pull.ff only       # fast-forward only
hint: 
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.
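
The hints in this output name the options Git accepts for choosing how to reconcile the branches. One possible way forward, not confirmed by the maintainers in this thread, is to request a merge explicitly by adding --no-rebase to the earlier command. The sketch below assembles and runs that command from Python; build_pull_command and pull_metaicl are hypothetical helper names, and the remote and branch names are the ones used above.

```python
import subprocess


def build_pull_command(remote="metaicl", branch="main"):
    """Assemble a git pull that merges, rather than rebases, unrelated histories."""
    return [
        "git", "pull", remote, branch,
        "--no-rebase",                  # merge explicitly, as the hint suggests
        "--allow-unrelated-histories",  # the two repos share no common commit
        "-X", "ours",                   # on conflicts, keep this repo's version
    ]


def pull_metaicl(repo_dir="."):
    """Run the pull inside repo_dir; raises CalledProcessError if git fails."""
    return subprocess.run(build_pull_command(), cwd=repo_dir, check=True)


# Show the command without running it.
print(" ".join(build_pull_command()))
```

Calling pull_metaicl() from the repository root would run the command. Setting git config pull.rebase false once, as the first hint line shows, should have the same effect as passing --no-rebase each time.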