Open Tizzzzy opened 2 months ago
Hi there. This issue is caused by a change in Git. Running git pull metaicl main --allow-unrelated-histories -X ours
instead will solve the problem. Please let us know if you have more questions. Thanks!
Hi,
Now I am trying to run this command:
python test.py --dataset glue-mrpc --gpt2 channel-metaicl --method channel --out_dir out/channel-metaicl --do_zeroshot --use_demonstrations --k 4 --seed 100,13,21,42,87
However, it gives me this error:
07/25/2024 11:09:42 - INFO - __main__ - out/channel-metaicl/glue-mrpc-test-channel-k=4-s=100.pkl
07/25/2024 11:09:42 - INFO - __main__ - Reusing checkpoint at checkpoints/channel-metaicl/hr_to_lr
[Already exists] Skipping checkpoints/channel-metaicl/hr_to_lr
If you want to download the file in another location, please specify a different path
07/25/2024 11:09:42 - INFO - __main__ - Loading the model from checkpoints/channel-metaicl/hr_to_lr
/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py:116: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(checkpoint)
07/25/2024 11:09:48 - INFO - __main__ - torch.Size([816, 1024])
Traceback (most recent call last):
File "test.py", line 274, in <module>
main(logger, args)
File "test.py", line 138, in main
result = run(logger, test_task, metaicl_data, metaicl_model,
File "test.py", line 202, in run
losses = metaicl_model.do_inference(metaicl_data, args.test_batch_size)
File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py", line 256, in do_inference
loss = self.run_model(input_ids, attention_mask, token_type_ids, labels=labels)
File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py", line 274, in run_model
outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 949, in forward
transformer_outputs = self.transformer(
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 793, in forward
outputs = block(
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 318, in forward
attn_outputs = self.attn(
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 259, in forward
attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 182, in _attn
attn_weights = attn_weights / (float(value.size(-1)) ** 0.5)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.00 GiB. GPU 0 has a total capacity of 44.55 GiB of which 4.92 GiB is free. Including non-PyTorch memory, this process has 39.62 GiB memory in use. Of the allocated memory 34.25 GiB is allocated by PyTorch, and 5.06 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Here is my GPU usage. It should have enough memory:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:01:00.0 Off |                    0 |
| 30%   28C    P8              18W / 300W |      1MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
I followed every step in the README, except that I installed PyTorch with this command:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
so I have torch version 2.4.0. I did this because installing torch with pip install torch==1.9.0 gave me this error:
07/24/2024 17:24:36 - INFO - __main__ - Loading the model from checkpoints/channel-metaicl/hr_to_lr
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████| 666/666 [00:00<00:00, 3.75MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████| 3.02G/3.02G [03:59<00:00, 13.5MB/s]
/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/cuda/__init__.py:106: UserWarning:
NVIDIA RTX A6000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A6000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
07/24/2024 17:28:45 - INFO - __main__ - torch.Size([816, 1024])
Traceback (most recent call last):
File "test.py", line 274, in <module>
main(logger, args)
File "test.py", line 138, in main
result = run(logger, test_task, metaicl_data, metaicl_model,
File "test.py", line 202, in run
losses = metaicl_model.do_inference(metaicl_data, args.test_batch_size)
File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py", line 256, in do_inference
loss = self.run_model(input_ids, attention_mask, token_type_ids, labels=labels)
File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/rethinking-demonstrations/metaicl/model.py", line 274, in run_model
outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 949, in forward
transformer_outputs = self.transformer(
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/metaicl2/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 694, in forward
position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
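The warning in this log already names the mismatch: the GPU reports capability sm_86, while the torch==1.9.0 build lists only sm_37, sm_50, sm_60, and sm_70. A minimal sketch of that check follows. The helper names build_supports_gpu and check_current_gpu are hypothetical, and the compatibility rule is simplified (an sm_XY binary is treated as usable on a device with the same major version and an equal or higher minor version, and a compute_XY PTX entry as usable on any equal or newer device).

```python
import re


def build_supports_gpu(capability, arch_list):
    """Return True if a PyTorch build's arch list covers the given GPU.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX A6000.
    arch_list:  strings such as "sm_70" or "compute_80".
    Simplified rule (an assumption of this sketch, not PyTorch's own logic).
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)", arch)
        if match is None:
            continue  # skip entries this sketch does not understand
        kind, digits = match.groups()
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        # Compiled binaries run within the same major version only.
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        # PTX can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


def check_current_gpu(device_index=0):
    """Run the check against the locally installed torch build and GPU."""
    import torch  # imported here so the helper above works without torch

    capability = torch.cuda.get_device_capability(device_index)
    return build_supports_gpu(capability, torch.cuda.get_arch_list())
```

With the values from the warning, build_supports_gpu((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"]) returns False, which is consistent with the "no kernel image is available" error. Calling check_current_gpu() in the environment shows whether a given install covers the card.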
Try reducing the batch size as recommended here.
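A rough sketch of why a smaller batch helps: the failing line in the traceback computes attn_weights, which has shape (batch, heads, seq_len, seq_len). The numbers below are assumptions rather than values confirmed in this thread: 20 attention heads (GPT-2 Large), 1024 tokens per sequence (from the logged torch.Size([816, 1024])), and 4-byte fp32 elements. Under those assumptions a batch of 64 needs exactly 5 GiB for a single such tensor, which matches the "Tried to allocate 5.00 GiB" message.

```python
GIB = 2**30


def attention_scores_bytes(batch_size, num_heads, seq_len, bytes_per_element=4):
    """Bytes needed for one (batch, heads, seq_len, seq_len) score tensor."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element


# Assumed values: 20 heads, 1024-token sequences, fp32.
for batch_size in (64, 32, 16):
    gib = attention_scores_bytes(batch_size, 20, 1024) / GIB
    print(f"batch_size={batch_size}: {gib:.2f} GiB per attention-score tensor")
```

The cost scales linearly with the batch size, so halving it halves this allocation. The traceback reads the value from args.test_batch_size, so the script presumably accepts a matching --test_batch_size option, though that flag name is inferred from the traceback rather than confirmed here.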
Hi,
I am trying to run the code on vast.ai. However, when I pull the metaicl main branch using git pull metaicl main --allow-unrelated-histories -X ours, I get the following error:
root@C.11695639:/rethinking-demonstrations$ git remote add metaicl https://github.com/facebookresearch/MetaICL.git
root@C.11695639:/rethinking-demonstrations$ git pull metaicl main --allow-unrelated-histories -X ours
remote: Enumerating objects: 480, done.
remote: Counting objects: 100% (233/233), done.
remote: Compressing objects: 100% (109/109), done.
remote: Total 480 (delta 219), reused 124 (delta 124), pack-reused 247
Receiving objects: 100% (480/480), 485.62 KiB | 2.10 MiB/s, done.
Resolving deltas: 100% (308/308), done.
From https://github.com/facebookresearch/MetaICL
* branch main -> FETCH_HEAD
* [new branch] main -> metaicl/main
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint: git config pull.rebase false # merge (the default strategy)
hint: git config pull.rebase true # rebase
hint: git config pull.ff only # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.
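The hint in this output says the reconcile strategy can be given on the command line with --rebase, --no-rebase, or --ff-only. Since -X ours is a merge strategy option, --no-rebase (a plain merge) looks like the matching choice, though the maintainers have not confirmed that in this thread. A minimal sketch, with hypothetical helper names build_pull_command and run_pull, that assembles the command and runs it from Python:

```python
import subprocess


def build_pull_command(remote="metaicl", branch="main"):
    """Assemble the pull command with an explicit merge (non-rebase) strategy."""
    return [
        "git", "pull",
        "--no-rebase",                  # reconcile divergent branches by merging
        "--allow-unrelated-histories",  # the two repos share no common commit
        "-X", "ours",                   # on conflicts, keep the local version
        remote, branch,
    ]


def run_pull(repo_dir):
    """Run the pull inside repo_dir; raises CalledProcessError on failure."""
    subprocess.run(build_pull_command(), cwd=repo_dir, check=True)
```

The shell equivalent is git pull --no-rebase --allow-unrelated-histories -X ours metaicl main. Setting git config pull.rebase false once, as the hint suggests, should have the same effect for later pulls.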
Hi, I am at the Preparation stage of this repo. I cloned the repo, changed into its directory, and then ran
git remote add metaicl https://github.com/facebookresearch/MetaICL.git
Then, when I ran git pull metaicl main, I got this:
Can you please take a look?