Hi authors, I do not have access to solid hardware. What I have for now is two 3090s (24GB each). I am planning to run/debug the code on this setup and then move the experiments to A100s. These two 3090s run CUDA 12.4. Torch 2.0.0 (pinned in the requirements) does not support CUDA 12.x, so I switched to torch 2.1.1, which supports CUDA 12.1. This is the only change I made to the requirements; otherwise the setup is as suggested.
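For context, this is the sanity check I ran to confirm the swapped torch build sees both GPUs (nothing beyond stock PyTorch is assumed here):

import torch

# Quick environment check for the torch 2.1.1 / cu121 build.
print(torch.__version__)           # expect 2.1.1
print(torch.version.cuda)          # expect '12.1'
print(torch.cuda.is_available())   # expect True
print(torch.cuda.device_count())   # expect 2 for the two 3090s

All of these come back as expected on my machine.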
When I run the training, it gets stuck at the following step and the process is then killed:
LlamaTokenizerFast(name_or_path='meta-llama/Llama-2-7b-hf', vocab_size=32000, model_max_length=32, is_fast=True, padding_side='right', truncation_side='left', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
/mnt/shared_home/vdeshpande/miniconda3/envs/env_spag/lib/python3.9/site-packages/accelerate/accelerator.py:457: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False)
warnings.warn(
Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Using /mnt/shared_home/vdeshpande/.cache/torch_extensions/py39_cu121 as PyTorch extensions root...
Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Using /mnt/shared_home/vdeshpande/.cache/torch_extensions/py39_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/shared_home/vdeshpande/.cache/torch_extensions/py39_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.3634016513824463 seconds
Time to load cpu_adam op: 3.0814285278320312 seconds
Parameter Offload: Total persistent parameters: 532480 in 130 params
[2024-09-04 15:34:07,215] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 50337 closing signal SIGTERM
[2024-09-04 15:34:22,297] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 50336) of binary: /mnt/shared_home/vdeshpande/miniconda3/envs/env_spag/bin/python
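Exit code -9 means rank 0 received SIGKILL, which on Linux usually points to the OS OOM killer rather than a true hang, and the crash lands right after the ZeRO-3 "Parameter Offload" line. As an experiment I am considering disabling the CPU optimizer offload to see whether cpu_adam's host-RAM usage is the culprit. A minimal sketch of that change, assuming the repo uses a standard DeepSpeed JSON config (the file path below is hypothetical, not the repo's actual file name):

import json

# Hypothetical path; the actual DeepSpeed config file in this repo may differ.
CONFIG = "configs/ds_config.json"

with open(CONFIG) as f:
    cfg = json.load(f)

# Disable CPU optimizer offload to test whether cpu_adam host-RAM use
# triggers the OOM kill ("none" is a valid DeepSpeed offload device).
cfg["zero_optimization"]["offload_optimizer"] = {"device": "none"}

with open(CONFIG, "w") as f:
    json.dump(cfg, f, indent=2)

I have not confirmed this is the cause; it is just the most likely reading of the log to me.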
Any insights on resolving this issue?