OS: CentOS Linux 7
GPU: two RTX 3090s
Python version: 3.8.16
Library versions basically match requirements.txt; transformers and peft were installed from git.
bitsandbytes 0.37.1
torch 1.13.1+cu116
torchaudio 0.13.1+cu116
torchsummary 1.5.1
torchtext 0.14.1
torchvision 0.14.1+cu116
The llama-7b-hf weights were downloaded in advance.
CUDA version: 11.4
nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Aug_15_21:14:11_PDT_2021
Cuda compilation tools, release 11.4, V11.4.120
Build cuda_11.4.r11.4/compiler.30300941_0
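For reference, a minimal sanity check of this environment (run inside the same conda env; it only uses standard torch calls and assumes nothing about the repo):

```python
# Quick environment check: PyTorch build, the CUDA runtime it was built against,
# and the GPUs it can see. On this machine it should report 1.13.1+cu116 and
# compute capability (8, 6) for the 3090s.
import torch

print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
print("visible GPUs:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("compute capability:", torch.cuda.get_device_capability(0))
```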
The code has not been modified.
Output (no errors appear anywhere in it):
WARNING:torch.distributed.run: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: ................../.conda/envs/py38/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary ................../.conda/envs/py38/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
CUDA SETUP: CUDA runtime path found: ................../.conda/envs/py38/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary ................../.conda/envs/py38/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
decapoda-research/llama-7b-hf
decapoda-research/llama-7b-hf
Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|██████████| 33/33 [00:16<00:00, 2.32it/s]
Loading checkpoint shards: 100%|██████████| 33/33 [00:16<00:00, 2.02it/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (................../.cache/huggingface/datasets/json/default-c92b5414d77c27a9/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████| 1/1 [00:00<00:00, 80.94it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Found cached dataset json (................../.cache/huggingface/datasets/json/default-c92b5414d77c27a9/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████| 1/1 [00:00<00:00, 702.09it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Map: 0%| | 0/9 [00:00<?, ? examples/s]
Map: 0%| | 0/9 [00:00<?, ? examples/s]
Map: 0%| | 0/1 [00:00<?, ? examples/s]
Map: 0%| | 0/1 [00:00<?, ? examples/s]
If there's a warning about missing keys above, please disregard :)
If there's a warning about missing keys above, please disregard :)
................../.conda/envs/py38/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  warnings.warn(
................../.conda/envs/py38/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  warnings.warn(
100%|██████████| 3/3 [00:06<00:00, 1.91s/it]
{'train_runtime': 7.8285, 'train_samples_per_second': 49.052, 'train_steps_per_second': 0.383, 'train_loss': 0.2747050126393636, 'epoch': 3.0}
100%|██████████| 3/3 [00:06<00:00, 1.91s/it]
100%|██████████| 3/3 [00:06<00:00, 2.04s/it]
Any help would be greatly appreciated.
update: Single-GPU training runs normally with no errors at all. Multi-GPU training fails with libstdc++.so.6: version `GLIBCXX_3.4.29' not found; after adding `module load gcc/9.3.0` that error goes away, but the program terminates on its own shortly after starting. I have tried different combinations of the six GPUs on the cluster, always with the same result: no error message, the run just stops.
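The sketch below is how I plan to relaunch the multi-GPU run to get more information out of the silent exit; the entry point and arguments are placeholders (adapt them to the actual finetune command), but the debug environment variables are standard PyTorch/NCCL ones.

```python
# Hypothetical relaunch wrapper: turn on verbose NCCL / c10d logging so a silent
# multi-GPU exit at least leaves a trace. Script name and args are placeholders.
import os
import subprocess

env = dict(os.environ)
env.update({
    "NCCL_DEBUG": "INFO",                 # NCCL prints transport/setup problems instead of dying quietly
    "TORCH_DISTRIBUTED_DEBUG": "DETAIL",  # extra c10d diagnostics (torch >= 1.10)
    "CUDA_LAUNCH_BLOCKING": "1",          # report asynchronous CUDA errors at the failing call
})

subprocess.run(
    [
        "torchrun", "--nproc_per_node=2",
        "finetune.py",                                    # placeholder entry point
        "--base_model", "decapoda-research/llama-7b-hf",  # placeholder argument
    ],
    env=env,
    check=True,
)
```

It can also help to confirm which libstdc++ the loaded gcc module actually provides, e.g. `strings $(gcc -print-file-name=libstdc++.so.6) | grep GLIBCXX`, since the GLIBCXX_3.4.29 requirement usually comes from a wheel built with a newer toolchain.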