System Info
H800 80 GB, CUDA 11.8, Python 3.8.13
Who can help?
@zRzRzRzRzRzRzR @1049451037
Information
Reproduction
Launched with torchrun:

MODEL_Path="./cogvlm2-llama3-chat-19B/"
train_data="./cogvlm_train.json"
epochs=3
lr=8e-6
batch_size=1
output_dir="./output/"
deepspeed_config_file="./finetune_demo/ds_config.yaml"
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" torchrun --nnodes ${tmp_nodes} --nproc_per_node 8 \
    --master_addr ${tmp_master_addr} --node_rank ${tmp_node_rank} \
    --master_port ${tmp_master_port} ./finetune_demo/train.py \
    --lr ${lr} \
    --num_epochs ${epochs} \
    --batch_size ${batch_size} \
    --max_input_len 512 \
    --max_output_len 200 \
    --save_step 200 \
    --model_path ${MODEL_Path} \
    --dataset_path ${train_data} \
    --save_path ${output_dir} \
    --ds_config ${deepspeed_config_file}
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   bfloat16_immediate_grad_update False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   checkpoint_parallel_write_pipeline False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   checkpoint_tag_validation_enabled True
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   checkpoint_tag_validation_fail False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fe7bb3699a0>
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   communication_data_type ...... None
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   curriculum_enabled_legacy .... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   curriculum_params_legacy ..... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   data_efficiency_enabled ...... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   dataloader_drop_last ......... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   disable_allgather ............ False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   dump_state ................... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   dynamic_loss_scale_args ...... None
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_enabled ........... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_gas_boundary_resolution 1
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_layer_num ......... 0
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_max_iter .......... 100
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_stability ......... 1e-06
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_tol ............... 0.01
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_verbose ........... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   elasticity_enabled ........... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   fp16_auto_cast ............... None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   fp16_enabled ................. False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   fp16_master_weights_and_gradients False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   global_rank .................. 0
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   grad_accum_dtype ............. None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   gradient_accumulation_steps .. 1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   gradient_clipping ............ 0.1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   gradient_predivide_factor .... 1.0
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   graph_harvesting ............. False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   initial_dynamic_scale ........ 1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   load_universal_checkpoint .... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   loss_scale ................... 1.0
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   memory_breakdown ............. False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   mics_hierarchial_params_gather False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   mics_shard_size .............. -1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   optimizer_legacy_fusion ...... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   optimizer_name ............... None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   optimizer_params ............. None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   pld_enabled .................. False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   pld_params ................... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   prescale_gradients ........... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   scheduler_name ............... None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   scheduler_params ............. None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   seq_parallel_communication_data_type torch.float32
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   sparse_attention ............. None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   sparse_gradients_enabled ..... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   steps_per_print .............. inf
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   train_batch_size ............. 8
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   train_micro_batch_size_per_gpu 1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   use_data_before_expertparallel False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   use_node_local_storage ....... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   wall_clock_breakdown ......... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   weight_quantization_config ... None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   world_size ................... 8
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_allow_untested_optimizer True
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_config .................. stage=2 contiguous_gradients=False reduce_scatter=True reduce_bucket_size=40000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=100000000 overlap_comm=True load_from_fp32_weights=False elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_enabled ................. True
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_force_ds_cpu_optimizer .. True
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_optimization_stage ...... 2
[2024-05-30 08:23:52,123] [INFO] [config.py:986:print_user_config]   json = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "steps_per_print": inf,
    "gradient_clipping": 0.1,
    "zero_optimization": {
        "stage": 2,
        "contiguous_gradients": false,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 4.000000e+07,
        "allgather_bucket_size": 1.000000e+08,
        "load_from_fp32_weights": false,
        "round_robin_gradients": false
    },
    "offload_optimizer": {
        "device": "cpu",
        "pin_memory": true
    },
    "zero_allow_untested_optimizer": true,
    "bf16": {
        "enabled": true
    },
    "activation_checkpointing": {
        "partition_activations": false,
        "contiguous_memory_optimization": false,
        "cpu_checkpointing": false
    },
    "wall_clock_breakdown": false,
    "fp16": {
        "enabled": false
    }
}
INFO:__main__:Preparation done. Starting training...
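As a quick sanity check on the config above, the global batch size follows directly from the per-GPU micro-batch, the gradient-accumulation steps, and the world size (all three values are taken from the log):

```python
# Global batch size implied by the printed DeepSpeed config.
micro_batch_per_gpu = 1  # train_micro_batch_size_per_gpu
grad_accum_steps = 1     # gradient_accumulation_steps
world_size = 8           # world_size (one rank per GPU)

train_batch_size = micro_batch_per_gpu * grad_accum_steps * world_size
assert train_batch_size == 8  # matches "train_batch_size ............. 8" above
```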
  0%|          | 0/1120 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
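The warning itself describes the fix; a minimal sketch, assuming the variable is set before any DataLoader workers are forked (e.g. at the top of train.py):

```python
import os

# Disable tokenizer parallelism before any processes fork, as the
# huggingface/tokenizers warning suggests.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```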
  0%|          | 0/1120 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/afs/share/csc/CogVLM2-main/finetune_demo/train.py", line 483, in <module>
    main()
  File "/mnt/afs/share/csc/CogVLM2-main/finetune_demo/train.py", line 401, in main
    outputs = model(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1855, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/peft/peft_model.py", line 1129, in forward
    return self.base_model(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_cogvlm.py", line 620, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_cogvlm.py", line 402, in forward
    return self.llm_forward(
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_cogvlm.py", line 486, in llm_forward
    layer_outputs = decoder_layer(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_cogvlm.py", line 261, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_cogvlm.py", line 217, in forward
    context_layer = attention_fn(
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_cogvlm.py", line 140, in attention_fn
    attention_scores = nn.functional.softmax(attention_scores, dim=-1, dtype=torch.float32).to(query_layer.dtype)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 1860, in softmax
    ret = input.softmax(dim, dtype=dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.09 GiB. GPU 1 has a total capacity of 79.33 GiB of which 457.81 MiB is free. Process 701748 has 78.87 GiB memory in use. Of the allocated memory 75.91 GiB is allocated by PyTorch, and 1.73 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
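The 1.09 GiB that fails to allocate is about the size of one full fp32 attention-score tensor materialized by the softmax in attention_fn. A rough back-of-envelope, where the head count and total sequence length are illustrative assumptions rather than values from the log:

```python
# Size of one (batch, heads, seq, seq) float32 attention-score tensor.
# heads and seq are assumed for illustration; they are not in the log.
batch, heads, seq = 1, 32, 3024
score_bytes = batch * heads * seq * seq * 4  # 4 bytes per fp32 element
print(f"{score_bytes / 2**30:.2f} GiB")      # -> 1.09 GiB, matching the failed allocation
```

If one such tensor is saved per decoder layer for the backward pass, the scores alone would account for tens of GiB, which lines up with the 75.91 GiB that PyTorch reports as allocated.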
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
The identical traceback is raised at the same softmax on two more ranks, ending in:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.09 GiB. GPU 4 has a total capacity of 79.33 GiB of which 457.81 MiB is free. Process 701751 has 78.87 GiB memory in use. Of the allocated memory 75.91 GiB is allocated by PyTorch, and 1.73 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.09 GiB. GPU 0 has a total capacity of 79.33 GiB of which 553.81 MiB is free. Process 701747 has 78.78 GiB memory in use. Of the allocated memory 75.91 GiB is allocated by PyTorch, and 1.73 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Expected behavior
I am fine-tuning with the default LoRA parameters. According to the authors, fine-tuning should fit in 75 GB of GPU memory on 8 cards, but I run out of GPU memory both on a single machine and across multiple (4-6) machines. Where is the problem?