THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B
Apache License 2.0

Single-node multi-GPU and multi-node multi-GPU training keeps running out of GPU memory #67

Closed chensongcan closed 3 months ago

chensongcan commented 3 months ago

System Info / 系統信息

80 GB H800, CUDA 11.8, Python 3.8.13

Who can help? / 谁可以帮助到您?

@zRzRzRzRzRzRzR @1049451037

Information / 问题信息

Reproduction / 复现过程

Launched with torchrun:

MODEL_Path="./cogvlm2-llama3-chat-19B/"
train_data="./cogvlm_train.json"
epochs=3
lr=8e-6
batch_size=1
output_dir="./output/"
deepspeed_config_file="./finetune_demo/ds_config.yaml"

CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" torchrun --nnodes ${tmp_nodes} --nproc_per_node 8 \
    --master_addr ${tmp_master_addr} --node_rank ${tmp_node_rank} \
    --master_port ${tmp_master_port} ./finetune_demo/train.py \
    --lr ${lr} \
    --num_epochs ${epochs} \
    --batch_size ${batch_size} \
    --max_input_len 512 \
    --max_output_len 200 \
    --save_step 200 \
    --model_path ${MODEL_Path} \
    --dataset_path ${train_data} \
    --save_path ${output_dir} \
    --ds_config ${deepspeed_config_file}
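For reference, the tmp_* placeholders for a single-node run could be filled in roughly as below; these values are illustrative, not taken from the issue:

# Hypothetical single-node values for the torchrun placeholders above
tmp_nodes=1                 # one machine
tmp_node_rank=0             # rank of this machine
tmp_master_addr=127.0.0.1   # rendezvous address; localhost is fine on one node
tmp_master_port=29500       # any free TCP port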

[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   bfloat16_enabled ............. True

[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   bfloat16_immediate_grad_update False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   checkpoint_parallel_write_pipeline False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   checkpoint_tag_validation_enabled True
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   checkpoint_tag_validation_fail False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fe7bb3699a0>
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   communication_data_type ...... None
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   curriculum_enabled_legacy .... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   curriculum_params_legacy ..... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   data_efficiency_enabled ...... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   dataloader_drop_last ......... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   disable_allgather ............ False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   dump_state ................... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   dynamic_loss_scale_args ...... None
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_enabled ........... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_gas_boundary_resolution 1
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_layer_num ......... 0
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_max_iter .......... 100
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_stability ......... 1e-06
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_tol ............... 0.01
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   eigenvalue_verbose ........... False
[2024-05-30 08:23:52,121] [INFO] [config.py:1000:print]   elasticity_enabled ........... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   fp16_auto_cast ............... None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   fp16_enabled ................. False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   fp16_master_weights_and_gradients False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   global_rank .................. 0
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   grad_accum_dtype ............. None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   gradient_accumulation_steps .. 1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   gradient_clipping ............ 0.1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   gradient_predivide_factor .... 1.0
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   graph_harvesting ............. False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   initial_dynamic_scale ........ 1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   load_universal_checkpoint .... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   loss_scale ................... 1.0
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   memory_breakdown ............. False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   mics_hierarchial_params_gather False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   mics_shard_size .............. -1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   optimizer_legacy_fusion ...... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   optimizer_name ............... None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   optimizer_params ............. None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   pld_enabled .................. False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   pld_params ................... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   prescale_gradients ........... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   scheduler_name ............... None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   scheduler_params ............. None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   seq_parallel_communication_data_type torch.float32
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   sparse_attention ............. None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   sparse_gradients_enabled ..... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   steps_per_print .............. inf
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   train_batch_size ............. 8
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   train_micro_batch_size_per_gpu 1
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   use_data_before_expertparallel False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   use_node_local_storage ....... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   wall_clock_breakdown ......... False
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   weight_quantization_config ... None
[2024-05-30 08:23:52,122] [INFO] [config.py:1000:print]   world_size ................... 8
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_allow_untested_optimizer True
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_config .................. stage=2 contiguous_gradients=False reduce_scatter=True reduce_bucket_size=40000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=100000000 overlap_comm=True load_from_fp32_weights=False elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_enabled ................. True
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_force_ds_cpu_optimizer .. True
[2024-05-30 08:23:52,123] [INFO] [config.py:1000:print]   zero_optimization_stage ...... 2
[2024-05-30 08:23:52,123] [INFO] [config.py:986:print_user_config]   json = { "train_micro_batch_size_per_gpu": 1, "gradient_accumulation_steps": 1, "steps_per_print": inf, "gradient_clipping": 0.1, "zero_optimization": { "stage": 2, "contiguous_gradients": false, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 4.000000e+07, "allgather_bucket_size": 1.000000e+08, "load_from_fp32_weights": false, "round_robin_gradients": false }, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "zero_allow_untested_optimizer": true, "bf16": { "enabled": true }, "activation_checkpointing": { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false }, "wall_clock_breakdown": false, "fp16": { "enabled": false } }
INFO:main:Preparation done. Starting training...
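Worth noting in the printed config: in the user JSON the "offload_optimizer" block sits outside "zero_optimization", and the resolved zero_config above shows offload_optimizer=None, so the CPU offload does not appear to be active. If memory stays tight, one common adjustment is to move the offload inside zero_optimization and/or switch to ZeRO stage 3 with parameter offload. A minimal sketch using standard DeepSpeed keys (not the config shipped in finetune_demo; the repo's ds_config.yaml would need the same keys in YAML form) could be written out like this:

# Hypothetical, more memory-conservative DeepSpeed config (standard DeepSpeed keys)
cat > ds_zero3_offload.json <<'EOF'
{
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 0.1,
    "bf16": { "enabled": true },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "offload_optimizer": { "device": "cpu", "pin_memory": true },
        "offload_param": { "device": "cpu", "pin_memory": true }
    }
}
EOF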

0% 0/1120 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
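The tokenizers message is only a warning about forking after tokenizer parallelism has been used; it is unrelated to the OOM and can be silenced before launching with the standard huggingface/tokenizers environment variable:

export TOKENIZERS_PARALLELISM=false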

Expected behavior / 期待表现

Fine-tuning with the default LoRA parameters. According to the author, this should fit in about 75 GB of GPU memory on 8 GPUs, but I run out of GPU memory both on a single machine and across multiple (4-6) machines. Where could the problem be?
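To see how far per-GPU memory actually climbs before the crash, usage can be polled from a second terminal while the run starts; this is plain nvidia-smi, nothing specific to this repo:

# Print per-GPU memory usage every 2 seconds
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 2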

zRzRzRzRzRzRzR commented 3 months ago

Have you updated to the latest code from Hugging Face? modeling_cogvlm needs to be updated.
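If the local model directory was fetched with huggingface-cli, refreshing it so the updated modeling_cogvlm.py is picked up might look like the sketch below (this assumes that download method; if the directory was cloned with git-lfs, a git pull inside it does the same job):

# Re-download the snapshot, overwriting stale code files in the local directory
huggingface-cli download THUDM/cogvlm2-llama3-chat-19B --local-dir ./cogvlm2-llama3-chat-19B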

Marcovaldon commented 3 months ago

OOM +1

ailun885757124 commented 3 months ago

OOM +1, on an 8x V100 machine (256 GB in total)

zRzRzRzRzRzRzR commented 3 months ago

8x V100 cannot handle it; you need 8x 80 GB A100s or H100s.
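As a rough sanity check (my own back-of-the-envelope estimate, not a figure from the thread): with ZeRO stage 2 every rank keeps a full copy of the 19B parameters, so the bf16 weights alone take about 35 GiB per GPU before gradients, the partitioned optimizer state, activations, and the vision encoder are added, which already exceeds a 32 GB V100:

# Rough weight-memory estimate for a 19B-parameter model in bf16 (2 bytes/param)
python3 -c "p = 19e9; print(f'bf16 weights per GPU: {p * 2 / 2**30:.1f} GiB')"
# -> bf16 weights per GPU: 35.4 GiB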

Jade0321 commented 3 months ago

8x A100 can't handle it either.

Marcovaldon commented 3 months ago

8x A100 can't handle it either.

With batch=1 on each GPU it works.