Open · nicole828 opened this issue 1 year ago
Hi, I get an error when running single-node multi-GPU training. The command is:

```sh
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template chatml \
    --finetuning_type lora \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --lora_target c_attn \
    --fp16
```

The error is:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

Is single-node multi-GPU training supported?
Try putting the command into a shell script.
改成shell脚本试了呢,还是同样的错误。
There is probably a bug in the code.
Consider specifying device_map="auto" in AutoModelForCausalLM.from_pretrained.
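A minimal sketch of what that change would look like (this is not the repository's actual loading code; `path_to_your_model` is just the placeholder from the command above, and `device_map="auto"` requires the accelerate package to be installed):

```python
# Hedged sketch: passing device_map="auto" asks transformers/accelerate to
# shard the model's layers across the visible GPUs (cuda:0 ... cuda:3)
# instead of loading everything onto a single device.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path_to_your_model"  # placeholder path from the command above
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",        # requires `pip install accelerate`
    torch_dtype="auto",
    trust_remote_code=True,
)
```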