ArtificialZeng / Qwen-Tuning

Qwen-Efficient-Tuning
Apache License 2.0

Single-machine multi-GPU training fails with a device mismatch error #1

Open nicole828 opened 1 year ago

nicole828 commented 1 year ago

Hi, single-machine multi-GPU training fails when launched with the following command:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template chatml \
    --finetuning_type lora \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --lora_target c_attn \
    --fp16
```

The error is:


```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
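For context on the traceback: this RuntimeError is raised when an index tensor and the tensor it indexes (here via `index_select` inside an embedding lookup) live on different GPUs. A minimal CPU-only sketch of the pattern and its generic fix, not taken from this repo:

```python
import torch

# An embedding lookup is an index_select under the hood: the index tensor
# must live on the same device as the embedding weight matrix.
emb = torch.nn.Embedding(10, 4)
idx = torch.tensor([1, 2, 3])

# Had emb been placed on cuda:1 while idx stayed on cuda:0, this call would
# raise the RuntimeError above. Moving the indices to the weight's device
# first is the generic fix:
idx = idx.to(emb.weight.device)
out = emb(idx)
print(tuple(out.shape))  # (3, 4)
```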

Is single-machine multi-GPU training supported?

ArtificialZeng commented 1 year ago

Try turning the command into a shell script.

nicole828 commented 1 year ago

I tried running it as a shell script; same error.

JasonFuuuuuuuu commented 1 year ago

> Try turning the command into a shell script.

This is probably a bug in the code.

Ouya-Bytes commented 8 months ago

Consider specifying `device_map="auto"` in `AutoModelForCausalLM.from_pretrained`.
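A minimal sketch of that suggestion, assuming a standard transformers/accelerate setup; the model path is the placeholder from the original command, and whether this resolves the training error depends on how `train_bash.py` actually loads the model:

```python
def sharded_load_kwargs():
    # device_map="auto" lets accelerate shard the model's layers across all
    # visible GPUs (naive model parallelism), so each lookup runs on the
    # device that holds its weights. trust_remote_code is usually required
    # for Qwen checkpoints (assumption).
    return {"device_map": "auto", "trust_remote_code": True}

def load_model(model_path="path_to_your_model"):
    # Imported lazily so the kwargs helper is usable without transformers.
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(model_path, **sharded_load_kwargs())

print(sharded_load_kwargs()["device_map"])  # auto
```

Note that `device_map="auto"` gives model parallelism (layers split across cards), not data parallelism; for true multi-GPU DDP training, a `torchrun`/`accelerate launch` entry point would be the usual route.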