OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University
https://txsun1997.github.io/blogs/moss.html
Apache License 2.0
11.9k stars, 1.14k forks

How to specify which GPU to use for fine-tuning #286

Closed lukaswangbk closed 1 year ago

lukaswangbk commented 1 year ago

While testing fine-tuning, I found that the job always runs on the first GPU. How do I specify which GPUs to use? So far I have tried adding CUDA_VISIBLE_DEVICES=1 to run.sh, but it still uses GPU 0. Setting it through os in fine_tuning.py does not work either. Here is my command:

num_machines=1
num_processes=1
machine_rank=0
CUDA_VISIBLE_DEVICES=3
accelerate launch \
    --config_file ./configs/sft.yaml \
    --num_processes $num_processes \
    --num_machines $num_machines \
    --machine_rank $machine_rank \
    --deepspeed_multinode_launcher standard finetune_moss.py \
    --model_name_or_path fnlp/moss-moon-003-base \
    --data_dir ./sft_data \
    --output_dir ./ckpts/moss-moon-003-sft \
    --log_dir ./train_logs/moss-moon-003-sft \
    --n_epochs 2 \
    --train_bsz_per_gpu 1 \
    --eval_bsz_per_gpu 1 \
    --learning_rate 0.000015 \
    --eval_step 200 \
    --save_step 2000

631068264 commented 1 year ago

@lukaswangbk Did you manage to solve this? I could use some help.

lukaswangbk commented 1 year ago

You need to import os at the very top of the file and then set CUDA_VISIBLE_DEVICES there (in principle this has to happen before import torch).
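A minimal sketch of that ordering, assuming it sits at the top of the training script (finetune_moss.py in the command above). The index "3" is just the example value from this thread, and the helper visible_devices is hypothetical, added only for illustration. The torch import is wrapped in try/except so the snippet also runs on a machine without PyTorch.

```python
import os

# Set the mask before torch (or anything that imports torch) is loaded,
# because CUDA reads this variable when it is first initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # example index from this thread


def visible_devices():
    """Hypothetical helper: return the physical GPU indices this process may use."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [d.strip() for d in raw.split(",") if d.strip()]


try:
    import torch  # imported only after the environment variable is set
except ImportError:
    torch = None  # PyTorch not installed; the ordering idea still applies

print("Physical GPUs exposed to this process:", visible_devices())
if torch is not None and torch.cuda.is_available():
    # Inside the process the visible GPUs are renumbered from 0,
    # so physical GPU 3 shows up here as cuda:0.
    print("torch sees", torch.cuda.device_count(), "device(s)")
```

Because of the renumbering, a log line that says cuda:0 does not by itself mean physical GPU 0 is in use; nvidia-smi on the host reports the physical index.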

631068264 commented 1 year ago

@lukaswangbk Is your code at the latest commit, git commit hash 4ab9c787?

I have 4 GPUs. GPU 1 is idle, but the job still inexplicably ends up on GPU 0.

It doesn't work...

graphnj commented 1 year ago

Add export in front of CUDA_VISIBLE_DEVICES=3,
or put CUDA_VISIBLE_DEVICES=3 on the same line as accelerate launch, that is, CUDA_VISIBLE_DEVICES=3 accelerate launch, and see whether that works.
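The likely reason both suggestions help: a bare assignment on its own line in a shell script creates a shell variable that child processes do not inherit unless it is exported, while export or the same-line prefix form places it in the child's environment. The sketch below illustrates the same idea from Python by launching a child interpreter with and without the variable in its environment. The names CHILD and child_sees are hypothetical, and the child is a stand-in for accelerate launch, not the real launcher.

```python
import os
import subprocess
import sys

# One-line child program that reports what it sees in its own environment.
CHILD = 'import os; print(os.environ.get("CUDA_VISIBLE_DEVICES", "unset"))'


def child_sees(env):
    """Run a child Python process with the given environment and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from the current environment with the variable removed, which mimics
# a shell variable that was assigned but never exported.
base = {k: v for k, v in os.environ.items() if k != "CUDA_VISIBLE_DEVICES"}
without_var = child_sees(base)

# Mimics "export CUDA_VISIBLE_DEVICES=3" or "CUDA_VISIBLE_DEVICES=3 accelerate launch ...".
with_var = child_sees({**base, "CUDA_VISIBLE_DEVICES": "3"})

print("not exported:", without_var)
print("exported or prefixed:", with_var)
```

Only the second child receives the variable, which is the situation the training process needs to be in for the GPU mask to take effect.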