THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model
Apache License 2.0
4.07k stars, 414 forks

Single-node multi-GPU fine-tuning: process gets killed #290

Closed abbhay closed 10 months ago

abbhay commented 10 months ago

[two screenshots] When I run single-node multi-GPU fine-tuning on two V100s, the process gets killed. The only change I made was adding --include localhost:0,1 to the sh script. Could anyone tell me what might be causing this?

1049451037 commented 10 months ago

Most likely your CPU memory (host RAM) ran out, so the process was killed. Try installing the latest version of SAT from GitHub:

git clone https://github.com/THUDM/SwissArmyTransformer
cd SwissArmyTransformer
pip install .

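The new version optimizes CPU memory usage. In your program, pass overwrite_args to the from_pretrained call, i.e. from_pretrained(..., overwrite_args={'model_parallel_size': 1}), to enable the CPU-memory-optimized mode.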
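As a rough illustration of why host RAM can run out in a multi-GPU run, the sketch below estimates peak CPU memory under the assumption that every launched process first loads its own full copy of the weights into host RAM. The function name, the parameter count (7.8e9) and the 2 bytes per parameter are illustrative assumptions, not values taken from this thread, and real usage also includes framework and data-loading overhead.

```python
def estimate_peak_host_ram_gib(n_params: float, bytes_per_param: int, n_processes: int) -> float:
    """Back-of-the-envelope estimate of host RAM used by model weights, in GiB.

    Assumption (hypothetical, for illustration): each process holds one
    full copy of the weights in CPU memory at the same time.
    """
    total_bytes = n_params * bytes_per_param * n_processes
    return total_bytes / 2**30


# Illustrative inputs only: ~7.8e9 parameters stored as 16-bit values.
one_process = estimate_peak_host_ram_gib(7.8e9, 2, 1)
two_processes = estimate_peak_host_ram_gib(7.8e9, 2, 2)

print(f"1 process:   {one_process:.1f} GiB")
print(f"2 processes: {two_processes:.1f} GiB")
```

If the estimate for the number of GPUs you launch is close to or above the machine's physical RAM, the operating system may kill the process, which would be consistent with the symptom reported above.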

1049451037 commented 10 months ago

FYI: to reduce memory usage further, change args.device = 'cpu' to args.device = 'cuda'.