ybshaw opened this issue 1 month ago
Same problem here, with 4× 3090: the example only runs on a single card, finetuning OOMs on a single card, and multi-card runs fail with ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 2 (pid: 15250) of binary: /opt/conda/envs/internlm/bin/python
```python
import torch
from transformers import AutoModel

# device_map="auto" lets accelerate shard the weights across all visible GPUs
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2-vl-7b',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto"
).eval()
```
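A quick sanity check (a sketch, assuming `model` was loaded with `device_map="auto"` as above) is to print the device map that accelerate produced and the memory actually allocated on each card:

```python
# Assumes `model` and `torch` from the snippet above.
# transformers populates hf_device_map when accelerate dispatches the weights;
# if every entry points to GPU 0, the model was not actually sharded.
print(model.hf_device_map)
print(torch.cuda.memory_allocated(0) / 1e9, "GB on cuda:0")
print(torch.cuda.memory_allocated(1) / 1e9, "GB on cuda:1")
```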
Successfully loaded on 2× 3090.
Using the official 7B model, inference fails with an OOM error on a single 24 GB RTX card. Specifying the GPU ids has no effect; it still only uses GPU 0. How should I run inference so that it works?
Error: OOM ![tmp](https://github.com/InternLM/InternLM-XComposer/assets/52484098/1be54648-609e-498d-8647-43589901887b)
I specified all GPU ids in the code (machine: 4 cards, 24 GB each).
Still the same error; nvidia-smi shows the model is actually running on a single card and is not spread across the others (see the sketch below).
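One common pitfall is setting `CUDA_VISIBLE_DEVICES` after torch has already initialized CUDA, which is silently ignored. Below is a minimal sketch, assuming a 4× 24 GB machine: the env var is set before importing torch, and `max_memory` caps per-GPU usage so that `device_map="auto"` spreads the layers across all four cards. The `20GiB` values and the `infer.py` name are illustrative, not from the repo.

```python
import os
# Must be set before torch initializes CUDA, otherwise it has no effect
# (or pass it on the command line: CUDA_VISIBLE_DEVICES=0,1,2,3 python infer.py).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2-vl-7b',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto",
    # Illustrative caps so accelerate leaves headroom on each 24 GB card;
    # adjust or drop if your setup differs.
    max_memory={i: "20GiB" for i in range(4)},
).eval()

print(model.hf_device_map)  # should now list modules on multiple GPUs
```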