cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
https://cambrian-mllm.github.io/
Apache License 2.0

Error in inference.py when multiple GPUs are available. [BUG] #25

Open ZeenSong opened 2 days ago

ZeenSong commented 2 days ago

My server has 8 GPUs and when running

python inference.py

The script loads all the models, but as soon as I provide an image and a question it raises the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

But when I specify only one GPU with

CUDA_VISIBLE_DEVICES=0 python inference.py

It works well.

Does this script only work with a single GPU?

ZeenSong commented 2 days ago

This also happens when I launch the model worker. When I upload an image, the same error occurs, saying that the image and the model are on different devices.

penghao-wu commented 1 day ago

Yes, the current script only works with a single GPU for now. Since device_map is set to "auto" by default, the model weights are split across all visible GPUs. The reason for this error and a temporary workaround are provided here: https://github.com/cambrian-mllm/cambrian/issues/12#issuecomment-2198610750.
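For reference, a minimal sketch of the mechanism using the plain Hugging Face transformers API (this is not the cambrian loader itself, and the model id is only an example):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets accelerate shard the weights across every visible GPU,
# which is how activations can end up on cuda:1 while some weights sit on cuda:0.
# Pinning the whole model to a single device side-steps the mismatch, the same
# way CUDA_VISIBLE_DEVICES=0 does.
model = AutoModelForCausalLM.from_pretrained(
    "nyu-visionx/cambrian-8b",   # example model id
    device_map={"": 0},          # keep every weight on cuda:0
)
```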

ZeenSong commented 1 day ago

Thank you for your quick response. If I have a different number of GPUs than in #12, should I only modify the beginning of cambrian/model/builder.py to match my GPU setup?

penghao-wu commented 1 day ago

Yes. I just put all the vision encoders on cuda:0 and reserve some memory for cuda:0 in the device_map, so you have to make sure that the memory reserved for cuda:0 is enough in your case.
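As a rough illustration of that memory reservation (assumed figures, not the actual values in builder.py; adjust the GPU count, per-card memory, and cuda:0 budget to your hardware):

```python
from transformers import AutoModelForCausalLM

# Hypothetical numbers for illustration only.
num_gpus = 8
per_gpu = "24GiB"        # total memory of each card
gpu0_budget = "16GiB"    # leave headroom on cuda:0 for the vision encoders

# Cap how much of each GPU the language model may occupy; the slack on cuda:0
# is what keeps room for the vision towers pinned there by the workaround.
max_memory = {0: gpu0_budget}
max_memory.update({i: per_gpu for i in range(1, num_gpus)})

model = AutoModelForCausalLM.from_pretrained(
    "nyu-visionx/cambrian-8b",   # example model id
    device_map="auto",
    max_memory=max_memory,
)
```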