Closed pyogher closed 1 year ago
Hi, thanks for your interest in our work. Currently we do not guarantee multi-GPU support. To enable multi-GPU inference, we recommend setting up an individual model replica on each GPU and writing your own inference scheduling program to dispatch requests across them.
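The per-GPU scheduling idea above can be sketched roughly as follows. This is a minimal illustration, not the maintainers' implementation: `infer_fn` is a placeholder you would replace with a call to the model replica loaded on `cuda:{gpu_id}`.

```python
import queue
import threading


def run_inference(prompts, num_gpus, infer_fn):
    """Distribute prompts across per-GPU workers, one model replica each.

    infer_fn(gpu_id, prompt) is a placeholder for the real model call,
    e.g. replicas[gpu_id].generate(...) with inputs on f"cuda:{gpu_id}".
    """
    tasks = queue.Queue()
    for i, p in enumerate(prompts):
        tasks.put((i, p))
    results = [None] * len(prompts)

    def worker(gpu_id):
        # Each worker drains the shared queue, so faster GPUs
        # naturally pick up more work.
        while True:
            try:
                idx, prompt = tasks.get_nowait()
            except queue.Empty:
                return
            results[idx] = infer_fn(gpu_id, prompt)

    threads = [threading.Thread(target=worker, args=(g,)) for g in range(num_gpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In practice you would load one mPLUG-Owl checkpoint per GPU before starting the workers; for heavier workloads, separate processes (one per GPU) avoid Python's GIL better than threads.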
Thank you for your response. I really appreciate your help.
Could you share your solution?
Hi!
I noticed a warning in the mPLUG_OwlForConditionalGeneration class:
"The language_model is not in the hf_device_map dictionary and you are running your script in a multi-GPU environment. This may lead to unexpected behavior when using accelerate. Please pass a device_map that contains language_model to remove this warning. Please refer to https://github.com/huggingface/blog/blob/main/accelerate-large-models.md for more details on creating a device_map for large models."
It seems the warning advises against splitting the layers of LLaMA across different GPUs without an explicit device_map. I'm curious about the reason behind this recommendation. Additionally, I have been exploring the evaluation of LVLMs recently and need to perform extensive inference across different benchmarks. Do you have any suggestions to improve the efficiency of inference with mPLUG-Owl in a multi-GPU environment? I've noticed that GPU utilization is not optimal. Thank you!
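For anyone hitting the same warning: a device_map that names `language_model` explicitly silences it. This is only a sketch of the config shape; the module names below are assumptions, so check `model.named_modules()` (or `hf_device_map` after loading) for the actual top-level names in your checkpoint.

```python
# Hypothetical device_map keeping the whole language model on one GPU
# (so its layers are not split) and the vision side on another.
# Module names ("vision_model", "abstractor", "language_model") are
# assumptions -- verify them against model.named_modules().
device_map = {
    "vision_model": 0,
    "abstractor": 0,
    "language_model": 1,
}

# Passed at load time, e.g.:
# model = MplugOwlForConditionalGeneration.from_pretrained(
#     checkpoint, device_map=device_map)
```

Values can be GPU indices, "cpu", or "disk"; any module not covered by a key falls back to accelerate's automatic placement.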