OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
https://internvl.readthedocs.io/en/latest/
MIT License

Pipeline Parallelism for InternVL-Chat 26B model #524

Closed Dineshkumar-Anandan-ZS0367 closed 1 week ago

Dineshkumar-Anandan-ZS0367 commented 3 weeks ago

I have 4 machines, each with a 16 GB GPU. I need to split the model across multiple machines for loading and inference. You already support single-node multi-GPU; is multi-node possible?

I tried to deploy using vLLM, but I got an error.

If I am not wrong, vLLM currently supports pipeline parallelism only for language models, not for vision-language models.

NotImplementedError: Pipeline parallelism is only supported for the following architectures: ['AquilaModel', 'AquilaForCausalLM', 'DeepseekV2ForCausalLM', 'InternLMForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'Phi3ForCausalLM', 'GPT2LMHeadModel', 'MixtralForCausalLM', 'NemotronForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'QWenLMHeadModel'].

This feature would greatly benefit teams and projects working with vision-language models, allowing them to scale out their workloads efficiently and maintain performance as model sizes continue to grow.

It would also be very helpful if someone could point me to other possibilities for pipeline parallelism. Thanks in advance.
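Since the error above restricts pipeline parallelism to text-only architectures, one alternative is tensor parallelism over a Ray cluster spanning the four nodes. Below is a minimal sketch, assuming vLLM >= 0.5.4 and a Ray cluster already started across the machines (`ray start --head` on one node, `ray start --address=<head-ip>:6379` on the others); the model id and `max_model_len` are illustrative, not verified for 16 GB GPUs:

```python
# Minimal multi-node tensor-parallel sketch (assumptions noted above).
from vllm import LLM, SamplingParams

llm = LLM(
    model="OpenGVLab/InternVL2-26B",     # illustrative model id
    trust_remote_code=True,              # InternVL ships custom model code
    tensor_parallel_size=4,              # shard weights across 4 GPUs, one per node
    distributed_executor_backend="ray",  # run workers on the Ray cluster
    max_model_len=4096,                  # keep the KV cache within 16 GB per GPU
)

outputs = llm.generate(
    ["Describe InternVL in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

Note that tensor parallelism across nodes shards every layer over the network, so without a fast interconnect (e.g. InfiniBand) throughput may be poor compared with single-node multi-GPU.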

G-z-w commented 2 weeks ago

InternVL is now supported in the vLLM framework. See https://github.com/vllm-project/vllm/releases/tag/v0.5.4
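For reference, a minimal single-node offline-inference sketch against that release follows. The `<image>` placeholder and chat markup here follow vLLM's multimodal input convention and may differ between versions; the model id and image path are illustrative:

```python
# Minimal InternVL2 inference under vLLM >= 0.5.4 (assumptions noted above).
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="OpenGVLab/InternVL2-26B", trust_remote_code=True)

# Chat-style prompt with an image placeholder (template may vary by version).
prompt = (
    "<|im_start|>user\n<image>\nDescribe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": Image.open("example.jpg")}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```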