QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Apache License 2.0

Qwen2-VL-72B-Instruct does not support pipeline parallelism in the qwenllm/qwenvl Docker image #261

Open db24 opened 1 month ago

db24 commented 1 month ago

We don't have 4×80G GPUs on hand. On a 4×40G setup, Qwen2-VL-72B-Instruct doesn't fit in GPU memory, so we tried deploying it across multiple nodes with model pipelining, but vLLM doesn't support this for Qwen2-VL. Is there any chance this will be supported in the future?

```shell
python3 -m vllm.entrypoints.openai.api_server --port 8000 --model /llm_weights/Qwen2-VL-72B-Instruct --pipeline-parallel-size 2 --tensor-parallel-size 4 --swap-space 16 --gpu-memory-utilization 0.9 --dtype auto --served-model-name Qwen2-VL-72B-Instruct
```

```
NotImplementedError: Pipeline parallelism is only supported for the following architectures: ['AquilaForCausalLM', 'AquilaModel', 'DeepseekV2ForCausalLM', 'GPT2LMHeadModel', 'InternLM2ForCausalLM', 'InternLMForCausalLM', 'InternVLChatModel', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'NemotronForCausalLM', 'Phi3ForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'QWenLMHeadModel']
```
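For context, the error above comes from a simple architecture whitelist: vLLM keeps a list of model classes known to support pipeline parallelism and rejects everything else. Below is a minimal sketch of that check (an illustration reconstructed from the error message, not vLLM's actual source); the list is copied verbatim from the traceback.

```python
# Architectures listed in the NotImplementedError above.
PP_SUPPORTED_ARCHITECTURES = [
    'AquilaForCausalLM', 'AquilaModel', 'DeepseekV2ForCausalLM',
    'GPT2LMHeadModel', 'InternLM2ForCausalLM', 'InternLMForCausalLM',
    'InternVLChatModel', 'JAISLMHeadModel', 'LlamaForCausalLM',
    'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM',
    'NemotronForCausalLM', 'Phi3ForCausalLM', 'Qwen2ForCausalLM',
    'Qwen2MoeForCausalLM', 'QWenLMHeadModel',
]

def check_pipeline_parallel_supported(architecture: str) -> None:
    """Raise the same kind of error vLLM raises for unsupported models."""
    if architecture not in PP_SUPPORTED_ARCHITECTURES:
        raise NotImplementedError(
            "Pipeline parallelism is only supported for the following "
            f"architectures: {PP_SUPPORTED_ARCHITECTURES}")
```

Qwen2-VL's config.json reports the architecture `Qwen2VLForConditionalGeneration`, which is absent from the list, so the check fails even though the text backbone `Qwen2ForCausalLM` is listed.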
db24 commented 1 month ago

LLaMA 405B could previously be deployed across multiple nodes, so vLLM presumably supports it there. Many organizations really don't have 4/8×80G GPUs, so multi-node deployment is unavoidable.
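For reference, a multi-node deployment for a model whose architecture is on vLLM's pipeline-parallelism whitelist typically uses a Ray cluster. Below is a sketch assuming two nodes; the model path, head-node IP, and port are placeholders, not values from this issue:

```shell
# On the head node: start a Ray cluster (port is a placeholder).
ray start --head --port=6379

# On each worker node: join the head node's Ray cluster.
ray start --address=HEAD_NODE_IP:6379

# Back on the head node: 2 pipeline stages x 4-way tensor parallelism = 8 GPUs
# spread across the two nodes.
python3 -m vllm.entrypoints.openai.api_server \
  --port 8000 \
  --model /llm_weights/SOME_PP_SUPPORTED_MODEL \
  --pipeline-parallel-size 2 \
  --tensor-parallel-size 4 \
  --distributed-executor-backend ray
```

With Qwen2-VL this launch fails at the architecture check shown above, which is exactly what this issue asks the vLLM/Qwen teams to lift.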