QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Apache License 2.0

Multi-image inference OOM #185

Open whatever-wlb opened 1 week ago

whatever-wlb commented 1 week ago

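When running inference with multiple images, I run out of GPU memory (single GPU, 32 GB of VRAM). Increasing the number of GPUs used for inference (I tried both 2 and 4 GPUs) still results in OOM. Why does this happen?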

jklj077 commented 1 week ago

Hi, we need more information to understand your situation:

whatever-wlb commented 1 week ago

- model: Qwen2-VL-7B-Instruct
- framework: transformers, without flash_attention
- how did you run the inference?: I used the demo code in the README
- resolutions: [1436, 717] or higher
- did you try lowering max_pixels (x pixels = x/784 tokens) as in the README?: No, I'm worried that I'll lose image details this way.
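A rough way to see how several high-resolution images add up is to estimate the visual token count per image. This is a minimal sketch, assuming the resize rule implied by "x pixels = x/784 tokens" (each side snapped to a multiple of 28, i.e. 14-pixel patches merged 2x2) and assumed defaults of min_pixels=4*28*28 and max_pixels=16384*28*28. resized_shape, visual_tokens and total_visual_tokens are hypothetical helpers written for illustration, not the qwen_vl_utils API.

```python
import math

# Assumption: 14-pixel ViT patches, with 2x2 neighbouring patches merged into one
# visual token, so each token covers a 28x28 pixel block (784 pixels).
PATCH_SIZE = 14
MERGE_SIZE = 2
FACTOR = PATCH_SIZE * MERGE_SIZE  # 28


def resized_shape(height, width, min_pixels=4 * 28 * 28, max_pixels=16384 * 28 * 28):
    """Hypothetical helper: snap each side to a multiple of FACTOR, then rescale
    (keeping the aspect ratio) if the area falls outside [min_pixels, max_pixels]."""
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > max_pixels:
        # Shrink so the area fits under max_pixels, rounding each side down.
        beta = math.sqrt(height * width / max_pixels)
        h = max(FACTOR, math.floor(height / beta / FACTOR) * FACTOR)
        w = max(FACTOR, math.floor(width / beta / FACTOR) * FACTOR)
    elif h * w < min_pixels:
        # Enlarge so the area reaches min_pixels, rounding each side up.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / FACTOR) * FACTOR
        w = math.ceil(width * beta / FACTOR) * FACTOR
    return h, w


def visual_tokens(height, width, **kwargs):
    """Hypothetical helper: visual tokens for one image after resizing."""
    h, w = resized_shape(height, width, **kwargs)
    return (h // FACTOR) * (w // FACTOR)


def total_visual_tokens(sizes, **kwargs):
    """Hypothetical helper: visual tokens summed over several (height, width) images."""
    return sum(visual_tokens(h, w, **kwargs) for h, w in sizes)


if __name__ == "__main__":
    images = [(717, 1436)] * 4  # four images at the reported resolution
    print(total_visual_tokens(images))                           # 5304 with the assumed default cap
    print(total_visual_tokens(images, max_pixels=256 * 28 * 28))  # 968 with a lower cap
```

Under these assumptions, one [1436, 717] image costs about 1,326 visual tokens, so four of them already take over 5,000 tokens of context before any text is added; capping max_pixels at 256*28*28 brings each image down to about 242 tokens, at the cost of downscaling. This only estimates sequence length; actual memory use also depends on the attention implementation (here, without flash_attention).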

sharonsalabiglossai commented 5 days ago

@jklj077 Where did you see that (x pixels = x/784 tokens)? Can we control it during fine-tuning with LLaMA-Factory?
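For reference, the 784 figure is consistent with Qwen2-VL's vision encoder design, assuming 14x14-pixel ViT patches where each 2x2 group of neighbouring patches is merged into one token. This is a sketch of the arithmetic, not an official statement; H and W denote the image height and width after resizing.

```latex
\text{pixels per token} = (14 \times 2)^2 = 28^2 = 784,
\qquad
N_{\text{tokens}} \approx \frac{H \times W}{784}
```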