Open whatever-wlb opened 1 week ago
Hi, we need more information to understand your situation:
- model: Qwen2-VL-7B-Instruct
- framework: transformers (no flash_attention)
- how did you run the inference?: used the demo code in the README
- resolutions: [1436, 717] or higher
- did you try lowering max_pixels (x pixels = x/784 tokens) as in the README?: no, I'm worried that I'll lose image details this way
@jklj077 where did you see that (x pixels = x/784 tokens)? Can we control it during finetuning with LLaMA-Factory?
When running multi-image inference, I get an out-of-memory error (single GPU, 32 GB VRAM). Increasing the number of inference GPUs (I tried both 2 and 4 cards) still runs out of memory. Why does this happen?
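For reference on the formula quoted above: Qwen2-VL maps each 28×28 pixel patch to one visual token, so 28 × 28 = 784 pixels per token, which is where "x pixels = x/784 tokens" comes from. A minimal sketch of that arithmetic (the `estimated_tokens` helper is illustrative, not part of the transformers API), showing how capping `max_pixels` bounds the per-image token budget:

```python
# Qwen2-VL spends one visual token per 28x28 pixel patch,
# so an image of w*h pixels costs roughly w*h / 784 tokens.
PIXELS_PER_TOKEN = 28 * 28  # 784

def estimated_tokens(width, height, max_pixels=None):
    """Rough visual-token estimate for one image.

    If max_pixels is set, the processor rescales the image so that
    width * height <= max_pixels, which directly caps the token count.
    """
    pixels = width * height
    if max_pixels is not None:
        pixels = min(pixels, max_pixels)
    return pixels // PIXELS_PER_TOKEN

# The [1436, 717] resolution reported in this issue:
print(estimated_tokens(1436, 717))             # -> 1313 tokens at full resolution
print(estimated_tokens(1436, 717, 512 * 784))  # -> 512 tokens when capped
```

The README's demo code accepts this cap when building the processor, e.g. `AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", min_pixels=..., max_pixels=...)`, so lowering `max_pixels` (or keeping fewer images per prompt) shrinks the KV cache and activation memory; with several high-resolution images per request, the visual tokens alone can exhaust a 32 GB card regardless of how many GPUs the weights are sharded across.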