Open whatever-wlb opened 1 week ago
Hi, we need more information to understand your situation:
- model: Qwen2-VL-7B-Instruct
- framework: transformers (no flash_attention)
- how did you run the inference?: used the demo code in the README
- resolutions: [1436, 717] or higher
- did you try lowering max_pixels (x pixels = x/784 tokens) as in the README?: no, I'm worried that I'll lose image details this way
@jklj077 where did you see that (x pixels = x/784 tokens)? Can we control it during finetuning with LLaMA-Factory?
When running multi-image inference, I get an out-of-memory error (single GPU, 32 GB VRAM). Increasing the number of inference GPUs (I tried both 2 and 4 cards) still runs out of memory. Why does this happen?
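For reference on the formula quoted above: Qwen2-VL maps each 28×28 pixel patch to one visual token, so 28 × 28 = 784 pixels per token, which is where "x pixels = x/784 tokens" comes from. A minimal sketch of that arithmetic (the `estimated_tokens` helper is illustrative, not part of the transformers API), showing how capping `max_pixels` bounds the per-image token budget:

```python
# Qwen2-VL spends one visual token per 28x28 pixel patch,
# so an image of w*h pixels costs roughly w*h / 784 tokens.
PIXELS_PER_TOKEN = 28 * 28  # 784

def estimated_tokens(width, height, max_pixels=None):
    """Rough visual-token estimate for one image.

    If max_pixels is set, the processor rescales the image so that
    width * height <= max_pixels, which directly caps the token count.
    """
    pixels = width * height
    if max_pixels is not None:
        pixels = min(pixels, max_pixels)
    return pixels // PIXELS_PER_TOKEN

# The [1436, 717] resolution reported in this issue:
print(estimated_tokens(1436, 717))             # -> 1313 tokens at full resolution
print(estimated_tokens(1436, 717, 512 * 784))  # -> 512 tokens when capped
```

The README's demo code accepts this cap when building the processor, e.g. `AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", min_pixels=..., max_pixels=...)`, so lowering `max_pixels` (or keeping fewer images per prompt) shrinks the KV cache and activation memory; with several high-resolution images per request, the visual tokens alone can exhaust a 32 GB card regardless of how many GPUs the weights are sharded across.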