Open wubangcai opened 2 months ago
I found that in the file LLaMA-Factory/src/llamafactory/data/mm_plugin.py, there is a function _regularize_images which loads image_resolution: int = getattr(processor, "image_resolution", 512). However, the processor for Qwen2-VL has no image_resolution attribute, so the value always defaults to 512. I'm not sure whether the author intended this setting to be used for training. We need confirmation on this.
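For readers unfamiliar with the fallback described above, here is a minimal sketch of how getattr with a default behaves. DummyProcessor and resolve_image_resolution are hypothetical names used only for illustration; they are not part of LLaMA-Factory or transformers.

```python
class DummyProcessor:
    """Hypothetical stand-in for a processor that defines no image_resolution."""


def resolve_image_resolution(processor, default=512):
    # getattr returns the third argument when the attribute is missing,
    # so a processor without image_resolution silently yields the default.
    return getattr(processor, "image_resolution", default)


processor = DummyProcessor()
missing = resolve_image_resolution(processor)

# Once the attribute is attached to the processor, the default is no longer used.
setattr(processor, "image_resolution", 1024)
present = resolve_image_resolution(processor)

print(missing, present)
```

The first lookup falls back to 512 because the attribute is absent; the second returns the value that was set explicitly.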
Same question. I tried SFT with LLaMA-Factory's example settings directly on 1080p images, but encountered RuntimeError: shape mismatch. I suspect it's because I didn't set the image resolution.
We have an image_resolution argument to control the maximum width or height of input images. Use --image_resolution 1024 to specify it.
@hiyouga Thank you for your reply. I have checked the relevant resolution settings. What still confuses me is that Qwen2-VL is supposed to support dynamic resolution, so I would expect training to scale the original image down only when it exceeds the maximum supported resolution, rather than resizing it to one fixed resolution. I do not see that operation in the code at present, though I may simply have missed it. I hope the author can clarify.
@wubangcai We also support dynamic resolution. We only resize the image if its width or height exceeds the image_resolution parameter.
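The rule described in this reply can be sketched as follows. This is an illustration under assumptions, not the code from mm_plugin.py: the function name regularize_size is hypothetical, and it only computes the target size rather than resizing a real image.

```python
def regularize_size(width, height, image_resolution):
    """Return the (width, height) an image would have after capping its longest side.

    Hypothetical helper: images whose sides are both within image_resolution
    are returned unchanged; larger ones are scaled down, keeping aspect ratio.
    """
    longest = max(width, height)
    if longest <= image_resolution:
        return width, height  # no resize needed
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = width * image_resolution // longest
    new_height = height * image_resolution // longest
    return new_width, new_height


print(regularize_size(1920, 1080, 512))   # scaled down to fit within 512
print(regularize_size(400, 300, 512))     # already small enough, unchanged
print(regularize_size(1920, 1080, 2048))  # limit above the image size, unchanged
```

With a limit of 512, a 1080p frame is shrunk so that its longer side is 512; with a limit at or above 1920 it passes through untouched.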
@hiyouga Thank you, I get it. @huynhbaobk Thanks again for your reply. We can set a larger image_resolution to avoid the resize, and then the smart_resize function in transformers/models/qwen2_vl/image_processing_qwen2_vl.py will do the dynamic scaling.
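As a rough illustration of what this kind of dynamic scaling involves, the sketch below snaps both sides to a multiple of a patch factor and rescales so the total pixel count stays within a budget. It is a simplified approximation written for this thread, not the library function itself: the name smart_resize_sketch and the default values for factor, min_pixels, and max_pixels are assumptions, and input validation is omitted.

```python
import math


def smart_resize_sketch(height, width, factor=28,
                        min_pixels=56 * 56, max_pixels=14 * 14 * 4 * 1280):
    """Pick a (height, width) that is a multiple of `factor` on each side and
    whose area lies within [min_pixels, max_pixels]. Defaults are assumed."""
    # Snap each side to the nearest multiple of factor.
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


h, w = smart_resize_sketch(1080, 1920)
print(h, w)
```

Under these assumed defaults, a 1080p image is still scaled down by the pixel budget even when image_resolution is large enough to skip the earlier resize, so the output size depends on the image's aspect ratio rather than on one fixed resolution.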
In the README file, I only found instructions on how to set the image size during inference. How do I set the image resolution during SFT with LLaMA-Factory?