HaozheZhao / UltraEdit


Can we use image input with different resolutions? #11

Closed · tageao460 closed this 1 month ago

tageao460 commented 1 month ago

First of all, this project works very well with 512x512 images, great work!

However, when I tried to load an image at 1024x1024 resolution, the result was terrible.

As the paper mentions, the model is trained at 256 × 256 and 512 × 512 for generation. This configuration is reasonable for SD 1.5, but isn't it a bit small for SDXL and SD3?

The input image is as follows, and the prompt is "Please add some apples on the table": [input image]

The result at 512x512 resolution: [image]

The result at 1024x1024 resolution: [image]

HaozheZhao commented 1 month ago

Thank you for your feedback!

For a fair comparison, our model was trained at 256 × 256 and 512 × 512 image resolutions for SD 1.5, consistent with its counterparts. We report these results in the paper to highlight the advantages of our dataset.

However, our recent tests and experiments show that SDXL and SD3 do require higher resolutions. Specifically, due to its DiT architecture, SD3 performs poorly when the input resolution differs from the one it was trained on; it does not generalize well to generating images at other resolutions. A potential solution is therefore to retrain the model on 1024 × 1024 images for SD3. The demo we shared uses SD3 trained on the UltraEdit dataset at 512 × 512 resolution.
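In the meantime, a practical workaround for the SD3 checkpoint is to downscale the input to the trained 512 × 512 resolution before editing and upscale the result afterwards. Below is a minimal sketch assuming a diffusers-style editing pipeline; the checkpoint path, file names, and sampler settings are placeholders for illustration, not our released API:

```python
import torch
from PIL import Image
from diffusers import DiffusionPipeline

# Placeholder checkpoint name; substitute the SD3-based UltraEdit
# checkpoint referenced in the repo's README.
pipe = DiffusionPipeline.from_pretrained(
    "path/to/ultraedit-sd3-512",
    torch_dtype=torch.float16,
).to("cuda")

source = Image.open("table.png").convert("RGB")  # e.g. a 1024x1024 input
orig_size = source.size

# SD3 was trained at 512 x 512, so resize the input to match
# before editing, then upscale the result back afterwards.
edited = pipe(
    prompt="Please add some apples on the table",
    image=source.resize((512, 512), Image.LANCZOS),
    num_inference_steps=50,
).images[0]

edited = edited.resize(orig_size, Image.LANCZOS)
edited.save("edited.png")
```

The upscaling step trades some sharpness for correctness; until a 1024 × 1024 SD3 checkpoint is available, editing at the trained resolution avoids the degraded outputs shown above.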

In contrast, SDXL generalizes well to different resolutions. We trained it at 512 × 512 resolution, and it still produces good results at inference with 1024 × 1024 images.
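So with the SDXL variant, the same input can be edited directly at 1024 × 1024 without any resizing. A minimal sketch along the same lines (again, the checkpoint name is a placeholder):

```python
import torch
from PIL import Image
from diffusers import DiffusionPipeline

# Placeholder checkpoint name for the SDXL-based UltraEdit model.
pipe = DiffusionPipeline.from_pretrained(
    "path/to/ultraedit-sdxl-512",
    torch_dtype=torch.float16,
).to("cuda")

# SDXL generalizes across resolutions, so a 1024 x 1024 input
# can be edited directly, no downscaling to 512 x 512 needed.
source = Image.open("table_1024.png").convert("RGB")
edited = pipe(
    prompt="Please add some apples on the table",
    image=source,
    num_inference_steps=50,
).images[0]
edited.save("edited_1024.png")
```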