YanzuoLu / CFLD

[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
MIT License
183 stars 12 forks

Why not generate 750x1024 images like the original images in the DeepFashion dataset? #5

Closed LeonJoe13 closed 8 months ago

LeonJoe13 commented 8 months ago

Thank you for your wonderful efforts in this work. I am just wondering why you do not generate images at the same resolution as the original images in the DeepFashion dataset. Why choose a lower resolution?

YanzuoLu commented 8 months ago

Thanks for your attention to our work. For one thing, we adopt SD1.5, so generation is fixed at 512x512. Evaluation at different resolutions is achieved by resizing. For another thing, we could of course achieve 1024x750 generation, or a different aspect ratio, with SDXL, but then the comparison with other state-of-the-art methods in the paper might not be fair. As this is an open-source project, I do think your suggestion to set up a new standard is reasonable. We may plan to make this happen : ) Thanks!
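To illustrate the "evaluation by resizing" step mentioned above, here is a minimal sketch in plain Python. It is not the repository's evaluation code: the `resize_nearest` helper, the nearest-neighbor sampling, and the stand-in `generated` grid are all assumptions made for illustration. A real pipeline would more likely use an image library (for example Pillow's `Image.resize`) with a smoother interpolation filter.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D grid (list of rows) using nearest-neighbor sampling.

    Hypothetical helper for illustration only.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output coordinate back to the closest source coordinate.
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Stand-in for a 512x512 generated image: each "pixel" stores its own (row, col).
generated = [[(r, c) for c in range(512)] for r in range(512)]

# Target size taken from the issue title: 750 wide by 1024 high.
resized = resize_nearest(generated, out_h=1024, out_w=750)

print(len(resized), len(resized[0]))  # 1024 750
```

Note that going from a square 512x512 output to 750x1024 scales the height by 2.0 but the width by only about 1.46, so the image is stretched non-uniformly and no new detail is added. This is one reason resized outputs are not equivalent to generating at the target resolution directly.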

LeonJoe13 commented 8 months ago

Many thanks for your reply. Have you tried training at a higher resolution? Does it work better?

YanzuoLu commented 8 months ago

Sorry, we haven't given it a try. If we want to generate higher-resolution images, I guess we might also need to increase the capacity of the feature network. Or maybe the current 256x256 Swin Transformer is also okay. Anyway, I can't say without experimenting, which we plan to do in the near future. Thanks : )