cuiaiyu / dressing-in-order

(ICCV'21) Official code of "Dressing in Order: Recurrent Person Image Generation for Pose Transfer, Virtual Try-on and Outfit Editing" by Aiyu Cui, Daniel McKee and Svetlana Lazebnik
https://cuiaiyu.github.io/dressing-in-order

Number of output patches and the generated image quality #44

Closed: vivekjames closed this issue 1 year ago

vivekjames commented 2 years ago

Hi there,

One thing I have observed when testing with images that look significantly different from those in the DeepFashion dataset is: at times the generated results look blurry and lose some high-frequency details.

I was thinking about increasing the number of discriminator output patches (ndf in the code) to potentially improve the image quality, particularly to preserve more high-frequency details.

Do you by chance have any insights on the relationship between the number of output patches and the generated image quality? I would like to know if there is a point where ndf is too high and introduces diminishing returns or even undesirable effects during training (as opposed to ndf=32 by default).
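(Editorial note on the terminology here: `ndf` is the discriminator's base channel width; the size of each output patch is determined by the discriminator's depth, kernel sizes, and strides rather than by `ndf`. A minimal, standalone sketch of how a PatchGAN-style conv stack determines the receptive field behind each output patch, not code from this repo:)

```python
# Receptive-field calculation for a stack of conv layers, given as
# (kernel_size, stride) pairs. Generic PatchGAN-style sketch, not
# code from dressing-in-order.

def receptive_field(layers):
    rf = 1      # receptive field of one output unit, in input pixels
    jump = 1    # distance between adjacent units, in input pixels
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# The classic 70x70 PatchGAN: three stride-2 and two stride-1 4x4
# conv layers. Widening the channels (ndf) leaves this unchanged;
# it is depth and stride that change the patch size.
patchgan_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_70))  # -> 70
```

So raising `ndf` adds discriminator capacity per patch rather than changing how many pixels each patch covers.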

Thank you for your time, Vivek

cuiaiyu commented 2 years ago

Hi!

If by "different" you mean the faces in test images that look significantly different from the DeepFashion images, I don't think increasing the discriminator will change the results much, because the face overfitting is mostly a result of the lack of diversity in the training data.

If by "different" you mean garment details, you could try stopping training early, at around 120k iterations. The released checkpoints seem slightly overfit for some tasks (e.g. try-on); stopping early won't make the results look very different, but it may give more robust performance on outlier data, although some metric scores may drop a little for pose transfer.

Besides, if virtual try-on is the only task that interests you (i.e. pose transfer does not matter to you at all), it would also help to increase the inpainting rate (e.g. change --random_rate 0.8 to --random_rate 0.5). The inpainting training task is one of the keys to better reconstruction of garment shape and texture.
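(Editorial note: one plausible reading of --random_rate, consistent with the comment above, is that it sets the probability of drawing the transfer objective at each training step, with the remainder going to inpainting. That is an assumption about the flag's semantics; the `sample_task` helper below is a hypothetical sketch, not the repo's code:)

```python
import random

def sample_task(random_rate, rng=random):
    # Hypothetical reading of --random_rate: with probability
    # random_rate train on transfer, otherwise on inpainting.
    # Lowering it (e.g. 0.8 -> 0.5) shows the model more
    # inpainting steps, which the author credits for better
    # garment shape/texture reconstruction.
    return "transfer" if rng.random() < random_rate else "inpainting"

random.seed(0)
steps = [sample_task(0.5) for _ in range(10_000)]
print(steps.count("inpainting") / len(steps))  # roughly 0.5
```

Under this reading, dropping the rate from 0.8 to 0.5 raises the share of inpainting steps from about 20% to about 50% of training.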

As for discriminator behavior, we inherited the hyperparameters almost directly from our prior work GFLA, so GFLA's authors may be better qualified to answer this question. :)

Djvnit commented 2 years ago

Thanks a lot @cuiaiyu for your great open-source contribution. Is it possible to apply transfer learning to the GAN? I'm still unclear about the GAN used in this model to generate body and facial characteristics. @cuiaiyu, can you please help me with this? I have referenced your work in my academic project; the final presentation is near, and we need to show a demo on our own images. The other features work on par, but the facial features are totally changed. I have attached some examples for reference.

  1. Pose Transfer image

  2. Tuck-in & Tuck-out image