bcmi / DCI-VTON-Virtual-Try-On

[ACM Multimedia 2023] Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow.
https://arxiv.org/abs/2308.06101
MIT License

Queries Regarding Custom Dataset Testing in This Project #19

winter-fish closed this issue 1 year ago

winter-fish commented 1 year ago

Hello, I've completed the preprocessing steps for my custom dataset by following HR-VITON/issues/45, but I've encountered the following challenges:

  1. When I apply the warp module directly to my preprocessed custom dataset, the results are noticeably distorted. Would training the warp module specifically on my dataset improve them?

  2. I saved the output of the warp module in my custom dataset and tested the diffusion module. Surprisingly, areas other than the clothing were also redrawn, with poor results. This seems inconsistent with your paper, which suggests that only the clothing region should be redrawn. Could you please clarify the reason for this discrepancy? Can I improve the results by training the model on my data?

  3. Lastly, the images generated by the diffusion module currently have a resolution of 512x384. I'm interested in generating larger images with a resolution of 1024x768. How can I achieve this?

Thank you for your assistance.

Limbor commented 1 year ago
  1. This may be because the model we provide is pre-trained on the VITON-HD dataset, so custom data used during inference should ideally match the data distribution of VITON-HD. For example, the person image should show the upper body and the clothing image should show the garment laid flat.
  2. For areas outside the inpainting mask, we observed in our experiments that the VAE introduces some distortion. During inference, we directly adopted a paste-back step to solve the distortion problem, but it can also be addressed by increasing the resolution or using an enhanced VAE.
  3. You can use 1024x768 data and follow our process to retrain the two modules.
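
The paste-back step mentioned in point 2 can be illustrated roughly as follows. This is only a minimal sketch of the general idea, not the repository's actual code: the function name `paste_back`, the array layout, and the mask convention (1 inside the inpainting region, 0 outside) are all assumptions made for illustration.

```python
import numpy as np


def paste_back(original, generated, mask):
    """Keep generated pixels inside the mask and original pixels outside it.

    Hypothetical helper, not part of DCI-VTON.
    original, generated: float arrays of shape (H, W, 3).
    mask: array of shape (H, W); assumed to be 1 where the image was
          inpainted and 0 elsewhere (the repository may use the opposite
          convention).
    """
    if original.shape != generated.shape:
        raise ValueError("original and generated must have the same shape")
    if mask.shape != original.shape[:2]:
        raise ValueError("mask must have shape (H, W)")
    # Add a channel axis so the mask broadcasts over the RGB channels.
    m = mask[..., None].astype(original.dtype)
    return m * generated + (1.0 - m) * original


# Toy usage: a 4x3 "image" where only two pixels lie inside the mask.
original = np.zeros((4, 3, 3))
generated = np.ones((4, 3, 3))
mask = np.zeros((4, 3))
mask[1:3, 1] = 1
result = paste_back(original, generated, mask)
```

Because pixels outside the mask are copied from the original image, any distortion the VAE introduces there never reaches the final output; only the masked region comes from the diffusion result.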

Hope this helps you!