ShihaoZhaoZSH / Uni-ControlNet

[NeurIPS 2023] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
MIT License

Technical questions about training #13

Closed (tomtom1103 closed this issue 1 year ago)

tomtom1103 commented 1 year ago

Hello, thank you for your work.

I have a technical question about how the model was trained. The paper mentions that you used a subset of the LAION dataset, and I presume the local conditions were either extracted on the fly or precomputed before training. Many samples in the LAION dataset do not contain people, i.e. there is no openpose condition to extract from them. Did you filter out the LAION images that don't contain people so that every image definitely has an openpose condition, or did you use random images, so that some images have no openpose condition?

Kudos on your work; I know how technically frustrating it is to work with such a large dataset.

ShihaoZhaoZSH commented 1 year ago

Since all the conditions are trained simultaneously, we did not filter out any data. However, we did set the dropout probability of the openpose condition to 0 during training to ensure that this condition is fully trained. This is because, as you mentioned, some images may not have the openpose condition.
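
For readers following along, the per-condition dropout described in this reply can be sketched roughly as below. This is a minimal illustration, not the repository's training code: the function name `drop_conditions`, the `DROP_PROBS` table, the non-openpose probabilities, and the choice to zero out a dropped condition are all assumptions made for the example. Only the openpose dropout probability of 0 comes from the reply.

```python
import random

# Hypothetical per-condition dropout probabilities. Only the openpose value (0)
# is taken from the reply above; the other values are placeholders.
DROP_PROBS = {"canny": 0.5, "depth": 0.5, "openpose": 0.0}


def drop_conditions(conditions, drop_probs, rng):
    """Independently drop each condition with its own probability.

    A dropped condition is replaced by zeros of the same length (an assumption
    for this sketch); a kept condition is copied through unchanged.
    """
    result = {}
    for name, cond in conditions.items():
        p = drop_probs.get(name, 0.0)
        # rng.random() is in [0, 1), so p == 0.0 means the condition is never dropped.
        if rng.random() < p:
            result[name] = [0.0] * len(cond)
        else:
            result[name] = list(cond)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    sample = {"canny": [1.0, 2.0], "depth": [3.0, 4.0], "openpose": [5.0, 6.0]}
    for _ in range(3):
        print(drop_conditions(sample, DROP_PROBS, rng))
```

With a dropout probability of 0, the openpose map is passed through on every step in which it exists, which compensates for the images where no pose can be extracted.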

tomtom1103 commented 1 year ago

Thank you!