ShihaoZhaoZSH / Uni-ControlNet

[NeurIPS 2023] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
MIT License

question: about global content training images #14

Closed chenjingcheng closed 1 year ago

chenjingcheng commented 1 year ago

Thank you very much for your work. I have a question: the local condition training images are easy to understand and to create. However, what is the relationship between the global condition training data and the corresponding images? How do you select the training images? I am training on my own industry-specific data. Thank you.

ShihaoZhaoZSH commented 1 year ago

Global conditions are the CLIP embeddings extracted by the ContentDetector in /annotator/content/. That means during training, the input global condition is the CLIP embedding, and the output (or ground truth) is the corresponding original image.
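
For reference, a minimal sketch of that extraction step, assuming ContentDetector can be called directly on an RGB image array and returns the CLIP image embedding (the exact constructor arguments and embedding size may differ in your checkout):

```python
import cv2
from annotator.content import ContentDetector

# Assumed interface: the detector is a callable that maps an RGB image
# (H x W x 3, uint8) to a 1-D CLIP image embedding.
detector = ContentDetector()

image_bgr = cv2.imread("images/000001.jpg")             # ground-truth training image (hypothetical path)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)  # detector expects RGB (assumption)

global_condition = detector(image_rgb)  # CLIP embedding used as the global condition
print(global_condition.shape)           # e.g. (768,), depending on the CLIP backbone
```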

chenjingcheng commented 1 year ago

Thank you very much for your reply. Am I understanding this right? The content data (*.npy files) are the features extracted from the pictures in the images directory.

ShihaoZhaoZSH commented 1 year ago

Yes, you are correct.
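
In other words, the preprocessing pass would look roughly like the sketch below. The directory names and the one-.npy-per-image layout are assumptions based on the discussion above, not a fixed convention of the repo:

```python
import os
import cv2
import numpy as np
from annotator.content import ContentDetector

detector = ContentDetector()

image_dir = "data/images"      # assumed location of the ground-truth images
content_dir = "data/content"   # assumed output directory for the *.npy features
os.makedirs(content_dir, exist_ok=True)

for name in sorted(os.listdir(image_dir)):
    image = cv2.imread(os.path.join(image_dir, name))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    embedding = detector(image)  # CLIP embedding = global condition for this image
    np.save(os.path.join(content_dir, os.path.splitext(name)[0] + ".npy"), embedding)
```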

chenjingcheng commented 1 year ago

Thanks very much!

xXuHaiyang commented 10 months ago

Hi, good work! I want to confirm whether the global conditions' CLIP embeddings are generated from the raw RGB images or from the single-condition images (e.g., depth/pose/etc.)?