Global conditions are the CLIP embeddings extracted by the ContentDetector in /annotator/content/. That means during training, the input global condition is the CLIP embedding, and the output (or ground truth) is the corresponding original image.
Thank you very much for your reply. Am I understanding this right: the content data (the *.npy files) is the feature data extracted from the pictures in the images directory?
Yes, you are correct.
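For anyone else precomputing these files, here is a minimal sketch of the idea, not the repo's actual ContentDetector code. It assumes a standard CLIP checkpoint loaded via `transformers`; the checkpoint name and the `images/` and `content/` paths are illustrative, and the real annotator may use a different model or preprocessing.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def extract_content_embedding(image_path: Path) -> np.ndarray:
    """Encode one raw RGB image into a CLIP image embedding (the global condition)."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        features = model.get_image_features(**inputs)  # shape: (1, 768) for ViT-L/14
    return features[0].cpu().numpy()

# Precompute an embedding for every training image and store it as .npy,
# mirroring the images/ -> content/ pairing described above.
out_dir = Path("content")
out_dir.mkdir(exist_ok=True)
for img_path in sorted(Path("images").glob("*.png")):
    np.save(out_dir / (img_path.stem + ".npy"), extract_content_embedding(img_path))
```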
Thanks very much!
Hi, good work! I want to confirm: are the global conditions' CLIP embeddings generated from raw RGB images, or from single-style condition images (e.g., depth/pose/etc.)?
Thank you very much for your work. I have a question: local condition training images are easy to understand and create. However, what is the relationship between the content images used for global training and their corresponding target images? How do you select the training images? I am using my own industry-specific data for training. Thank you.