aihao2000 / stable-diffusion-reference-only

img2img version of stable diffusion. Anime Character Remix. Line Art Automatic Coloring. Style Transfer.
Apache License 2.0
132 stars 8 forks

few questions. #7

Open yejy53 opened 9 months ago

yejy53 commented 9 months ago

Thank you for your great work, but I have a few questions.

1. If I need to train on a new dataset, how should I format the dataset?
2. Compared with the original ControlNet, is the only change replacing the original text encoder with an image VAE input?

aihao2000 commented 9 months ago

@yejy53 Training data can be constructed using Hugging Face `datasets`. Each sample should contain three columns: `blueprint` (the line drawing), `image_prompt` (the reference image), and `image` (the image expected to be generated). The training section of the README should cover this. The reference image, i.e. the image prompt, is encoded by ViT and used for cross-attention; the VAE is not used for it. The blueprint is injected into the UNet through additional convolutional layers. The VAE input has not been replaced and is still the image expected to be generated.

yejy53 commented 9 months ago

Thank you for your outstanding contributions. Could you kindly share your email address? I have several specific questions that would benefit from your insight.

aihao2000 commented 9 months ago

@yejy53 Of course, aihao2000@outlook.com