leangovern opened this issue 1 year ago
Hello leangovern,
Thanks for your interest in our code! The underlying Stable Diffusion model is used as a [txt+img]2img model in our codebase. That is, given a real image to use as a guide, (1) we first add a small amount of random noise to that real image, and (2) we use Stable Diffusion to denoise that image, conditioned on a text prompt.
DA-Fusion uses Textual Inversion to learn the text prompt, but the prompt can also be prompt engineered if desired.
The image input helps to preserve the structure of the real image, and the text prompt input allows you to specify how you want the image to be augmented.
This class (https://github.com/brandontrabucco/da-fusion/blob/main/semantic_aug/augmentations/textual_inversion.py#L86) has the relevant code if you are interested in the implementation details.
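For intuition, here is a minimal sketch of the same [txt+img]2img idea using Hugging Face diffusers' `StableDiffusionImg2ImgPipeline`. This is not our actual augmentation class (see the link above for that); the model id, embedding path, and `<class-token>` placeholder are illustrative assumptions:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load a Stable Diffusion img2img pipeline (model id is a placeholder).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Optionally load a Textual Inversion embedding so the prompt can refer
# to the class via a placeholder token (path and token are hypothetical).
pipe.load_textual_inversion("path/to/learned_embeds.bin", token="<class-token>")

real_image = Image.open("guide.png").convert("RGB").resize((512, 512))

# `strength` controls how much noise is added to the real image before
# denoising: small values preserve structure, large values allow more change.
augmented = pipe(
    prompt="a photo of a <class-token>",
    image=real_image,
    strength=0.5,
    guidance_scale=7.5,
).images[0]

augmented.save("augmented.png")
```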
Let me know how I can help if you have other questions!
-Brandon
Hello Brandon,
Thank you very much for your detailed reply; it is a great help to me. Your work is a great inspiration, and I will continue reading your code and applying it to my own work. Thank you again.
-Govern
Hi Brandon,
I am back again. My English is poor, so I will try my best to express myself. I have a question about your paper. You mentioned that Stable Diffusion may leak internet images when generating pictures. If we use it to generate synthetic data, this could cause a dilemma: the synthetic data is very effective, but it may be unfair, because Stable Diffusion was trained on a huge number of images. So should we erase the concept of the class we are going to generate, to ensure that we rely on the generative ability of SD rather than its extra training data? Hoping for your reply!
-Govern, with respect
Hi again,
You got it! Though in downstream applications, the model leaking internet images is not usually a big issue. We evaluated the model in our paper this way because we are interested in studying how the model generalizes to novel classes.
-Brandon
Thank you, Brandon, your work is so inspiring!
Thank you for your work! I am curious whether there is a quantitative evaluation of the synthetic images in this work?
Hi, I am reading the code. The paper says the model uses img2img and Textual Inversion to generate data, but I did not find an img2img model in this code, only a txt2img model. Did I miss it? I am confused, hoping for your reply!