leangovern opened this issue 1 year ago
Hello leangovern,
Thanks for your interest in our code! The underlying Stable Diffusion model is used as a [txt+img]2img model in our codebase. That is, given a real image to use as a guide, (1) we first add a small amount of random noise to that real image, and (2) we use Stable Diffusion to denoise that image, conditioned on a text prompt.
DA-Fusion uses Textual Inversion to learn the text prompt, but the prompt can also be prompt engineered if desired.
The image input helps to preserve the structure of the real image, and the text prompt input allows you to specify how you want the image to be augmented.
This class (https://github.com/brandontrabucco/da-fusion/blob/main/semantic_aug/augmentations/textual_inversion.py#L86) has the relevant code if you are interested in the implementation details.
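For intuition, here is a minimal sketch of the same [txt+img]2img idea using Hugging Face diffusers' `StableDiffusionImg2ImgPipeline`. This is not our actual augmentation class (see the link above for that); the model id, embedding path, and `<class-token>` placeholder are illustrative assumptions:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load a Stable Diffusion img2img pipeline (model id is a placeholder).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Optionally load a Textual Inversion embedding so the prompt can refer
# to the class via a placeholder token (path and token are hypothetical).
pipe.load_textual_inversion("path/to/learned_embeds.bin", token="<class-token>")

real_image = Image.open("guide.png").convert("RGB").resize((512, 512))

# `strength` controls how much noise is added to the real image before
# denoising: small values preserve structure, large values allow more change.
augmented = pipe(
    prompt="a photo of a <class-token>",
    image=real_image,
    strength=0.5,
    guidance_scale=7.5,
).images[0]

augmented.save("augmented.png")
```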
Let me know how I can help if you have other questions!
-Brandon
Hello Brandon,
Thank you very much for your detailed reply; it is a great help to me. Your work is a great inspiration, and I will continue reading your code and applying it to my own work. Thank you again.
-Govern
Hi Brandon,
I am back again. My English is poor, so I will try my best to express myself. I have a question about your paper. You mentioned that Stable Diffusion may leak internet images when generating pictures. If we use it to generate synthetic data, this could cause a dilemma: the synthetic data is very effective, but it may be unfair, because Stable Diffusion was trained on a huge number of images. So should we erase the concept of the class we are going to generate, to ensure that we rely on the generative ability of SD rather than its extra training data? Hoping for your reply!
-Govern, with respect
Hi again,
You got it! Though in downstream applications, the model leaking internet images is not usually a big issue. We evaluated the model in our paper this way because we are interested in studying how the model generalizes to novel classes.
-Brandon
Thank you, Brandon, your work is so inspiring!
Thank you for your work! I am curious whether there is a quantitative evaluation of the synthetic images in this work?
Hi, I am reading the code. The paper says the model uses img2img and Textual Inversion to generate data, but I did not find an img2img model in this code, only a txt2img model. Did I miss it? I am confused, hoping for your reply!