brandontrabucco / da-fusion

Effective Data Augmentation With Diffusion Models
MIT License

Fine-Tuned Token Locations #1

Open MrWan001 opened 1 year ago

MrWan001 commented 1 year ago

DEFAULT_EMBED_PATH = "/root/downloads/da-fusion/{dataset}-tokens/{dataset}-{seed}-{examples_per_class}.pt"

Hello, the .pt file cannot be found. What effect does it have on the program?

brandontrabucco commented 1 year ago

Hello MrWan001,

DEFAULT_EMBED_PATH in the code points to the location of the fine-tuned tokens produced by Textual Inversion.
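To illustrate how the DEFAULT_EMBED_PATH template is resolved: the named fields are filled in per experiment. The dataset name, seed, and examples_per_class values below are hypothetical, chosen only for the example.

```python
# Hypothetical illustration of filling in the DEFAULT_EMBED_PATH template.
# The concrete values ("coco", 0, 4) are made up for this example.
DEFAULT_EMBED_PATH = "/root/downloads/da-fusion/{dataset}-tokens/{dataset}-{seed}-{examples_per_class}.pt"

path = DEFAULT_EMBED_PATH.format(dataset="coco", seed=0, examples_per_class=4)
print(path)  # /root/downloads/da-fusion/coco-tokens/coco-0-4.pt
```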

We will be releasing the tokens for the three datasets we evaluated on shortly, which you can download and place in the location specified by DEFAULT_EMBED_PATH. Which datasets are you using, or are you using a custom dataset?

Best, Brandon

jlsaint commented 1 year ago

Hello @brandontrabucco ,

I'm dealing with a similar issue. Could you please explain the difference between DEFAULT_EMBED_PATH and DEFAULT_SYNTHETIC_DIR? That is, if the former is intended to point to the text inversion tokens as you've said, then what should the latter point to? Asking because intuitively those two variables seem to mean the same thing.

Also, I followed your instructions here and now have learned_embeds.bin for each COCO class (.bin files are in coco-#-#/{class}/ for every class). How should I format the relevant parameters if I want to run train_classifier.py? The given template for DEFAULT_EMBED_PATH seems class-agnostic...

Best, jl

brandontrabucco commented 1 year ago

Hello jlsaint,

Thanks for following up on this issue! The parameter DEFAULT_SYNTHETIC_DIR points to a location on the local disk of your machine where augmented images from Stable Diffusion will be saved for caching. This serves two roles:

First, caching the images to disk means they don't have to be stored in memory, which is crucial for datasets with many images or classes, where the full set of augmented images may not fit. Note that in our example training scripts we generate the augmented images only once, at the beginning of training, using the train_dataset.generate_augmentations(num_synthetic) method, where train_dataset is an instance of our FewShotDataset and num_synthetic is an integer controlling how many synthetic images we generate from Stable Diffusion for each real image. If you want to generate synthetic images more than once, simply call generate_augmentations again later in the script.
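The caching pattern described above can be sketched as follows. This is not the repo's actual code: the function name mirrors generate_augmentations, but the "augmentations" here are placeholder files rather than Stable Diffusion outputs, and the directory stands in for DEFAULT_SYNTHETIC_DIR.

```python
# Toy sketch of the caching pattern: each real image gets num_synthetic
# augmented copies written to a synthetic directory once, and training
# then reads them from disk instead of holding them in memory.
import os
import tempfile

def generate_augmentations(real_images, num_synthetic, synthetic_dir):
    """Stand-in for FewShotDataset.generate_augmentations: writes
    num_synthetic placeholder files per real image into synthetic_dir."""
    os.makedirs(synthetic_dir, exist_ok=True)
    paths = []
    for idx, image in enumerate(real_images):
        for k in range(num_synthetic):
            path = os.path.join(synthetic_dir, f"img{idx}-aug{k}.txt")
            with open(path, "w") as f:
                f.write(image)  # a real pipeline would save a diffusion output here
            paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as synthetic_dir:
    cached = generate_augmentations(["cat", "dog"], num_synthetic=3,
                                    synthetic_dir=synthetic_dir)
    # 2 real images x 3 synthetic copies each = 6 cached files
    assert len(cached) == 6
```

Because the files persist on disk, the same cached set can also be inspected by eye, which is the second role described below.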

Second, having the images cached means you can inspect the augmented images when tuning hyperparameters and confirm that DA-Fusion is working as expected.

For your last point, take a look at this script: https://github.com/brandontrabucco/da-fusion/blob/main/aggregate_embeddings.py

After we run Textual Inversion and have several class-specific tokens, we merge them into a single dictionary containing all the tokens using the above script. This produces the single class-agnostic file that the DEFAULT_EMBED_PATH template points to.
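The merging step can be sketched with plain dictionaries. This is a hedged illustration, not aggregate_embeddings.py itself: the real per-class files are torch .bin dictionaries of learned embeddings, while the token names and "embedding" values below are invented.

```python
# Toy sketch of the aggregation step: merge several per-class token
# dictionaries (one learned_embeds file per class) into one dict,
# analogous to what aggregate_embeddings.py does for the real .bin files.
per_class_files = [
    {"<cat>": [0.1, 0.2]},   # stand-in for the "cat" class learned_embeds
    {"<dog>": [0.3, 0.4]},   # stand-in for the "dog" class learned_embeds
]

merged = {}
for token_dict in per_class_files:
    merged.update(token_dict)  # one class-agnostic dictionary of all tokens

# the merged dict would then be saved to the DEFAULT_EMBED_PATH location
assert set(merged) == {"<cat>", "<dog>"}
```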

Let me know if you have other questions!

Best, Brandon

jameelhassan commented 1 year ago

Hello @brandontrabucco, Is it possible to share the fine-tuned tokens from text-inversion for the three datasets?

I am hoping to run it for Imagenet. Thanks.

brandontrabucco commented 1 year ago

Sure! We have uploaded the current set of tokens here: https://drive.google.com/drive/folders/1JxPq05zy1_MGbmgHfVIeeFMjL56Cef53?usp=sharing

jameelhassan commented 1 year ago

Thank you very much.