adobe-research / custom-diffusion

Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
https://www.cs.cmu.edu/~custom-diffusion

About the training of input embedding. #52

Closed TousakaNagio closed 1 year ago

TousakaNagio commented 1 year ago

In the paper "Multi-Concept Customization of Text-to-Image Diffusion," it is stated that only the target token "new1" or V* was tuned. However, upon reviewing the code in diffuser_training.py, it appears that the entire embedding was optimized. Can you clarify if this is accurate? Thank you for your response.

nupurkmr9 commented 1 year ago

Hi, in the case of the diffusers-based code, we keep all embeddings trainable but manually set the gradients of all embeddings except the `<new1>` token (denoted as V* in the paper) to 0 here. Thanks!
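A minimal sketch of this gradient-masking idea (all names, sizes, and the token index are hypothetical, not taken from the repo): after the backward pass, every row of the token-embedding gradient except the row for the added token is zeroed, so the optimizer only ever updates that one row.

```python
import torch
import torch.nn as nn

vocab_size, dim = 10, 4
new_token_id = 7  # hypothetical index of the added <new1> token

embedding = nn.Embedding(vocab_size, dim)
loss = embedding(torch.arange(vocab_size)).sum()
loss.backward()

# keep only the gradient row for <new1>; zero out all other rows
grad_mask = torch.zeros(vocab_size, 1)
grad_mask[new_token_id] = 1.0
embedding.weight.grad.mul_(grad_mask)
```

After this masking step, an optimizer step leaves every embedding row untouched except the one for the new token, even though the whole embedding matrix is nominally trainable.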

TousakaNagio commented 1 year ago

I got it. Thank you for your response!

TousakaNagio commented 1 year ago

Another question: could you point me to the code corresponding to this in train.py / custom_modules.py / model.py? Thanks!

nupurkmr9 commented 1 year ago

In the case of model.py, I simply detached all other embeddings here instead of zeroing out the gradients. We can also zero out the gradients by defining a callback function, which I tried in one of our new projects here.
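The detaching alternative can be sketched as follows (again, all names and the token index are illustrative assumptions): the embedding matrix used in the forward pass is rebuilt so that every row except the new token's is detached from the autograd graph, so gradients simply never flow to the frozen rows and no gradient masking is needed.

```python
import torch
import torch.nn as nn

vocab_size, dim = 10, 4
new_token_id = 7  # hypothetical index of the added <new1> token

embedding = nn.Embedding(vocab_size, dim)

# splice the trainable <new1> row into an otherwise detached matrix
frozen = embedding.weight.detach()
weight = torch.cat([frozen[:new_token_id],
                    embedding.weight[new_token_id:new_token_id + 1],
                    frozen[new_token_id + 1:]], dim=0)

loss = weight.sum()
loss.backward()
```

Both approaches are equivalent in effect; detaching avoids touching `.grad` after `backward()`, while the gradient-zeroing callback keeps the forward pass unchanged.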

TousakaNagio commented 1 year ago

Thank you for your detailed explanation. It has been very helpful for my research.