brandontrabucco / da-fusion

Effective Data Augmentation With Diffusion Models
MIT License

Fix Textual Inversion pre-trained weights to CLIP model #10

Closed · tsWen0309 closed 1 year ago

tsWen0309 commented 1 year ago

Hi, it's me again. When I go through your code, I see that you directly load the pre-trained Textual Inversion weights into the CLIP model in your implementation. Why? Intuitively, the weights from Textual Inversion and those from CLIP should not be in the same space. Shouldn't there be an MLP to transform these weights into the same space? Forgive my foolishness, I am still new to this area.

[image: screenshot of the embedding-loading code, with one line marked TODO]

brandontrabucco commented 1 year ago

Hello Flu0XeT1n,

Thanks for following up again. Textual Inversion generates new token embeddings directly in the CLIP text embedding space. These lines from the textual inversion script we provide show which embeddings are used; more details can be found in the original paper by Gal et al. (2022) if you're interested. The line you marked with a TODO in the image you shared corresponds to this line in the training script.

No additional maps on top of the new tokens are needed for Textual Inversion.
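
For anyone landing on this thread later, here is a minimal sketch of the loading step being discussed, assuming the common diffusers-style `learned_embeds.bin` format and a hypothetical `<new-concept>` placeholder token (an illustration, not the exact code from this repo):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

# Load the frozen CLIP text encoder used by Stable Diffusion (ViT-L/14).
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Register the placeholder token and grow the embedding table by one row.
placeholder = "<new-concept>"  # hypothetical token name, for illustration
tokenizer.add_tokens(placeholder)
text_encoder.resize_token_embeddings(len(tokenizer))

# Load the learned vector (diffusers-style format: {token: 768-dim tensor})
# and copy it into the row for the new token. No MLP or projection is
# applied: the vector was optimized directly in CLIP's token embedding space.
learned = torch.load("learned_embeds.bin")
token_id = tokenizer.convert_tokens_to_ids(placeholder)
with torch.no_grad():
    text_encoder.get_input_embeddings().weight[token_id] = learned[placeholder]
```

Since Textual Inversion optimizes the new vector against the frozen text encoder, it already lives in the same space as CLIP's existing token embeddings, which is why a plain copy suffices.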

-Brandon