chs20 / RobustVLM

[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
MIT License

CLIP image generation #10

Open: realfolkcode opened 3 weeks ago

realfolkcode commented 3 weeks ago

Hi! Thank you for your work, and congratulations on the ICML acceptance! I am not sure whether the issue tracker is the appropriate place, but I wanted to share some of my findings on image generation guided by the gradients of Robust CLIP. I ran some experiments here, and the results seem decent, which suggests that the gradients are perceptually aligned. I thought you might find this interesting.

chs20 commented 3 weeks ago

Hi, thanks for sharing. The generated images look very nice! We have also looked a bit into the interpretability of adversarial perturbations for robust CLIP as part of another project, which should be on arXiv soon :)

realfolkcode commented 3 weeks ago

Great! Looking forward to that paper!