some questions about the methods mentioned in the paper

AoiDragon / HADES

[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models''

MIT License

5 stars 3 forks source link

Hello. Thank you for your excellent work. I'm confused about the "typography" operation in the paper. Does it mean printing a word on an image? 2.You mentioned in your paper" as the keywords may represent abstract conceptsor behaviors that are difficult for models to grasp when depicted by real-world images, we employ typography to represent these keywords." So do you take this measure for all keywords, or only for certain abstract words? 3.For words with abstract concepts, how to ensure image quality when using a generative model to generate corresponding images? Looking forward to your answers and congratulations on achieving such excellent results!

AoiDragon / HADES

some questions about the methods mentioned in the paper #6