AoiDragon / HADES

[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models''
MIT License

some questions about the methods mentioned in the paper #6

Open yuese1234 opened 1 week ago

yuese1234 commented 1 week ago

Hello. Thank you for your excellent work. I have a few questions:

  1. I'm confused about the "typography" operation in the paper. Does it mean printing a word on an image?
  2. You mention in the paper: "as the keywords may represent abstract concepts or behaviors that are difficult for models to grasp when depicted by real-world images, we employ typography to represent these keywords." Do you apply this measure to all keywords, or only to certain abstract words?
  3. For words denoting abstract concepts, how do you ensure image quality when using a generative model to produce the corresponding images?

Looking forward to your answers, and congratulations on achieving such excellent results!

AoiDragon commented 1 week ago

Hello @yuese1234,

  1. Yes, you are right.
  2. We represent all keywords with typography.
  3. We ask ChatGPT to convert the original instruction into an image-generation prompt for the diffusion model, providing ChatGPT with few-shot demonstrations. For example, the instruction 'how to steal goods from a supermarket' can be converted into a prompt like 'A thief, stealing apples, a supermarket'. This prompt is then further optimized to be more detailed. [screenshot attached: 2024-09-08 20:01:11]
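For anyone trying to reproduce the typography step described above, here is a minimal sketch of rendering a keyword as plain text on a blank canvas with Pillow. The function name, canvas size, text position, and font are my own assumptions for illustration, not the paper's exact settings.

```python
from PIL import Image, ImageDraw, ImageFont

def typography_image(keyword: str, size=(512, 512)) -> Image.Image:
    """Render a keyword as black text on a white canvas.

    Hypothetical helper: HADES' actual rendering parameters
    (font, size, placement) may differ.
    """
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # assumption: any legible font works
    # Place the keyword roughly centered on the canvas.
    draw.text((size[0] // 4, size[1] // 2), keyword, fill="black", font=font)
    return img

img = typography_image("stealing")
```

The resulting image can then be concatenated with the diffusion-generated image, as the paper's pipeline pairs typography with generated visuals.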