Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models

Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models

computing gradient #26

Open ChoiDae1 opened 3 weeks ago

ChoiDae1 commented 3 weeks ago

Hello. I'm interested in your work and have a question about your implementation. When implementing PGD, I noticed that your code uses only the gradient sign, not the gradient value, as in the following line from `visual_attacker.py`: `adv_noise.data = (adv_noise.data - alpha * adv_noise.grad.detach().sign()).clamp(-epsilon, epsilon)`. Is there a specific reason for this?

Unispac commented 3 weeks ago

Hi, this is a commonly used trick dating back to https://arxiv.org/abs/1412.6572 (FGSM). In my experience, gradient-sign descent usually gives better results and is less likely to get stuck at a local optimum.