Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models

Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models

Issues while trying to reproduce the results on LLaVA-v1.5 #9

Open simplelifetime opened 8 months ago

simplelifetime commented 8 months ago

Thanks for your excellent work! I'm trying to reproduce this method on the LLaVA-v1.5 model, but I've encountered one problem:

File ~/anaconda3/envs/llava/lib/python3.10/site-packages/torch/autograd/__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195 retain_graph = create_graph
    197 # The reason we repeat the same comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

What is the most probable reason for such an error? I'm a little unfamiliar with adversarial training; I hope you can provide some help. Thanks!
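For context, this traceback reproduces in isolation whenever backward() is called on a graph whose leaf tensors were created without gradients enabled. A minimal sketch in plain PyTorch, independent of LLaVA (the tensor names here are illustrative):

```python
import torch

# Reproduce the error: backward() on a graph whose input
# does not require gradients.
x = torch.zeros(3)            # requires_grad defaults to False
loss = (x * 2).sum()
try:
    loss.backward()
except RuntimeError as e:
    print("error:", e)        # "element 0 of tensors does not require grad ..."

# The usual fix in an adversarial-example setup: make the perturbation
# a leaf tensor that requires gradients before building the loss.
adv_noise = torch.zeros(3, requires_grad=True)
loss = (adv_noise * 2).sum()
loss.backward()
print(adv_noise.grad)         # tensor([2., 2., 2.])
```

The error therefore usually means the image/noise tensor was created (or silently replaced) without requires_grad=True somewhere between its creation and the loss.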

Unispac commented 8 months ago

Hey,

Based on the log, it seems you are computing gradients for tensors that do not have the requires_grad flag set. Could you provide more information about how you run the code and how you hit this error? Otherwise, it is difficult to pin down the cause.

Thanks!

dribnet commented 7 months ago

I also tried llava-1.5 and got the same error. Following suggestions online I added

model.enable_input_require_grads()

after loading the model, which resolved this issue.

However, the visual attack code still fails: the adv_noise.grad field is still not populated after the call to target_loss.backward(), which seems to indicate that the gradients are not propagating back to the image inputs.
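This silent failure mode can be reproduced in miniature. Assuming some step between the input and the loss runs with gradients disabled (the "encoder" below is a stand-in multiply, not LLaVA code), backward() completes without error once the intermediate tensor is re-attached, but the gradient stops where the graph was cut:

```python
import torch

# Sketch of the symptom: backward() succeeds, yet the gradient never
# reaches the adversarial input, because the graph was cut inside a
# no-grad region of the forward pass.
adv_noise = torch.zeros(4, requires_grad=True)

with torch.no_grad():
    features = adv_noise * 3.0   # the autograd graph is severed here

# Re-attach the features so backward() runs without the RuntimeError
# (analogous in effect to enabling input grads on the embedding side).
features = features.detach().requires_grad_(True)
loss = features.sum()
loss.backward()

print(features.grad)   # populated: gradients stop at the re-attached tensor
print(adv_noise.grad)  # None: nothing propagated back to the image input
```

So a populated loss and an empty adv_noise.grad together suggest a no-grad (or detach) somewhere in the model's forward pass rather than a mistake in the attack loop.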

Unispac commented 7 months ago

Hi all,

Could you try this checkpoint? https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview

I checked the llava repository; llava-1.5 was released on Oct 5, after the publication of our paper. So it is likely that checkpoint is not compatible with the older version of the llava code that we curate in this repository.

Sorry for the confusion.

dribnet commented 7 months ago

Thanks Xiangyu - yes I can confirm that the liuhaotian/llava-llama-2-13b-chat-lightning-preview checkpoint you suggest works well with the codebase as-is.

I have also made progress in adapting the code to work with the latest v1.5 models, which have a number of improvements such as handling larger 336x336px inputs. For example, here's a (harmless) output produced with the liuhaotian/llava-v1.5-7b model.

[image: bad_prompt]

But there are some remaining issues to resolve in loading these newer models correctly; happy to share notes if anyone else is working on this.

rain305f commented 6 months ago


How do you address the problem that adv_noise.grad is None with the liuhaotian/llava-v1.5-7b model? Thanks a lot!

YitingQu commented 4 months ago

The reason adv_noise.grad is None is that LLaVA-1.5 by default wraps the CLIP vision encoder's forward in @torch.no_grad(). Commenting out that line (llava/models/multimodal_encoder/clip_encoder, line 39) should work.
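A small sketch of why that decorator matters, using a stand-in linear layer instead of the real CLIP tower: once the encoder's forward runs with gradients enabled, adv_noise.grad is populated after backward() and a signed-gradient update step (in the spirit of PGD-style attacks) becomes possible. The step size here is illustrative, not the paper's setting.

```python
import torch

# Stand-in "encoder": a linear layer in place of the CLIP vision tower.
# With no @torch.no_grad() around the forward, gradients flow to the input.
encoder = torch.nn.Linear(4, 4)
adv_noise = torch.zeros(1, 4, requires_grad=True)

loss = encoder(adv_noise).sum()
loss.backward()
assert adv_noise.grad is not None   # gradient now reaches the input

# One signed-gradient step on the noise, keeping it in a valid pixel range.
with torch.no_grad():
    adv_noise -= (1 / 255) * adv_noise.grad.sign()
    adv_noise.clamp_(0.0, 1.0)
```

Note that torch.no_grad() is still appropriate around the *update* step, as above; it is only the encoder's forward pass that must stay differentiable.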

RylanSchaeffer commented 4 months ago

@YitingQu thank you!

RylanSchaeffer commented 4 months ago

@YitingQu Does one need to re-install Llava with pip after commenting out that line?

Edit: Answer: no.