simplelifetime opened this issue 8 months ago
Hey,
Based on the log, it seems that you are computing gradients for tensors that do not have the requires_grad flag set. Could you provide more information about how you run the code and how you ran into this error? Otherwise, it is difficult to determine the cause.
Thanks!
I also tried llava-1.5 and got the same error. Following suggestions online, I added
model.enable_input_require_grads()
after loading the model, which resolved this issue.
However, the visual attack code still fails: the adv_noise.grad
field is still not populated after the call to target_loss.backward(),
which seems to indicate that the gradients are not propagating back to the image inputs.
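For context, the mechanism behind model.enable_input_require_grads() can be illustrated with a toy embedding layer. This is only a sketch of what the Hugging Face helper does internally (the hook function name here is illustrative, not the library's), assuming a frozen model where only the inputs should carry gradients:

```python
import torch
import torch.nn as nn

# Toy stand-in for a frozen input-embedding layer, as during inference.
embed = nn.Embedding(10, 4)
embed.weight.requires_grad_(False)

# What enable_input_require_grads() effectively does: register a forward
# hook that forces the embedding outputs to require grad, so a graph is
# built downstream even though the weights themselves are frozen.
def make_inputs_require_grad(module, inputs, output):
    output.requires_grad_(True)

embed.register_forward_hook(make_inputs_require_grad)

out = embed(torch.tensor([1, 2, 3]))
loss = out.sum()
loss.backward()              # no RuntimeError: the graph now exists
print(out.requires_grad)     # True
```

Without the hook, loss.backward() here would raise the same "element 0 of tensors does not require grad" error, since no tensor in the computation requires grad.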
Hi all,
I checked the llava repository; llava-1.5 was released on Oct 5, after the publication of our paper. So it is likely that the checkpoint is not compatible with the older version of the llava code that we curated in this repository.
Sorry for the confusion.
Thanks Xiangyu - yes I can confirm that the liuhaotian/llava-llama-2-13b-chat-lightning-preview
checkpoint you suggest works well with the codebase as-is.
I have also made progress in adapting the code to work with the latest v1.5 models, which have a number of improvements such as handling larger 336x336px inputs. For example, here is a (harmless) output generated with the liuhaotian/llava-v1.5-7b
model.
But there are some remaining issues to resolve in loading these newer models correctly; happy to share notes if anyone else is working on this.
How do you address the problem that adv_noise.grad is None with the liuhaotian/llava-v1.5-7b model? Thanks a lot!
The reason adv_noise.grad is None is that LLaVA-1.5 by default wraps the CLIP vision encoder's forward pass in @torch.no_grad(). Commenting out that line (llava/model/multimodal_encoder/clip_encoder.py, line 39) should work.
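To illustrate why that decorator matters, here is a minimal, self-contained sketch. The Encoder classes below are toy stand-ins for the CLIP vision tower, not the actual LLaVA code; the point is only that a forward pass wrapped in @torch.no_grad() cuts the autograd graph, so gradients can never reach the adversarial input:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Mimics the LLaVA-1.5 vision tower: forward is wrapped in
    # @torch.no_grad(), so no autograd graph is built through it.
    @torch.no_grad()
    def forward(self, x):
        return x * 2

adv_noise = torch.zeros(4, requires_grad=True)
feats = Encoder()(adv_noise)
print(feats.requires_grad)          # False: the graph was cut

class EncoderWithGrad(nn.Module):
    # The same module with the decorator removed, as suggested above.
    def forward(self, x):
        return x * 2

feats = EncoderWithGrad()(adv_noise)
feats.sum().backward()
print(adv_noise.grad is not None)   # True: gradients reach the input
```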
@YitingQu thank you!
@YitingQu Does one need to re-install Llava with pip after commenting out that line?
Edit: Answer: no.
Thanks for your excellent work! I'm trying to reproduce this method on the LLaVA-v1.5 model, but I've encountered one problem:
File ~/anaconda3/envs/llava/lib/python3.10/site-packages/torch/autograd/__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195 retain_graph = create_graph
    197 # The reason we repeat the same comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # call in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
What is the most likely cause of this error? I'm a little unfamiliar with adversarial training; I hope you can provide some help. Thanks!
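For what it's worth, the RuntimeError in that traceback can be reproduced in isolation. This minimal sketch (unrelated to the repo's actual code) shows the same failure and the requires_grad fix discussed earlier in this thread:

```python
import torch

# Reproduce the error: backward() on a loss whose inputs all have
# requires_grad=False, so there is no graph and no grad_fn.
x = torch.randn(3)            # requires_grad defaults to False
loss = (x ** 2).sum()
err = None
try:
    loss.backward()
except RuntimeError as e:
    err = e                   # "element 0 of tensors does not require grad ..."
    print(err)

# With the flag set (or, for a wrapped model, after calling
# model.enable_input_require_grads()), the same call succeeds.
x.requires_grad_(True)
loss = (x ** 2).sum()
loss.backward()
print(x.grad is not None)     # True
```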