IntelLabs / lvlm-interpret


Runtime Error #7

Closed PhysicianHOYA closed 2 weeks ago

PhysicianHOYA commented 4 weeks ago

After uploading a picture, the following error occurred. Please help me solve this problem, thank you very much.

[screenshot of the runtime error]

xyliu-cs commented 3 weeks ago

@PhysicianHOYA I have encountered the same issue. After some debugging, I think the problem might lie in modeling_llava.py, where self.vision_tower(...) is called without the output_attentions argument, so it never returns the attentions from the CLIP vision model.

Unfortunately, I do not know of an elegant solution for now. Nevertheless, as a temporary fix, you can modify the code above to manually add the output_attentions argument, and the ViT relevancy map will then be shown in the app.
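
For reference, the call in question looks roughly like this in the transformers versions I checked; the exact line and location may differ in your installed version, so treat this as a sketch rather than an exact patch:

```python
# In transformers' models/llava/modeling_llava.py, inside
# LlavaForConditionalGeneration.forward(): the vision tower is called
# without output_attentions, so the CLIP attention maps are never returned
# and lvlm-interpret has nothing to build the ViT relevancy map from.
image_outputs = self.vision_tower(pixel_values, output_hidden_states=True)
```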

PhysicianHOYA commented 2 weeks ago

@xyliu-cs Thanks for your help. But I still can't get this code to work. If you have solved this problem, could you please provide a working code example?

xyliu-cs commented 2 weeks ago

Hi @PhysicianHOYA, I think you can try changing the line of code I mentioned earlier to image_outputs = self.vision_tower(pixel_values, output_hidden_states=True, output_attentions=True), then reinstall the transformers library and rerun lvlm-interpret.
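
In context, the modified call would look roughly like this (again, the exact surrounding code varies across transformers versions, so this is only a sketch of the change):

```python
# Temporary fix: also request attentions from the CLIP vision tower so that
# image_outputs.attentions is populated and lvlm-interpret can render the
# ViT relevancy map.
image_outputs = self.vision_tower(
    pixel_values,
    output_hidden_states=True,
    output_attentions=True,
)
```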

PhysicianHOYA commented 2 weeks ago

@xyliu-cs Thank you very much for your help, the code now runs normally.