Hi, thanks for taking an interest in lvlm-interpret.
I'm not familiar with PaliGemma, but it has a slightly different implementation than Llava-based models. The most significant change you will have to make is switching from LlavaForConditionalGeneration to PaliGemmaForConditionalGeneration in utils_model.py.
You will also have to check the number of image tokens the relevancy map is applied to (which is currently hard-coded for llava).
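For reference, a minimal sketch of what that swap might look like in utils_model.py. The checkpoint name and loading arguments here are illustrative assumptions, not the tool's actual loading code:

```python
# Hypothetical sketch of the model-class swap in utils_model.py.
# The checkpoint name and loading arguments are assumptions for illustration.
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-mix-448",
    torch_dtype=torch.float16,
)
processor = AutoProcessor.from_pretrained("google/paligemma-3b-mix-448")
```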
Hi there,
Thank you for your prompt response and guidance!
Just to clarify, does the tool LVLM Interpret work exclusively with models based on llava?
I tried making the suggested change in utils_model.py, replacing LlavaForConditionalGeneration with PaliGemmaForConditionalGeneration, but unfortunately this did not resolve the issue. The error persists when running the command with the PaliGemma model.
Yes, the LVLM-Interpret tool has been designed to work with llava v1.5 models in particular. This has to do with finding the indices of the image tokens which are concatenated into the LLM input.
You will have to find the correct image indices in the PaliGemma inputs for the tool to work.
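A rough sketch of how those image-token indices might be located for PaliGemma is below. The "<image>" token lookup, image path, and variable names are assumptions for illustration, not code from LVLM-Interpret:

```python
# Hypothetical sketch: locate the positions of image tokens in a PaliGemma input.
import torch
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/paligemma-3b-mix-448")
image = Image.open("cat.jpg")  # example image path (assumption)

inputs = processor(
    text="Do you see a cat in this image?",
    images=image,
    return_tensors="pt",
)

# PaliGemma prepends a run of <image> placeholder tokens to the prompt;
# find where they sit in input_ids so the relevancy map can target them.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
image_indices = torch.nonzero(
    inputs["input_ids"][0] == image_token_id, as_tuple=True
)[0]
print(image_indices)  # indices to use instead of the hard-coded llava range
```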
Hi there!
First off, I want to thank you for developing such an amazing tool! LVLM Interpret is incredibly useful for understanding and interpreting large vision-language models.
However, I encountered an issue while trying to use the tool with the PaliGemma model. When running the following command:
python app.py --model_name_or_path google/paligemma-3b-mix-448 --share
and uploading an image with the question "Do you see a cat in this image?", I received the following error:
This issue does not occur when I run the Intel/llava-gemma-2b model using the command:
python app.py --model_name_or_path Intel/llava-gemma-2b --share
The model successfully responds to the same question ("Do you see a cat in this image?") with the same image.
Thanks again for your hard work!