IntelLabs / lvlm-interpret

Apache License 2.0

AttributeError: 'State' object has no attribute 'attention_key' #3

Closed · makemecker closed this issue 3 months ago

makemecker commented 4 months ago

Hi there!

First off, I want to thank you for developing such an amazing tool! LVLM Interpret is incredibly useful for understanding and interpreting large vision-language models.

However, I encountered an issue while trying to use the tool with the PaliGemma model. When running the following command:

python app.py --model_name_or_path google/paligemma-3b-mix-448 --share

and uploading an image with the question "Do you see a cat in this image?", I received the following error:

```
INFO:utils_gradio:Do you see a cat in this image?

Traceback (most recent call last):
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/queueing.py", line 532, in process_events
    response = await route_utils.call_process_api(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/blocks.py", line 1514, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/utils.py", line 832, in wrapper
    response = f(*args, **kwargs)
  File "/home/tyugunov/modules/lvlm-interpret/utils_gradio.py", line 125, in lvlm_bot
    img_idx = torch.where(input_ids==model.config.image_token_index)[1].item()
RuntimeError: a Tensor with 1025 elements cannot be converted to Scalar

Traceback (most recent call last):
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/queueing.py", line 532, in process_events
    response = await route_utils.call_process_api(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/blocks.py", line 1514, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/utils.py", line 832, in wrapper
    response = f(*args, **kwargs)
  File "/home/tyugunov/modules/lvlm-interpret/utils_attn.py", line 68, in attn_update_slider
    fn_attention = state.attention_key + '_attn.pt'
⚠️ AttributeError: 'State' object has no attribute 'attention_key' ⚠️

Traceback (most recent call last):
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/queueing.py", line 532, in process_events
    response = await route_utils.call_process_api(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/blocks.py", line 1514, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/tyugunov/venvs/lvlm_interpret/lib/python3.10/site-packages/gradio/utils.py", line 832, in wrapper
    response = f(*args, **kwargs)
  File "/home/tyugunov/modules/lvlm-interpret/utils_causal_discovery.py", line 46, in causality_update_dropdown
    generated_text = state.output_ids_decoded
⚠️ AttributeError: 'State' object has no attribute 'output_ids_decoded' ⚠️
```

This issue does not occur when I run the Intel/llava-gemma-2b model using the command:

python app.py --model_name_or_path Intel/llava-gemma-2b --share

The model successfully responds to the same question ("Do you see a cat in this image?") with the same image.
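Looking at the first traceback, the RuntimeError seems to point at the root cause: lvlm_bot in utils_gradio.py assumes the image token occurs exactly once in input_ids, but PaliGemma's processor expands the image into one token per patch, so torch.where matches 1025 positions and .item() fails. The two AttributeErrors look like downstream effects, since the state was never populated after lvlm_bot crashed. A small illustration (the token id here is a placeholder, not necessarily PaliGemma's real one):

```python
import torch

IMAGE_TOKEN = 99  # placeholder id, for illustration only

# llava-style input: a single <image> placeholder, so .item() is fine
llava_ids = torch.tensor([[1, 2, IMAGE_TOKEN, 3, 4]])
print(torch.where(llava_ids == IMAGE_TOKEN)[1].item())  # -> 2

# PaliGemma-style input: one image token per patch (1025 of them here),
# so the same lookup yields a 1025-element tensor and .item() raises
paligemma_ids = torch.cat(
    [torch.full((1, 1025), IMAGE_TOKEN), torch.tensor([[1, 2, 3]])], dim=1
)
try:
    torch.where(paligemma_ids == IMAGE_TOKEN)[1].item()
except RuntimeError as err:
    print(err)  # a Tensor with 1025 elements cannot be converted to Scalar
```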

Thanks again for your hard work!

shaoyent-IL commented 3 months ago

Hi, thanks for taking an interest in lvlm-interpret. I'm not familiar with PaliGemma, but it has a slightly different implementation than Llava-based models. The most significant change you will have to make is switching from LlavaForConditionalGeneration to PaliGemmaForConditionalGeneration in utils_model.py.

You will also have to check the number of image tokens to apply the relevancy map to (this is currently hard-coded for llava).
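In rough terms, the class swap might look like the following. This is a sketch assuming utils_model.py loads the model through a helper along these lines; the repo's actual loading code, and the load_model name, are illustrative:

```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

def load_model(model_name_or_path: str):
    # Hypothetical loader. Was: LlavaForConditionalGeneration.from_pretrained(...)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_name_or_path)
    # AutoProcessor resolves the matching processor from the checkpoint
    processor = AutoProcessor.from_pretrained(model_name_or_path)
    return model, processor
```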

makemecker commented 3 months ago

> Hi, thanks for taking an interest in lvlm-interpret. I'm not familiar with PaliGemma, but it has a slightly different implementation than Llava-based models. The most significant change you will have to make is switching from LlavaForConditionalGeneration to PaliGemmaForConditionalGeneration in utils_model.py.
>
> You will also have to check the number of image tokens to apply the relevancy map to (this is currently hard-coded for llava).

Hi there,

Thank you for your prompt response and guidance!

Just to clarify, does LVLM Interpret work exclusively with llava-based models?

I tried making the suggested change in utils_model.py, replacing LlavaForConditionalGeneration with PaliGemmaForConditionalGeneration, but unfortunately, this did not resolve the issue. The error persists when running the command with the PaliGemma model.

shaoyent-IL commented 3 months ago

Yes, the LVLM-Interpret tool has been designed to work with llava v1.5 models in particular. This has to do with finding the indices of the image tokens, which are concatenated into the LLM input.

You will have to find the correct image indices in the PaliGemma inputs for the tool to work.
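As a sketch of what that could look like (illustrative names, not the repo's actual API), one way to make the lookup tolerant of both input styles is to treat the image tokens as a span rather than a single position:

```python
import torch

def image_token_span(input_ids: torch.Tensor, image_token_index: int):
    """Return the (first, last) positions of the image tokens.

    Handles both llava-style inputs (one placeholder token) and
    PaliGemma-style inputs (one token per image patch).
    """
    positions = torch.where(input_ids == image_token_index)[1]
    if positions.numel() == 0:
        raise ValueError("no image token found in input_ids")
    return positions.min().item(), positions.max().item()

# In lvlm_bot, instead of:
#   img_idx = torch.where(input_ids == model.config.image_token_index)[1].item()
# something like:
#   first, last = image_token_span(input_ids, model.config.image_token_index)
```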