Heidelberg-NLP / MM-SHAP

This is the official implementation of the paper "MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks"
https://aclanthology.org/2023.acl-long.223/
MIT License

Models on the VQA task #4

Closed akskuchi closed 1 year ago

akskuchi commented 1 year ago

Hello,

Thank you for your work!

I am trying to understand how the reported Shapley values were estimated for the VQA/GQA tasks. Here are some specific questions:

  1. Are the question and answer of each instance concatenated together for textual input to the model (LXMERT/ALBEF-VQA)?
  2. What model output is being distributed among the tokens? final argmax probability?
LetiP commented 1 year ago

Hi, thanks for the question!

The question and answer are not concatenated. Shapley values explain a certain answer, meaning that they represent how the input tokens contributed towards that answer. Therefore, as in the ISA case, we let the model predict an answer, and to compute Shapley values we look at how the probability for that answer changes when we mask the inputs in many combinations (Eq. 1 in the paper).
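To illustrate the idea, here is a minimal sketch of a Monte Carlo (permutation-sampling) Shapley estimator. It is not the repository's implementation; `predict` is a hypothetical stand-in for "run the model with only these tokens unmasked and return the probability of the predicted answer":

```python
import random

def shapley_values(predict, n_tokens, n_samples=200, seed=0):
    """Monte Carlo estimate of per-token Shapley values.

    `predict` is assumed to map a set of unmasked token indices to the
    model's probability for the predicted answer (hypothetical interface).
    Each token's value accumulates its marginal contribution when added
    in a random order, averaged over sampled permutations.
    """
    rng = random.Random(seed)
    phi = [0.0] * n_tokens
    for _ in range(n_samples):
        order = list(range(n_tokens))
        rng.shuffle(order)
        present = set()
        prev = predict(present)          # probability with everything masked
        for i in order:
            present.add(i)               # unmask token i
            cur = predict(present)
            phi[i] += cur - prev         # marginal contribution of token i
            prev = cur
    return [v / n_samples for v in phi]
```

For an additive toy model such as `predict = lambda s: sum(weights[i] for i in s)`, the estimate recovers the weights exactly; for a real VL model, `predict` would re-run the forward pass with the chosen tokens masked out.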

I hope this clears it up; if not, I am happy to answer further questions.

akskuchi commented 1 year ago

Yes, that explains it. Thank you for the quick response 👍🏽