Heidelberg-NLP / MM-SHAP

This is the official implementation of the paper "MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks"
https://aclanthology.org/2023.acl-long.223/
MIT License

RuntimeError: cannot register a hook on a tensor that doesn't require gradient #6

Closed: ChengYuChuan closed this issue 3 months ago

ChengYuChuan commented 3 months ago

Hello @LetiP,

It's me again :P Thank you for your patience and time.

My GPU setup: 4x Nvidia GTX 1080 Ti (Pascal, 11 GB memory) in a server with 24 cores/48 threads and 256 GB of memory.

Here are my settings at the beginning of mm-shap_albef_dataset.py:

num_samples = "all"  # "all" or number
if num_samples != "all":
    num_samples = int(num_samples)
checkp = "mscoco"  # refcoco, mscoco, vqa, flickr30k
write_res = "yes"  # "yes" or "no"
task = "image_sentence_alignment"  # image_sentence_alignment, vqa, gqa
other_tasks_than_valse = ['mscoco', 'vqa', 'gqa', 'gqa_balanced', 'nlvr2']
use_cuda = True

DATA = {
    "existence": ["/home/students/cheng/MM-SHAP/visual7w/images",
                  '/home/students/cheng/MM-SHAP/data/existence.json'],
      }

I googled for solutions to this issue, and it is usually related to:

However, neither of these two causes sounds like my case. Have you encountered a similar problem?

Here is the error output:

Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.

  0%|          | 0/534 [00:00<?, ?it/s]
  0%|          | 0/534 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "mm-shap_albef_dataset.py", line 306, in <module>
    shap_values = explainer(X)
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 62, in __call__
    batch_size=batch_size, outputs=outputs, silent=silent
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 76, in __call__
    outputs=outputs, silent=silent
  File "/home/students/cheng/MM-SHAP/shap/explainers/_explainer.py", line 260, in __call__
    batch_size=batch_size, outputs=outputs, silent=silent, **kwargs
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 134, in explain_row
    outputs = fm(masks, zero_index=0, batch_size=batch_size)
  File "/home/students/cheng/MM-SHAP/shap/utils/_masked_model.py", line 65, in __call__
    return self._full_masking_call(full_masks, zero_index=zero_index, batch_size=batch_size)
  File "/home/students/cheng/MM-SHAP/shap/utils/_masked_model.py", line 141, in _full_masking_call
    outputs = self.model(*joined_masked_inputs)
  File "/home/students/cheng/MM-SHAP/shap/models/_model.py", line 21, in __call__
    return np.array(self.inner_model(*args))
  File "mm-shap_albef_dataset.py", line 184, in get_model_prediction
    masked_text_inputs.to("cuda"))
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "mm-shap_albef_dataset.py", line 92, in forward
    return_dict=True,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 1067, in forward
    mode=mode,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 601, in forward
    output_attentions,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 504, in forward
    output_attentions=output_attentions,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 407, in forward
    output_attentions,
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 329, in forward
    attention_probs.register_hook(self.save_attn_gradients)         
  File "/home/students/cheng/anaconda3/envs/shap/lib/python3.6/site-packages/torch/_tensor.py", line 289, in register_hook
    raise RuntimeError("cannot register a hook on a tensor that "
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
srun: error: gpu08: task 0: Exited with exit code 1
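
For reference, the last frames of the traceback end in `attention_probs.register_hook(...)`, and PyTorch raises exactly this RuntimeError whenever `register_hook` is called on a tensor whose `requires_grad` is False, for example a tensor computed inside `torch.no_grad()`. The snippet below is a minimal sketch of that behaviour and of one possible guard. It is not taken from ALBEF or MM-SHAP: the helper name `try_register_hook` and the toy tensors are hypothetical, and whether the script really runs the model without gradient tracking is an assumption.

```python
import torch


def try_register_hook(tensor):
    """Register a gradient hook only if the tensor tracks gradients.

    Hypothetical helper: returns the hook handle, or None when the
    tensor has requires_grad=False and a hook cannot be attached.
    """
    if not tensor.requires_grad:
        return None
    return tensor.register_hook(lambda grad: grad)


weights = torch.ones(2, 3, requires_grad=True)

# Inside no_grad() the result does not track gradients.
with torch.no_grad():
    probs_no_grad = torch.softmax(weights, dim=-1)

# Calling register_hook directly reproduces the error from the traceback.
error_message = None
try:
    probs_no_grad.register_hook(lambda grad: grad)
except RuntimeError as err:
    error_message = str(err)

# With gradient tracking enabled, the same call succeeds.
probs_with_grad = torch.softmax(weights, dim=-1)
handle = try_register_hook(probs_with_grad)
```

If this is the cause, the error depends on gradient tracking rather than on the GPU, so it should be reproducible on CPU as well.
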
LetiP commented 3 months ago

Hi @ChengYuChuan, I am sorry you are running into hardware problems again! I did not encounter this issue, but looking at your hardware specs (GTX 1080 Ti) and the publication date of the ALBEF model, I am wondering whether you have the latest NVIDIA drivers. Which driver version is shown when you run nvidia-smi?

I am a bit confused about the issue, because your script seems to pass line 275, which is great, meaning you can now run a model inference! 🥳

ChengYuChuan commented 3 months ago

Hello @LetiP,

Thank you for reviewing the issue.

Here is the output of the nvidia-smi command:

GPU-08:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:5E:00.0 Off |                  N/A |
| 29%   19C    P8              8W /  250W |       4MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:86:00.0 Off |                  N/A |
| 48%   63C    P2            201W /  250W |    7650MiB /  11264MiB |     98%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:AF:00.0 Off |                  N/A |
| 29%   21C    P8              7W /  250W |       2MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    1   N/A  N/A    541503      C   python                                       7646MiB |
+-----------------------------------------------------------------------------------------+

GPU-09:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:3B:00.0 Off |                  N/A |
| 25%   28C    P8             11W /  250W |       2MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:5E:00.0 Off |                  N/A |
| 25%   22C    P8             11W /  250W |       2MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:86:00.0 Off |                  N/A |
| 25%   21C    P8             12W /  250W |       2MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:AF:00.0 Off |                  N/A |
| 25%   21C    P8             11W /  250W |       2MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
LetiP commented 3 months ago

Hi, this looks good. Then the next thing is to ensure that the installed pytorch version matches the cuda version.
https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
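
When comparing versions, it may help to keep in mind that the "CUDA Version" field in the nvidia-smi header is the highest CUDA runtime the installed driver supports, while the CUDA version a PyTorch build was compiled with is reported separately (by `torch.version.cuda`). Below is a minimal sketch of that comparison, assuming the header line is pasted in by hand and that a plain major.minor comparison is a good enough approximation; the helper names are hypothetical.

```python
import re

# Header line copied from the nvidia-smi output posted above.
SMI_HEADER = (
    "| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14"
    "      CUDA Version: 12.4     |"
)


def parse_version(text):
    """Turn '12.4' into (12, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_max_cuda(smi_header):
    """Extract the 'CUDA Version' field from an nvidia-smi header line."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_header)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return parse_version(match.group(1))


def driver_can_run(smi_header, build_cuda):
    """True if the driver's maximum CUDA version covers the build's version."""
    return parse_version(build_cuda) <= driver_max_cuda(smi_header)


supports_cu111 = driver_can_run(SMI_HEADER, "11.1")
supports_cu121 = driver_can_run(SMI_HEADER, "12.1")
```

With the header above, both a CUDA 11.1 build and a CUDA 12.1 build pass this check, so the driver version alone would not rule out either.
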

ChengYuChuan commented 3 months ago

Hello @LetiP,

In the beginning, I created the environment exactly from environment.yml with the command conda env create -f environment.yml. I checked my versions of both packages against environment.yml.

Now I have higher versions than those in environment.yml:
torchaudio 0.10.2 py36_cu111 pytorch
torchvision 0.11.3 py36_cu111 pytorch

My conda list output is below:

(shap) cheng@login:~/MM-SHAP$ conda list
# packages in environment at /home/students/cheng/anaconda3/envs/shap:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    anaconda
_openmp_mutex             4.5                       1_gnu    anaconda
_py-xgboost-mutex         2.0                       cpu_0    anaconda
abseil-cpp                20210324.2           h9c3ff4c_0    conda-forge
aiohttp                   3.7.4.post0      py36h8f6f2f9_0    conda-forge
argon2-cffi               20.1.0           py36h27cfd23_1    anaconda
arrow-cpp                 3.0.0            py36h6b21186_4    anaconda
async-timeout             3.0.1                   py_1000    conda-forge
async_generator           1.10             py36h28b3542_0    anaconda
attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
autopep8                  1.5.7              pyhd3eb1b0_0    anaconda
aws-c-common              0.4.57               he6710b0_1    anaconda
aws-c-event-stream        0.1.6                h2531618_5    anaconda
aws-checksums             0.1.9                he6710b0_0    anaconda
aws-sdk-cpp               1.8.185              hce553d0_0    anaconda
backports                 1.0                        py_2    anaconda
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
blas                      1.0                         mkl    anaconda
bleach                    4.0.0              pyhd3eb1b0_0    anaconda
boost-cpp                 1.69.0            h11c811c_1000    conda-forge
brotli                    1.0.9                h7f98852_5    conda-forge
brotli-bin                1.0.9                h7f98852_5    conda-forge
brotlipy                  0.7.0           py36h27cfd23_1003    anaconda
bzip2                     1.0.8                h7b6447c_0    anaconda
c-ares                    1.17.1               h27cfd23_0    anaconda
ca-certificates           2020.10.14                    0    anaconda
certifi                   2020.6.20                py36_0    anaconda
cffi                      1.14.6           py36h400218f_0    anaconda
chardet                   4.0.0            py36h5fab9bb_1    conda-forge
charset-normalizer        2.0.4              pyhd3eb1b0_0    anaconda
click                     7.1.2              pyh9f0ad1d_0    conda-forge
cloudpickle               2.0.0              pyhd3eb1b0_0    anaconda
configparser              5.2.0              pyhd8ed1ab_0    conda-forge
cryptography              3.4.7            py36hd23ed53_0    anaconda
cuda-cudart               12.1.105                      0    nvidia
cuda-cupti                12.1.105                      0    nvidia
cuda-libraries            12.1.0                        0    nvidia
cuda-nvrtc                12.1.105                      0    nvidia
cuda-nvtx                 12.1.105                      0    nvidia
cuda-opencl               12.4.99                       0    nvidia
cuda-runtime              12.1.0                        0    nvidia
cudatoolkit               11.1.74              h6bb024c_0    nvidia
cycler                    0.10.0                   py36_0    anaconda
cytoolz                   0.11.0           py36h7b6447c_0    anaconda
dask-core                 2021.3.0           pyhd3eb1b0_0    anaconda
dataclasses               0.8                pyh4f3eec9_6    anaconda
datasets                  1.12.1             pyhd8ed1ab_1    conda-forge
dbus                      1.13.18              hb2f20db_0    anaconda
decorator                 5.1.0              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd3eb1b0_0    anaconda
dill                      0.3.4              pyhd8ed1ab_0    conda-forge
docker-pycreds            0.4.0                      py_0    anaconda
double-conversion         3.1.5                h9c3ff4c_2    conda-forge
entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
expat                     2.4.1                h2531618_2    anaconda
ffmpeg                    4.2.2                h20bf706_0    anaconda
filelock                  3.0.12             pyhd3eb1b0_1    anaconda
fontconfig                2.13.1               h6c09931_0    anaconda
freetype                  2.10.4               h5ab3b9f_0    anaconda
fsspec                    2021.10.0          pyhd8ed1ab_0    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
gitdb                     4.0.9              pyhd8ed1ab_0    conda-forge
gitpython                 3.1.11                     py_0    conda-forge
glib                      2.69.1               h5202010_0    anaconda
glog                      0.5.0                h48cff8f_0    conda-forge
gmp                       6.2.1                h2531618_2    anaconda
gnutls                    3.6.15               he1e5248_0    anaconda
grpc-cpp                  1.39.0               hae934f6_5    anaconda
gst-plugins-base          1.14.0               h8213a91_2    anaconda
gstreamer                 1.14.0               h28cd5cc_2    anaconda
hdf5                      1.10.2               hba1933b_1    anaconda
huggingface_hub           0.0.17                     py_0    huggingface
icu                       58.2                 he6710b0_3    anaconda
idna                      3.2                pyhd3eb1b0_0    anaconda
idna_ssl                  1.1.0           py36h9f0ad1d_1001    conda-forge
imagehash                 4.2.1              pyhd3eb1b0_0    anaconda
imageio                   2.9.0              pyhd3eb1b0_0    anaconda
importlib-metadata        4.8.1            py36h06a4308_0    anaconda
importlib_metadata        4.8.1                hd3eb1b0_0    anaconda
intel-openmp              2021.3.0          h06a4308_3350    anaconda
ipykernel                 5.5.5            py36hcb3619a_0    conda-forge
ipython                   5.8.0                    py36_1    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.6.5              pyhd3eb1b0_1    anaconda
jinja2                    3.0.1              pyhd3eb1b0_0    anaconda
joblib                    1.0.1              pyhd3eb1b0_0    anaconda
jpeg                      9b                   h024ee3a_2  
jsonschema                3.2.0              pyhd3eb1b0_2    anaconda
jupyter_client            7.0.6              pyhd8ed1ab_0    conda-forge
jupyter_core              4.8.1            py36h5fab9bb_0    conda-forge
jupyterlab_pygments       0.1.2                      py_0    anaconda
jupyterlab_widgets        1.0.0              pyhd3eb1b0_1    anaconda
kiwisolver                1.3.1            py36h2531618_0    anaconda
krb5                      1.19.2               hcc1bbae_0    conda-forge
lame                      3.100                h7b6447c_0    anaconda
lcms2                     2.12                 h3be6417_0    anaconda
ld_impl_linux-64          2.35.1               h7274673_9    anaconda
libboost                  1.73.0              h3ff78a5_11  
libbrotlicommon           1.0.9                h7f98852_5    conda-forge
libbrotlidec              1.0.9                h7f98852_5    conda-forge
libbrotlienc              1.0.9                h7f98852_5    conda-forge
libcublas                 12.1.0.26                     0    nvidia
libcufft                  11.0.2.4                      0    nvidia
libcufile                 1.9.0.20                      0    nvidia
libcurand                 10.3.5.119                    0    nvidia
libcurl                   7.78.0               h0b77cf5_0    anaconda
libcusolver               11.4.4.55                     0    nvidia
libcusparse               12.0.2.55                     0    nvidia
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               hcdb4288_3    conda-forge
libffi                    3.3                  he6710b0_2    anaconda
libgcc-ng                 9.3.0               h5101ec6_17    anaconda
libgfortran-ng            7.5.0               ha8ba4b0_17    anaconda
libgfortran4              7.5.0               ha8ba4b0_17    anaconda
libgomp                   9.3.0               h5101ec6_17    anaconda
libidn2                   2.3.2                h7f8727e_0    anaconda
libllvm10                 10.0.1               hbcb73fb_5    anaconda
libnghttp2                1.43.0               h812cca2_0    conda-forge
libnpp                    12.0.2.50                     0    nvidia
libnvjitlink              12.1.105                      0    nvidia
libnvjpeg                 12.1.1.14                     0    nvidia
libopus                   1.3.1                h7b6447c_0    anaconda
libpng                    1.6.37               hbc83047_0    anaconda
libprotobuf               3.17.2               h4ff587b_1    anaconda
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libssh2                   1.9.0                h1ba5d50_1    anaconda
libstdcxx-ng              9.3.0               hd4cf53a_17    anaconda
libtasn1                  4.16.0               h27cfd23_0    anaconda
libthrift                 0.14.2               he6d91bd_1    conda-forge
libtiff                   4.2.0                h85742a9_0  
libunistring              0.9.10               h27cfd23_0    anaconda
libuuid                   1.0.3                h1bed415_2    anaconda
libuv                     1.40.0               h7b6447c_0    anaconda
libvpx                    1.7.0                h439df22_0    anaconda
libwebp-base              1.2.0                h27cfd23_0    anaconda
libxcb                    1.14                 h7b6447c_0    anaconda
libxgboost                1.3.3                h2531618_0    anaconda
libxml2                   2.9.12               h03d6c58_0    anaconda
llvmlite                  0.36.0           py36h612dafd_4    anaconda
lz4-c                     1.9.3                h295c915_1    anaconda
markupsafe                2.0.1            py36h27cfd23_0    anaconda
matplotlib                3.3.4            py36h06a4308_0    anaconda
matplotlib-base           3.3.4            py36h62a2d02_0    anaconda
mistune                   0.8.4            py36h7b6447c_0    anaconda
mkl                       2020.2                      256    anaconda
mkl-service               2.3.0            py36he8ac12f_0  
mkl_fft                   1.3.0            py36h54f3939_0  
mkl_random                1.1.1            py36h0573a6f_0    anaconda
multidict                 5.1.0            py36h27cfd23_2    anaconda
multiprocess              0.70.12.2        py36h8f6f2f9_0    conda-forge
nbclient                  0.5.3              pyhd3eb1b0_0    anaconda
nbconvert                 6.0.7                    py36_0    anaconda
nbformat                  5.1.3              pyhd3eb1b0_0    anaconda
ncurses                   6.2                  he6710b0_1    anaconda
nest-asyncio              1.5.1              pyhd8ed1ab_0    conda-forge
nettle                    3.7.3                hbbd107a_1    anaconda
networkx                  2.5                        py_0    anaconda
ninja                     1.10.2               hff7bd54_1    anaconda
notebook                  6.3.0            py36h06a4308_0    anaconda
numba                     0.53.1           py36ha9443f7_0    anaconda
numpy                     1.19.2           py36h54aff64_0  
numpy-base                1.19.2           py36hfa32c7d_0  
olefile                   0.46                     py36_0    anaconda
opencv                    3.4.1            py36h6fd60c2_1    anaconda
opencv-python             4.5.3.56                 pypi_0    pypi
openh264                  2.1.0                hd408876_0    anaconda
openjpeg                  2.4.0                h3ad879b_0    anaconda
openssl                   1.1.1n               h7f8727e_0    anaconda
orc                       1.6.9                ha97a36c_3    anaconda
packaging                 21.0               pyhd3eb1b0_0    anaconda
pandas                    1.1.5            py36ha9443f7_0    anaconda
pandoc                    2.12                 h06a4308_0    anaconda
pandocfilters             1.4.3            py36h06a4308_1    anaconda
pathtools                 0.1.2                      py_1    anaconda
pcre                      8.45                 h295c915_0    anaconda
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    8.3.1            py36h2c7a002_0    anaconda
pip                       21.2.2           py36h06a4308_0    anaconda
prometheus_client         0.11.0             pyhd3eb1b0_0    anaconda
promise                   2.3              py36h5fab9bb_4    conda-forge
prompt_toolkit            1.0.15                     py_1    conda-forge
protobuf                  3.17.2           py36h295c915_0    anaconda
psutil                    5.8.0            py36h27cfd23_1    anaconda
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
py-xgboost                1.3.3            py36h06a4308_0    anaconda
pyarrow                   3.0.0            py36he0739d4_3    anaconda
pycodestyle               2.7.0              pyhd3eb1b0_0    anaconda
pycparser                 2.20                       py_2    anaconda
pygments                  2.10.0             pyhd8ed1ab_0    conda-forge
pyopenssl                 20.0.1             pyhd3eb1b0_1    anaconda
pyparsing                 2.4.7              pyhd3eb1b0_0    anaconda
pyqt                      5.9.2            py36h05f1152_2    anaconda
pyrsistent                0.17.3           py36h7b6447c_0    anaconda
pysocks                   1.7.1            py36h06a4308_0    anaconda
python                    3.6.13               h12debd9_1    anaconda
python-dateutil           2.8.2              pyhd3eb1b0_0    anaconda
python-wget               3.2                        py_0    conda-forge
python-xxhash             2.0.2            py36h8f6f2f9_0    conda-forge
python_abi                3.6                     1_cp36m    huggingface
pytorch                   1.10.2          py3.6_cuda11.1_cudnn8.0.5_0    pytorch
pytorch-cuda              12.1                 ha16c6d3_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2021.1             pyhd3eb1b0_0    anaconda
pywavelets                1.1.1            py36h7b6447c_2    anaconda
pyyaml                    5.4.1            py36h27cfd23_1    anaconda
pyzmq                     19.0.2           py36h9947dbf_2    conda-forge
qt                        5.9.7                h5867ecd_1  
re2                       2021.08.01           h9c3ff4c_0    conda-forge
readline                  8.1                  h27cfd23_0    anaconda
regex                     2021.8.3         py36h7f8727e_0    anaconda
requests                  2.26.0             pyhd3eb1b0_0    anaconda
ruamel_yaml               0.15.87          py36h7b6447c_1    anaconda
sacremoses                master                     py_0    huggingface
scikit-image              0.17.2           py36hdf5156a_0    anaconda
scikit-learn              0.24.2           py36ha9443f7_0    anaconda
scipy                     1.5.2            py36h0b6359f_0  
send2trash                1.8.0              pyhd3eb1b0_1    anaconda
sentry-sdk                1.5.4              pyhd8ed1ab_0    conda-forge
setuptools                58.0.4           py36h06a4308_0    anaconda
shortuuid                 1.0.1                      py_0    conda-forge
simplegeneric             0.8.1                      py_1    conda-forge
sip                       4.19.8           py36hf484d3e_0    anaconda
six                       1.16.0             pyhd3eb1b0_0    anaconda
slicer                    0.0.7              pyhd8ed1ab_0    conda-forge
smmap                     3.0.5              pyh44b312d_0    conda-forge
snappy                    1.1.8                he1b5a44_3    conda-forge
sqlite                    3.36.0               hc218d9a_0    anaconda
subprocess32              3.5.4                      py_1    anaconda
tbb                       2020.3               hfd86e86_0    anaconda
termcolor                 1.1.0                      py_2    conda-forge
terminado                 0.9.4            py36h06a4308_0    anaconda
testpath                  0.5.0              pyhd3eb1b0_0    anaconda
threadpoolctl             2.2.0              pyh0d69192_0    anaconda
tifffile                  2020.10.1        py36hdd07704_2    anaconda
timm                      0.5.4                    pypi_0    pypi
tk                        8.6.11               h1ccaba5_0    anaconda
tokenizers                0.10.3                   py36_0    huggingface
toml                      0.10.2             pyhd3eb1b0_0    anaconda
toolz                     0.11.2             pyhd3eb1b0_0    anaconda
torchaudio                0.10.2               py36_cu111    pytorch
torchvision               0.11.3               py36_cu111    pytorch
tornado                   6.1              py36h27cfd23_0    anaconda
tqdm                      4.62.2             pyhd3eb1b0_1    anaconda
traitlets                 4.3.3              pyhd8ed1ab_2    conda-forge
transformers              4.11.1                     py_0    huggingface
typing-extensions         3.10.0.2             hd3eb1b0_0    anaconda
typing_extensions         3.10.0.2           pyh06a4308_0    anaconda
uriparser                 0.9.3                he1b5a44_1    conda-forge
urllib3                   1.26.6             pyhd3eb1b0_1    anaconda
utf8proc                  2.6.1                h27cfd23_0    anaconda
wandb                     0.12.10            pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                    py36_1    anaconda
wheel                     0.37.0             pyhd3eb1b0_1    anaconda
widgetsnbextension        3.5.1                    py36_0    anaconda
x264                      1!157.20191217       h7b6447c_0    anaconda
xgboost                   1.3.3            py36h06a4308_0    anaconda
xxhash                    0.8.0                h7f98852_3    conda-forge
xz                        5.2.5                h7b6447c_0    anaconda
yaml                      0.2.5                h7b6447c_0    anaconda
yarl                      1.6.3            py36h8f6f2f9_2    conda-forge
yaspin                    2.1.0              pyhd8ed1ab_0    conda-forge
zeromq                    4.3.4                h9c3ff4c_0    conda-forge
zipp                      3.5.0              pyhd3eb1b0_0    anaconda
zlib                      1.2.11               h7b6447c_3    anaconda
zstd                      1.4.9                haebb681_0    anaconda
LetiP commented 3 months ago

It looks like your installation uses cuda 11 and not 12 (the build string says py36_cu111). This might be the issue. When I was conducting the project, I was using cuda 11 because cuda 12 did not exist back then. Now your cards run with cuda 12, but your pytorch installation uses cuda 11. Try moving away from the cuda and pytorch versions I used back then: install pytorch with cuda 12 and see if it helps.
https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
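
The CUDA tags in a conda list output can also be collected mechanically instead of being read off by eye. Below is a minimal sketch, assuming the build strings follow the `cuda11.1` / `cu111` naming seen in the listing above; the function name `cuda_versions` is hypothetical, and the excerpt rows are copied from that listing.

```python
import re

# Rows copied from the conda list output posted above.
CONDA_LIST_EXCERPT = """\
cudatoolkit               11.1.74              h6bb024c_0    nvidia
pytorch                   1.10.2          py3.6_cuda11.1_cudnn8.0.5_0    pytorch
pytorch-cuda              12.1                 ha16c6d3_5    pytorch
torchaudio                0.10.2               py36_cu111    pytorch
torchvision               0.11.3               py36_cu111    pytorch
"""


def cuda_versions(conda_list_text):
    """Map package name -> CUDA major.minor found in its version or build."""
    found = {}
    for line in conda_list_text.splitlines():
        fields = line.split()
        if line.startswith("#") or len(fields) < 3:
            continue
        name, version, build = fields[0], fields[1], fields[2]
        # For these two packages the package version is the CUDA version.
        if name in ("cudatoolkit", "pytorch-cuda"):
            found[name] = ".".join(version.split(".")[:2])
            continue
        # Otherwise look for a tag such as 'cuda11.1' or 'cu111' in the build.
        tag = re.search(r"cuda(\d+)\.(\d+)", build) or re.search(
            r"cu(\d{2})(\d)(?!\d)", build
        )
        if tag:
            found[name] = f"{tag.group(1)}.{tag.group(2)}"
    return found


versions = cuda_versions(CONDA_LIST_EXCERPT)
is_mixed = len(set(versions.values())) > 1
```

For the excerpt above, pytorch, torchaudio, torchvision and cudatoolkit all map to 11.1 while pytorch-cuda maps to 12.1, so `is_mixed` is true: the 12.1 metapackage was added, but the PyTorch packages themselves remained CUDA 11.1 builds.
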

ChengYuChuan commented 3 months ago

Hmm, I tried an environment with Python 3.8, torch 2.2 and torchvision 0.17, but it still shows the same problem...

I would like to try mm-shap_lxmert_dataset.py now and check whether it happens again.

The error output:

gpu08
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

  0%|          | 0/534 [00:00<?, ?it/s]
  0%|          | 0/534 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "mm-shap_albef_dataset.py", line 304, in <module>
    shap_values = explainer(X)
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 60, in __call__
    return super().__call__(
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 74, in __call__
    return super().__call__(
  File "/home/students/cheng/MM-SHAP/shap/explainers/_explainer.py", line 258, in __call__
    row_result = self.explain_row(
  File "/home/students/cheng/MM-SHAP/shap/explainers/_permutation.py", line 134, in explain_row
    outputs = fm(masks, zero_index=0, batch_size=batch_size)
  File "/home/students/cheng/MM-SHAP/shap/utils/_masked_model.py", line 65, in __call__
    return self._full_masking_call(full_masks, zero_index=zero_index, batch_size=batch_size)
  File "/home/students/cheng/MM-SHAP/shap/utils/_masked_model.py", line 141, in _full_masking_call
    outputs = self.model(*joined_masked_inputs)
  File "/home/students/cheng/MM-SHAP/shap/models/_model.py", line 21, in __call__
    return np.array(self.inner_model(*args))
  File "mm-shap_albef_dataset.py", line 180, in get_model_prediction
    outputs = model(masked_image.cuda(),
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "mm-shap_albef_dataset.py", line 85, in forward
    output = self.text_encoder(text.input_ids,
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 1056, in forward
    encoder_outputs = self.encoder(
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 594, in forward
    layer_outputs = layer_module(
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 498, in forward
    cross_attention_outputs = self.crossattention(
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 400, in forward
    self_outputs = self.self(
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/students/cheng/MM-SHAP/ALBEF/models/xbert.py", line 329, in forward
    attention_probs.register_hook(self.save_attn_gradients)         
  File "/home/students/cheng/anaconda3/envs/shap38/lib/python3.8/site-packages/torch/_tensor.py", line 562, in register_hook
    raise RuntimeError(
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
srun: error: gpu08: task 0: Exited with exit code 1
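
The last frames show `attention_probs.register_hook(...)` being called on a tensor that autograd is not tracking. Below is a minimal sketch of how this error arises and how a `requires_grad` guard avoids it. `TinyAttention` is a toy stand-in, not the code in ALBEF's xbert.py, and whether such a guard is the right fix for MM-SHAP is an assumption.

```python
import torch


class TinyAttention(torch.nn.Module):
    """Toy stand-in for an attention block that stores attention gradients."""

    def __init__(self):
        super().__init__()
        self.attn_gradients = None

    def save_attn_gradients(self, grad):
        self.attn_gradients = grad

    def forward(self, scores):
        attention_probs = torch.softmax(scores, dim=-1)
        # Register the hook only when autograd tracks this tensor.
        if attention_probs.requires_grad:
            attention_probs.register_hook(self.save_attn_gradients)
        return attention_probs


scores = torch.randn(1, 4, 4)

# Reproduce the error: the tensor does not require grad, so the hook is refused.
try:
    torch.softmax(scores, dim=-1).register_hook(lambda g: g)
    message = ""
except RuntimeError as err:
    message = str(err)
print(message)

layer = TinyAttention()

# Inference-style call: the guard skips the hook instead of raising.
with torch.no_grad():
    layer(scores)
skipped = layer.attn_gradients is None

# Gradient-tracking call: the hook is registered and fires on backward().
probs = layer(scores.clone().requires_grad_(True))
probs.sum().backward()
print(skipped, layer.attn_gradients.shape)
```

With a guard like this, a forward pass under `torch.no_grad()` simply does not store attention gradients, which only matters if something downstream reads them.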

The conda list:

(shap38) cheng@login:~/MM-SHAP$ conda list
# packages in environment at /home/students/cheng/anaconda3/envs/shap38:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
abseil-cpp                20211102.0           h27087fc_1    conda-forge
aiohttp                   3.8.1            py38h0a891b7_1    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
aom                       3.6.0                h6a678d5_0  
arrow-cpp                 14.0.2               h374c478_1  
async-timeout             4.0.3              pyhd8ed1ab_0    conda-forge
attrs                     23.2.0             pyh71513ae_0    conda-forge
aws-c-auth                0.6.19               h5eee18b_0  
aws-c-cal                 0.5.20               hdbd6064_0  
aws-c-common              0.8.5                h5eee18b_0  
aws-c-compression         0.2.16               h5eee18b_0  
aws-c-event-stream        0.2.15               h6a678d5_0  
aws-c-http                0.6.25               h5eee18b_0  
aws-c-io                  0.13.10              h5eee18b_0  
aws-c-mqtt                0.7.13               h5eee18b_0  
aws-c-s3                  0.1.51               hdbd6064_0  
aws-c-sdkutils            0.1.6                h5eee18b_0  
aws-checksums             0.1.13               h5eee18b_0  
aws-crt-cpp               0.18.16              h6a678d5_0  
aws-sdk-cpp               1.10.55              h721c034_0  
blas                      1.0                         mkl  
blosc                     1.21.3               h6a678d5_0  
boost-cpp                 1.78.0               he72f1d9_0    conda-forge
bottleneck                1.3.4            py38h3ec907f_0    conda-forge
brotli                    1.0.9                h5eee18b_7  
brotli-bin                1.0.9                h5eee18b_7  
brotli-python             1.0.9            py38h6a678d5_7  
brunsli                   0.1                  h2531618_0  
bzip2                     1.0.8                h5eee18b_5  
c-ares                    1.19.1               h5eee18b_0  
ca-certificates           2024.3.11            h06a4308_0  
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
cfitsio                   3.470                h5893167_7  
charls                    2.2.0                h2531618_0  
charset-normalizer        2.0.4              pyhd3eb1b0_0  
click                     8.1.7            py38h06a4308_0  
cloudpickle               2.2.1            py38h06a4308_0  
cuda-cudart               12.1.105                      0    nvidia
cuda-cupti                12.1.105                      0    nvidia
cuda-libraries            12.1.0                        0    nvidia
cuda-nvrtc                12.1.105                      0    nvidia
cuda-nvtx                 12.1.105                      0    nvidia
cuda-opencl               12.4.99                       0    nvidia
cuda-runtime              12.1.0                        0    nvidia
cytoolz                   0.12.2           py38h5eee18b_0  
dask-core                 2023.4.1         py38h06a4308_0  
dataclasses               0.8                pyhc8e2a94_3    conda-forge
datasets                  2.18.0             pyhd8ed1ab_0    conda-forge
dav1d                     1.2.1                h5eee18b_0  
dill                      0.3.8              pyhd8ed1ab_0    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
fftw                      3.3.9                h5eee18b_2  
filelock                  3.13.1           py38h06a4308_0  
freetype                  2.12.1               h4a9f257_0  
frozenlist                1.3.0            py38h0a891b7_1    conda-forge
fsspec                    2023.10.0        py38h06a4308_0  
gflags                    2.2.2             he1b5a44_1004    conda-forge
giflib                    5.2.1                h5eee18b_3  
glog                      0.5.0                h48cff8f_0    conda-forge
gmp                       6.2.1                h295c915_3  
gmpy2                     2.1.2            py38heeb90bb_0  
gnutls                    3.6.15               he1e5248_0  
grpc-cpp                  1.48.2               he1ff14a_1    anaconda
huggingface_hub           0.21.4             pyhd8ed1ab_0    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4              py38h06a4308_0  
imagecodecs               2021.8.26        py38hfcb8610_2    anaconda
imageio                   2.33.1           py38h06a4308_0  
importlib-metadata        7.0.1            py38h06a4308_0  
importlib_metadata        7.0.1                hd3eb1b0_0  
intel-openmp              2021.4.0          h06a4308_3561  
jinja2                    3.1.3            py38h06a4308_0  
joblib                    1.2.0            py38h06a4308_0  
jpeg                      9e                   h5eee18b_1  
jxrlib                    1.1                  h7b6447c_2  
krb5                      1.20.1               h143b758_1  
lame                      3.100                h7b6447c_0  
lazy_loader               0.3              py38h06a4308_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      3.0                  h295c915_0  
libaec                    1.0.4                he6710b0_1  
libavif                   0.11.1               h5eee18b_0  
libblas                   3.9.0            12_linux64_mkl    conda-forge
libbrotlicommon           1.0.9                h5eee18b_7  
libbrotlidec              1.0.9                h5eee18b_7  
libbrotlienc              1.0.9                h5eee18b_7  
libcblas                  3.9.0            12_linux64_mkl    conda-forge
libcublas                 12.1.0.26                     0    nvidia
libcufft                  11.0.2.4                      0    nvidia
libcufile                 1.9.0.20                      0    nvidia
libcurand                 10.3.5.119                    0    nvidia
libcurl                   8.5.0                h251f7ec_0  
libcusolver               11.4.4.55                     0    nvidia
libcusparse               12.0.2.55                     0    nvidia
libdeflate                1.17                 h5eee18b_1  
libedit                   3.1.20230828         h5eee18b_0  
libev                     4.33                 h7f8727e_1  
libevent                  2.1.12               hdbd6064_1    anaconda
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 13.2.0               h807b86a_5    conda-forge
libgfortran-ng            11.2.0               h00389a5_1  
libgfortran5              11.2.0               h1234567_1  
libgomp                   13.2.0               h807b86a_5    conda-forge
libiconv                  1.16                 h7f8727e_2  
libidn2                   2.3.4                h5eee18b_0  
libjpeg-turbo             2.0.0                h9bf148f_0    pytorch
liblapack                 3.9.0            12_linux64_mkl    conda-forge
libllvm11                 11.1.0               hf817b99_3    conda-forge
libllvm14                 14.0.6               hdb19cb5_3  
libnghttp2                1.57.0               h2d74bed_0  
libnpp                    12.0.2.50                     0    nvidia
libnvjitlink              12.1.105                      0    nvidia
libnvjpeg                 12.1.1.14                     0    nvidia
libpng                    1.6.39               h5eee18b_0  
libprotobuf               3.20.3               he621ea3_0    anaconda
libssh2                   1.10.0               hdbd6064_2  
libstdcxx-ng              11.2.0               h1234567_1  
libtasn1                  4.19.0               h5eee18b_0  
libthrift                 0.15.0               h1795dd8_2    anaconda
libtiff                   4.5.1                h6a678d5_0  
libunistring              0.9.10               h27cfd23_0  
libwebp-base              1.3.2                h5eee18b_0  
libzlib                   1.2.13               hd590300_5    conda-forge
libzopfli                 1.0.3                he6710b0_0  
llvm-openmp               14.0.6               h9e868ea_0  
llvmlite                  0.38.1           py38h38d86a4_0    conda-forge
locket                    1.0.0            py38h06a4308_0  
lz4-c                     1.9.4                h6a678d5_0  
markupsafe                2.1.3            py38h5eee18b_0  
mkl                       2021.4.0           h06a4308_640  
mkl-service               2.4.0            py38h7f8727e_0  
mkl_fft                   1.3.1            py38hd3c417c_0  
mkl_random                1.2.2            py38h51133e4_0  
mpc                       1.1.0                h10f8cd9_1  
mpfr                      4.0.2                hb69a4c5_1  
mpmath                    1.3.0            py38h06a4308_0  
multidict                 6.0.2            py38h0a891b7_1    conda-forge
multiprocess              0.70.12.2        py38h0a891b7_2    conda-forge
ncurses                   6.4                  h6a678d5_0  
nettle                    3.7.3                hbbd107a_1  
networkx                  3.1              py38h06a4308_0  
numba                     0.55.1           py38h4bf6c61_0    conda-forge
numexpr                   2.8.4            py38he184ba9_0  
numpy                     1.19.2           py38hf89b668_1    conda-forge
numpy-base                1.24.3           py38h31eccc5_0  
openh264                  2.1.1                h4ff587b_0  
openjpeg                  2.4.0                h3ad879b_0  
openssl                   3.2.1                hd590300_1    conda-forge
orc                       1.7.4                hb3bc3d3_1    anaconda
packaging                 23.2             py38h06a4308_0  
pandas                    1.4.1            py38h43a58ef_0    conda-forge
partd                     1.4.1            py38h06a4308_0  
pillow                    10.2.0           py38h5eee18b_0  
pip                       23.3.1           py38h06a4308_0  
platformdirs              3.10.0           py38h06a4308_0  
pooch                     1.7.0            py38h06a4308_0  
pyarrow                   14.0.2           py38h1eedbd7_0  
pyarrow-hotfix            0.6                pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1            py38h06a4308_0  
python                    3.8.19               h955ad1f_0  
python-dateutil           2.8.2              pyhd3eb1b0_0  
python-tzdata             2023.3             pyhd3eb1b0_0  
python-xxhash             1.4.4            py38h1e0a361_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   2.2.1           py3.8_cuda12.1_cudnn8.9.2_0    pytorch
pytorch-cuda              12.1                 ha16c6d3_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2023.3.post1     py38h06a4308_0  
pywavelets                1.4.1            py38h5eee18b_0  
pyyaml                    6.0.1            py38h5eee18b_0  
re2                       2022.04.01           h27087fc_0    conda-forge
readline                  8.2                  h5eee18b_0  
regex                     2022.4.24        py38h0a891b7_0    conda-forge
requests                  2.31.0           py38h06a4308_1  
s2n                       1.3.27               hdbd6064_0  
sacremoses                0.0.53             pyhd8ed1ab_0    conda-forge
safetensors               0.4.2            py38h0cc4f7c_0    conda-forge
scikit-image              0.19.2           py38h43a58ef_0    conda-forge
scikit-learn              1.0.2            py38h1561384_0    conda-forge
scipy                     1.9.1            py38h14f4228_0  
setuptools                68.2.2           py38h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
slicer                    0.0.7              pyhd3eb1b0_0  
snappy                    1.1.10               h6a678d5_1  
sqlite                    3.41.2               h5eee18b_0  
sympy                     1.12             py38h06a4308_0  
tbb                       2021.8.0             hdb19cb5_0  
threadpoolctl             2.2.0              pyh0d69192_0  
tifffile                  2021.11.2          pyhd8ed1ab_0    conda-forge
timm                      0.9.16             pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h1ccaba5_0  
tokenizers                0.10.3           py38hb63a372_1    conda-forge
toolz                     0.12.0           py38h06a4308_0  
torchaudio                2.2.1                py38_cu121    pytorch
torchtriton               2.2.0                      py38    pytorch
torchvision               0.17.1               py38_cu121    pytorch
tqdm                      4.65.0           py38hb070fc8_0  
transformers              4.11.1             pyhd8ed1ab_0    conda-forge
typing-extensions         4.9.0            py38h06a4308_1  
typing_extensions         4.9.0            py38h06a4308_1  
urllib3                   2.1.0            py38h06a4308_1  
utf8proc                  2.6.1                h27cfd23_0    anaconda
wheel                     0.41.2           py38h06a4308_0  
xz                        5.4.6                h5eee18b_0  
yaml                      0.2.5                h7b6447c_0  
yarl                      1.7.2            py38h0a891b7_2    conda-forge
zfp                       0.5.5                h9c3ff4c_8    conda-forge
zipp                      3.17.0           py38h06a4308_0  
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hc292b87_0  
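
The list above shows numpy 1.19.2 next to numpy-base 1.24.3. This may be related to the `module compiled against API version 0xe but this version of numpy is 0xd` message at the top of the log, and is likely separate from the hook error. A small sketch for spotting this kind of mismatch in saved `conda list` output; `parse_conda_list` and `numpy_mismatch` are hypothetical helper names, and the sample text is copied from the list above.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` output text."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def numpy_mismatch(versions):
    """Return (numpy, numpy-base) versions if both are present and differ, else None."""
    numpy_v = versions.get("numpy")
    base_v = versions.get("numpy-base")
    if numpy_v and base_v and numpy_v != base_v:
        return numpy_v, base_v
    return None


sample = """\
# Name                    Version                   Build  Channel
numpy                     1.19.2           py38hf89b668_1    conda-forge
numpy-base                1.24.3           py38h31eccc5_0
pandas                    1.4.1            py38h43a58ef_0    conda-forge
"""

print(numpy_mismatch(parse_conda_list(sample)))  # ('1.19.2', '1.24.3')
```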
ChengYuChuan commented 3 months ago

Hi @LetiP

After thorough investigation, I've found that the models other than ALBEF run as expected without any issues.

Given this, I'd suggest closing this ALBEF issue for now. The problem appears to be specific to ALBEF, and since the other models work correctly, I would rather focus on other models such as LLaVA.

Since I would like to apply MM-SHAP to LLaVA, I will open a new issue about that.