cliangyu / Cola

[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
https://arxiv.org/abs/2310.15166

Torch version may cause an error #4

Open ichimiya13 opened 3 months ago

ichimiya13 commented 3 months ago

Hi, thank you for your great work! Following your setup instructions, I ran the commands below.

conda env create -f cola.yml
cd ..
git lfs clone https://huggingface.co/OFA-Sys/ofa-large
python query/query_ofa.py --vlm-model-path ../OFA-large --data-dir ./datasets/ --dataset-name aokvqa --split val --vlm-task vqa --bs 128 --prediction-out ./predictions/aokvqa_ofa_vqa_val-da.json

Then, I got an error.

Traceback (most recent call last):
  File "/home/usr/Python/VQA/Cola/query/query_ofa.py", line 125, in <module>
    run_ofa(args)
  File "/home/usr/Python/VQA/Cola/query/query_ofa.py", line 99, in run_ofa
    gen = ofa_model.generate(
          ^^^^^^^^^^^^^^^^^^^
  File "/home/usr/anaconda3/envs/cola2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/usr/anaconda3/envs/cola2/lib/python3.10/site-packages/transformers/generation/utils.py", line 1597, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(

It seems that the versions of torch and transformers cause this error. Could you tell me which versions you used? I tried with the following environment.

Ubuntu 22.04.1
conda 23.7.2
CUDA Version: 12.2
Driver Version: 535.171.04
NVIDIA GeForce RTX 4090
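
For reference, the exact torch and transformers versions inside the env can be printed like this (the env name cola2 is assumed from the traceback above; adjust it if yours differs):

conda activate cola2
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"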
cliangyu commented 3 months ago

Can you try transformers v4.34? An alternative fix is to avoid encoder-decoder transformers altogether; you may want to try llava_next instead.
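
A minimal sketch of the first suggestion, assuming the error comes from a transformers API change: pin transformers to 4.34 inside the existing conda env and re-run the failing command (the env name cola2 is taken from the traceback above; adjust it if yours differs).

conda activate cola2
pip install "transformers==4.34.0"
python query/query_ofa.py --vlm-model-path ../OFA-large --data-dir ./datasets/ --dataset-name aokvqa --split val --vlm-task vqa --bs 128 --prediction-out ./predictions/aokvqa_ofa_vqa_val-da.json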