antoyang / FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
https://arxiv.org/abs/2206.08155
Apache License 2.0

Error on zero-shot VQA #11

Closed tobyperrett closed 1 year ago

tobyperrett commented 1 year ago

Hi, thanks for providing the code! I'm having the same issue as #3 with the VQA demo. I have Microsoft's deberta-v2-xlarge ( https://huggingface.co/microsoft/deberta-v2-xlarge ) downloaded from Hugging Face into a folder called transformers_cache, and I've set the TRANSFORMERS_CACHE environment variable to point at it (if I remove the variable, it complains that deberta is missing, so I assume this part is correct). Do you have any idea why it might be failing?
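For reference, one way to populate that folder is huggingface_hub's snapshot_download (a sketch, assuming a reasonably recent huggingface_hub; the target path mirrors the layout I'm using):

    from huggingface_hub import snapshot_download

    # Fetches config.json, pytorch_model.bin, spm.model, etc. into a flat
    # local folder matching transformers_cache/deberta-v2-xlarge/.
    snapshot_download(
        repo_id="microsoft/deberta-v2-xlarge",
        local_dir="transformers_cache/deberta-v2-xlarge",
    )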

The command I'm running is:

    python demo_videoqa.py --combine_datasets msrvtt --combine_datasets_val msrvtt \
        --suffix="." --max_tokens=256 --ds_factor_ff=8 --ds_factor_attn=8 \
        --load=models/frozenbilm.pth --msrvtt_vocab_path=data/MSRVTT-QA/vocab.json \
        --question_example question --video_example test.mp4 --device='cpu'

And the error is:

    Traceback (most recent call last):
      File "demo_videoqa.py", line 170, in <module>
        main(args)
      File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "demo_videoqa.py", line 32, in main
        tokenizer = get_tokenizer(args)
      File "/user/work/tp8961/FrozenBiLM/model/__init__.py", line 96, in get_tokenizer
        tokenizer = DebertaV2Tokenizer.from_pretrained(
      File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
        return cls._from_pretrained(
      File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1923, in _from_pretrained
        tokenizer = cls(*init_inputs, **init_kwargs)
      File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/transformers/models/deberta_v2/tokenization_deberta_v2.py", line 145, in __init__
        self._tokenizer = SPMTokenizer(vocab_file, split_by_punct=split_by_punct, sp_model_kwargs=self.sp_model_kwargs)
      File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/transformers/models/deberta_v2/tokenization_deberta_v2.py", line 296, in __init__
        spm.load(vocab_file)
      File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/sentencepiece/__init__.py", line 367, in Load
        return self.LoadFromFile(model_file)
      File "/user/work/tp8961/conda_envs/frozenbilm_env/lib/python3.8/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
        return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
    RuntimeError: Internal: src/sentencepiece_processor.cc(890) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

antoyang commented 1 year ago

Could this be because the tokenizer file is not downloaded?

tobyperrett commented 1 year ago

Thanks for the quick reply! These are the contents of transformers_cache/deberta-v2-xlarge: config.json, pytorch_model.bin, README.md, spm.model, tf_model.h5, tokenizer_config.json.

I also have the MSRVTT vocab file downloaded at data/MSRVTT-QA/vocab.json (and I've tried vocab1000.json as well).

antoyang commented 1 year ago

If these files are all properly downloaded (not corrupted), the other thing I can think of is to make sure your transformers library version matches the one used in this repo. The error seems specific to loading the tokenizer's sentencepiece model.
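One quick way to test that file directly, independent of transformers (a minimal sketch; the path assumes the cache layout described above):

    import sentencepiece as spm

    # A corrupted or truncated spm.model raises the same RuntimeError
    # ("model_proto->ParseFromArray(...)") seen in the traceback above.
    sp = spm.SentencePieceProcessor()
    sp.Load("transformers_cache/deberta-v2-xlarge/spm.model")
    print("spm.model loaded OK, vocab size:", sp.GetPieceSize())

If this fails on its own, the file itself is the problem rather than the transformers version.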

tobyperrett commented 1 year ago

Some of them were corrupted. I downloaded them again on a different computer, copied them over, and it works now. Thanks!
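In case anyone else hits this: comparing checksums on both machines catches this kind of silent transfer corruption. A minimal standard-library sketch (the path is the example layout from above; run it on both ends and compare the digests):

    import hashlib

    def sha256sum(path, chunk_size=1 << 20):
        # Hash the file in chunks so multi-GB checkpoints need not fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    print(sha256sum("transformers_cache/deberta-v2-xlarge/spm.model"))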

Yuxinn-J commented 1 year ago

> Thanks for the quick reply! These are the contents of transformers_cache/deberta-v2-xlarge: config.json, pytorch_model.bin, README.md, spm.model, tf_model.h5, tokenizer_config.json.
>
> I also have the MSRVTT vocab file downloaded at data/MSRVTT-QA/vocab.json (and I've tried vocab1000.json as well).

Hello! Could I ask where to download the vocab file (vocab.json / vocab1000.json)? Would it be possible to kindly provide a link to it? Thanks!

antoyang commented 1 year ago

Hi, as stated in the downloading instructions of the readme, you can find the vocab files here: https://drive.google.com/drive/u/3/folders/1ED2VcFSxRW9aFIP2WdGDgLddNTyEVrE5.
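A quick sanity check after downloading, to catch a truncated or otherwise broken download (a sketch; the path matches the --msrvtt_vocab_path flag used above):

    import json

    # json.load fails loudly if the file is truncated or not valid JSON.
    with open("data/MSRVTT-QA/vocab.json") as f:
        vocab = json.load(f)
    print(type(vocab).__name__, len(vocab))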

Yuxinn-J commented 1 year ago

Thanksss! also for this GREAT work!