antoyang / FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
https://arxiv.org/abs/2206.08155
Apache License 2.0

[Import Error] with demo_videoqa.py #13

Closed BigJoon closed 1 year ago

BigJoon commented 1 year ago

python demo_videoqa.py --combine_datasets msrvtt --combine_datasets_val msrvtt --suffix="." --max_tokens=256 --ds_factor_ff=8 --ds_factor_attn=8 --load=checkpoints/frozenbilm_msrvtt10p.pth --msrvtt_vocab_path=data/MSRVTT-QA/vocab.json --question_example "what is that dog doing?" --video_example ./angry_cute_dog.mp4

I downloaded all the data and checkpoint files, and I also installed the transformers library from Hugging Face. But please check my error message:

ImportError: cannot import name 'GreedySearchOutput' from 'transformers.generation_utils' (FrozenBiLM/transformers/src/transformers/generation_utils.py)

What version of the transformers library are you using?

antoyang commented 1 year ago

This is specified here: https://github.com/antoyang/FrozenBiLM/blob/main/requirements.txt
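
In other words, installing the pinned dependencies from the repo root should pull in a matching transformers version (this assumes you run it from the FrozenBiLM directory that contains requirements.txt):

pip install -r requirements.txt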

BigJoon commented 1 year ago

Thanks! Now I'm facing another problem:

'File "demovideoqa.py", line 71, in main backbone, = clip.load("ViT-L/14", download_root=MODEL_DIR, device=device) AttributeError: module 'clip' has no attribute 'load''

Why does this error happen? Is there a problem with the clip library?

OK, this one was solved by installing CLIP directly from the OpenAI repo:

pip install git+https://github.com/openai/CLIP.git

Now I'm facing another one. Is there any way to fix this?


RuntimeError: expected scalar type Half but found Float

antoyang commented 1 year ago

This is probably because the input feature is extracted/stored in half precision. Just doing video = video.float() should fix it.
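
For reference, here is a minimal sketch of where the cast goes when extracting the CLIP features; the variable names are illustrative and may not match demo_videoqa.py exactly:

import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative stand-in for the preprocessed video frames, shape (num_frames, 3, H, W)
frames = torch.randn(4, 3, 224, 224).to(device)

backbone, _ = clip.load("ViT-L/14", device=device)
with torch.no_grad():
    video = backbone.encode_image(frames)  # returned in half precision on GPU
video = video.float()  # cast Half -> Float before passing the features to the model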

BigJoon commented 1 year ago

Oh my god, @antoyang, super thanks!! I got my top-5 answers and scores for the cute angry puppy video!!