PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0

Inference model path unclear #161

Open Ali2500 opened 6 months ago

Ali2500 commented 6 months ago

I'm trying to run inference with eval_qa_msvd.sh, passing a model that I trained myself as model_path. However, the program flow seems to depend heavily on the name of the model directory, as the if-else block in model/builder.py shows.

When I run bash eval_qa_msvd.sh with model_path set to an arbitrarily named directory containing your pretrained checkpoints from Hugging Face, I get the following error:

  File "/home/videonet/videollava/eval/video/run_inference_video_qa.py", line 66, in get_model_output
    video_tensor = video_processor.preprocess(video, return_tensors='pt')['pixel_values'][0].half().to(args.device)

AttributeError: 'NoneType' object has no attribute 'preprocess'

It looks like video_processor is None. When I added "llava" to the directory name as a workaround, inference appeared to work.
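To make the failure mode concrete, here is a minimal sketch of a loader that branches on the directory name. It is a simplified stand-in, not the actual code in model/builder.py: the function name load_processors and the placeholder strings are hypothetical and only illustrate why a checkpoint in a directory without "llava" in its name could end up with a None processor.

```python
def load_processors(model_path):
    """Toy loader (hypothetical) that decides what to build from the directory name."""
    # Use the last path component as the "model name".
    model_name = model_path.rstrip("/").split("/")[-1]

    # Default: no processors are created.
    processors = {"image": None, "video": None}

    # Processors are only created when the name contains "llava".
    if "llava" in model_name.lower():
        processors["image"] = "image-processor"  # placeholder for a real processor object
        processors["video"] = "video-processor"  # placeholder for a real processor object
    return processors


# Same weights, different directory names, different results.
renamed = load_processors("/ckpts/my_run")
original = load_processors("/ckpts/my_run_llava/")
```

With this kind of dispatch, calling renamed["video"].preprocess(...) would raise the same AttributeError as in the traceback above, because the value is None.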

  1. Could you suggest a better way of constructing this if-else block that doesn't depend on the directory name, for example by checking the config file?

  2. Is there any use for the else branch of the block, which applies when 'llava' is not part of the directory name?

  3. Is there any use for the case where 'mpt' is part of the model name?

  4. When finetuning with LoRA, what should model_base be set to? Should it be lmsys/vicuna-7b-v1.5, since the LLM weights are unchanged by LoRA finetuning?
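As a possible direction for question 1, the check could read config.json inside the checkpoint directory instead of inspecting the directory name. The sketch below is an assumption, not the repository's implementation: the helper name is_llava_checkpoint is hypothetical, and it assumes the checkpoint's config.json carries a model_type or architectures entry containing "llava", which should be verified against the actual checkpoint files.

```python
import json
from pathlib import Path


def is_llava_checkpoint(model_path):
    """Hypothetical helper: detect a LLaVA-style checkpoint from its config.json."""
    config_file = Path(model_path) / "config.json"

    # No config available: fall back to the existing name-based behaviour.
    if not config_file.is_file():
        return "llava" in Path(model_path).name.lower()

    with config_file.open() as f:
        config = json.load(f)

    # Assumption: one of these fields identifies the model family.
    model_type = str(config.get("model_type", "")).lower()
    architectures = [str(a).lower() for a in (config.get("architectures") or [])]

    return "llava" in model_type or any("llava" in a for a in architectures)
```

With a check like this, a directory named my_run would still be treated as a LLaVA-style model as long as its config.json says so, and renaming the directory would no longer change the program flow.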