DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

Fix some bugs and adjust the code about number of frames. #106

Open CyrilSterling opened 1 year ago

CyrilSterling commented 1 year ago

Thanks for your contributions to VideoLLaMA! It is impressive work. I have fixed a few details, listed below.

  1. video_llama/runners/runner_base.py has a bug when resuming the model: the strict=False argument appears to be in the wrong position.
  2. The n_frms parameter in the original .yaml is not used anywhere in the code. To make the number of frames easier to adjust, I changed the dataset and builder so that the value is actually read and applied.
  3. Fixed a misstatement in the README.
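
To illustrate item 2, here is a minimal sketch of the idea, not the actual diff. It assumes the builder reads n_frms from the parsed config and passes it to the dataset, which then uses it to choose frame indices. Apart from n_frms itself, every name here is hypothetical: the config layout, `ToyVideoDataset`, `build_dataset`, `sample_frame_indices`, and the default of 8 are stand-ins for illustration.

```python
from typing import List

# Hypothetical stand-in for the parsed dataset section of a .yaml config.
config = {"vis_processor": {"train": {"name": "video_train", "n_frms": 8}}}


def sample_frame_indices(total_frames: int, n_frms: int) -> List[int]:
    """Pick up to n_frms indices spread evenly across a clip."""
    if total_frames <= 0 or n_frms <= 0:
        raise ValueError("total_frames and n_frms must be positive")
    n = min(n_frms, total_frames)  # short clips cannot yield more frames than they have
    # Take the centre of each of n equal-length segments.
    return [int((i + 0.5) * total_frames / n) for i in range(n)]


class ToyVideoDataset:
    """Hypothetical dataset that stores only clip lengths (in frames)."""

    def __init__(self, clip_lengths: List[int], n_frms: int = 8):
        self.clip_lengths = clip_lengths
        self.n_frms = n_frms  # supplied by the builder instead of being hard-coded

    def __len__(self) -> int:
        return len(self.clip_lengths)

    def __getitem__(self, idx: int) -> List[int]:
        return sample_frame_indices(self.clip_lengths[idx], self.n_frms)


def build_dataset(cfg: dict, clip_lengths: List[int]) -> ToyVideoDataset:
    """Hypothetical builder: forwards n_frms from the config to the dataset."""
    n_frms = cfg["vis_processor"]["train"].get("n_frms", 8)  # assumed default
    return ToyVideoDataset(clip_lengths, n_frms=n_frms)


dataset = build_dataset(config, clip_lengths=[100, 3])
print(dataset[0])  # eight indices for the 100-frame clip
print(dataset[1])  # only three indices, since the clip has three frames
```

With this shape, changing n_frms in the .yaml is enough to change how many frames each sample uses, with no edits to the dataset code.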