boheumd / MA-LMM

(CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
https://boheumd.github.io/MA-LMM/
MIT License

Run long video rightly #8

Status: Closed (shuyansy closed 3 months ago)

shuyansy commented 5 months ago

Hi, thanks for your great work! I am running the demo and want to increase the number of frames (to more than 20). Which parameter should I adjust: the one in the load_video function or the one in the yaml? Thanks for your time!

shuyansy commented 5 months ago

Also, I would like to know how to adjust "memory_bank_length: 10" and "num_frames: 20". When I increase them, errors occur. [Screenshot 2024-04-30 5:27:40 PM]

boheumd commented 5 months ago

Hello, I have updated demo.ipynb in the latest version. You can now specify memory_bank_length and num_frames when loading the model. Please note that every time you change memory_bank_length or num_frames, you need to reload the model.
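To make the reload rule harder to forget, one option is a small wrapper that rebuilds the model only when either setting changes. This is a minimal sketch, not the code in demo.ipynb: `ReloadingModelHolder` and its `loader` argument are hypothetical names, and `loader` stands in for whatever model-loading call the notebook actually uses.

```python
from typing import Any, Callable, Optional, Tuple


class ReloadingModelHolder:
    """Rebuild the model whenever memory_bank_length or num_frames changes."""

    def __init__(self, loader: Callable[[int, int], Any]) -> None:
        # `loader` is a hypothetical stand-in for the model-loading call
        # in demo.ipynb; it receives (memory_bank_length, num_frames).
        self._loader = loader
        self._settings: Optional[Tuple[int, int]] = None
        self._model: Any = None

    def get(self, memory_bank_length: int, num_frames: int) -> Any:
        settings = (memory_bank_length, num_frames)
        if settings != self._settings:
            # First call, or a setting changed: load a fresh model.
            self._model = self._loader(memory_bank_length, num_frames)
            self._settings = settings
        # Unchanged settings reuse the already loaded model.
        return self._model
```

Calling `holder.get(10, 20)` twice would load once, while a later `holder.get(20, 40)` would trigger a reload.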

shuyansy commented 5 months ago

Hi, thanks for the update. However, when I try to load more frames into the model, this error occurs. Do you have any ideas?

[Screenshot 2024-05-01 12:38:27 PM] [Screenshot 2024-05-01 12:38:35 PM]

boheumd commented 5 months ago

Hello, please check the latest code update. Currently, max_num_frames is set to 120 by default. To test the model on long videos, set max_num_frames to a larger value in lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml or lavis/configs/models/blip2/blip2_instruct_vicuna13b.yaml.
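A quick pre-flight check can catch a mismatch before the model is loaded. The sketch below is illustrative only: `check_frame_budget` is a hypothetical helper, not part of the repository, and it assumes that num_frames may equal max_num_frames but must not exceed it. The default of 120 is the value stated above.

```python
DEFAULT_MAX_NUM_FRAMES = 120  # default value mentioned in this thread


def check_frame_budget(num_frames: int,
                       max_num_frames: int = DEFAULT_MAX_NUM_FRAMES) -> None:
    """Raise ValueError if num_frames does not fit within max_num_frames.

    Assumption: the limit is inclusive (num_frames == max_num_frames is allowed).
    """
    if num_frames <= 0:
        raise ValueError(f"num_frames must be positive, got {num_frames}")
    if num_frames > max_num_frames:
        raise ValueError(
            f"num_frames={num_frames} exceeds max_num_frames={max_num_frames}; "
            "raise max_num_frames in the model yaml and reload the model"
        )
```

For example, `check_frame_budget(200)` would raise with the default limit, while `check_frame_budget(200, max_num_frames=256)` would pass once the yaml value has been raised accordingly.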

shuyansy commented 5 months ago

It works well now for me. Thanks again for the great work and your effort!