boheumd / MA-LMM

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
https://boheumd.github.io/MA-LMM/
MIT License

Use demo with finetuned checkpoint #10

Closed mvsoom closed 3 months ago

mvsoom commented 5 months ago

Is it possible to use a finetuned checkpoint in the demo, specifically for ActivityNet-QA?

Right now, loading finetuned models is only wired up in the train and eval scripts, which makes it hard to experiment with. I'm impressed by the zero-shot capabilities exhibited in the demo and would like to interact with a finetuned model to see how well it understands the time position embeddings (questions like: what happened in the last N frames or T seconds?).

Can I adapt the Blip2VicunaInstruct_MALMM class to accept one of the finetuned checkpoints in saved_models.tar from the README?

boheumd commented 5 months ago

Hello, I have updated the demo.ipynb. The model loads the default config from lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml. If you want to load a finetuned checkpoint, first set load_finetuned=True and specify the finetuned checkpoint path in that yaml config, then reload the model.
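
For reference, a minimal sketch of what that reload could look like, assuming the stock LAVIS `load_model_and_preprocess` entry point used for BLIP-2 models; the registered model name for the MA-LMM variant and the checkpoint path below are assumptions, not the repo's exact values:

```python
# Minimal sketch, assuming the standard LAVIS loader; the registered model
# name and the checkpoint path are placeholders, not confirmed by the repo.
import torch
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# First edit lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml so the
# model section contains:
#   load_finetuned: True
#   finetuned: "/path/to/saved_models/activitynet_qa.pth"  # hypothetical path
#
# Then reload the model; with load_finetuned set, the finetuned weights are
# loaded on top of (or instead of) the default pretrained ones.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct",  # assumed registry name for the MA-LMM model
    model_type="vicuna7b",
    is_eval=True,
    device=device,
)
```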