boheumd / MA-LMM

(CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
https://boheumd.github.io/MA-LMM/
MIT License

Finetuning & zero3 supported? #1

YasmineXXX closed this issue 5 months ago

YasmineXXX commented 5 months ago

Thanks for sharing your great work!

I would like to know whether the model supports finetuning from your provided checkpoint for downstream tasks. Also, does it support DeepSpeed ZeRO-3?

boheumd commented 5 months ago

Hello. To finetune the model on downstream tasks, I recommend starting directly from the pre-trained InstructBLIP weights. The provided checkpoints are weights finetuned for each individual dataset, so they may not generalize well to other tasks. Please check the latest README for more instructions.

Our code is based on LAVIS, which does not support DeepSpeed ZeRO-3.
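
For readers following the finetuning advice above, here is a rough sketch of the general pattern of starting a downstream model from pre-trained weights and leaving any new layers freshly initialized. It is generic PyTorch and is not taken from MA-LMM or LAVIS. The helper `load_matching_weights` and the toy `pretrained` / `downstream` models are hypothetical stand-ins. The README remains the reference for the actual procedure.

```python
import torch
from torch import nn


def load_matching_weights(model: nn.Module, pretrained_state: dict) -> tuple[list[str], list[str]]:
    """Copy only the pretrained tensors whose name and shape match the model.

    Returns (missing, skipped): parameters left at their fresh initialization,
    and pretrained entries that were ignored.
    """
    own_state = model.state_dict()
    compatible = {
        name: tensor
        for name, tensor in pretrained_state.items()
        if name in own_state and own_state[name].shape == tensor.shape
    }
    skipped = sorted(set(pretrained_state) - set(compatible))
    # strict=False tolerates parameters that the filtered checkpoint does not provide.
    result = model.load_state_dict(compatible, strict=False)
    return list(result.missing_keys), skipped


# Toy stand-ins (hypothetical): a "pretrained" backbone with a 10-way head,
# and a downstream model that shares the backbone but has a new 3-way head.
pretrained = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 10))
downstream = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))

missing, skipped = load_matching_weights(downstream, pretrained.state_dict())
print("left at fresh init:", missing)
print("ignored from checkpoint:", skipped)
```

In this toy case the shared first layer is copied over, while the differently sized head is reported both as missing from the checkpoint and as skipped, so it is trained from scratch during finetuning.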