⭐ [Feat] Support finetuning from previously saved checkpoint

Dear authors:

Thank you for sharing this amazing work! I believe it will be an important base model for a lot of future work involving Video Language Models.

I'm currently trying to finetune on my own, very small dataset, so I would like to continue finetuning from the chat model you provided. As far as I can tell, however, the current code only supports finetuning from the base model.

I made a small adjustment to the code to support this case. If you don't think this is good practice, or if you have a better way to achieve it, please treat this pull request as a small issue instead.

DAMO-NLP-SG / VideoLLaMA2