Closed Weili-NLP closed 1 month ago
The following code in 'preprocess_plain' function always replace the human instruction with '\<image>'
https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/1232384b1d71c8b41517bfa29f75c5f1b2496dc6/videollama2/train.py#L559C17-L559C48
It is not a bug. Pretraining stage is all about image/video-text alignment (rather than instruction following) and this is how it works.
Got it. Thanks for your reply.
The following code in 'preprocess_plain' function always replace the human instruction with '\<image>'
https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/1232384b1d71c8b41517bfa29f75c5f1b2496dc6/videollama2/train.py#L559C17-L559C48