DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
752 stars 50 forks source link

Maybe a bug on data preprocess #70

Closed Weili-NLP closed 1 month ago

Weili-NLP commented 1 month ago

The following code in 'preprocess_plain' function always replace the human instruction with '\<image>'

https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/1232384b1d71c8b41517bfa29f75c5f1b2496dc6/videollama2/train.py#L559C17-L559C48

image

lixin4ever commented 1 month ago

It is not a bug. Pretraining stage is all about image/video-text alignment (rather than instruction following) and this is how it works.

Weili-NLP commented 1 month ago

Got it. Thanks for your reply.