Maybe a bug on data preprocess

DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Apache License 2.0

752 stars 50 forks source link

Closed Weili-NLP closed 1 month ago

Weili-NLP commented 1 month ago

The following code in 'preprocess_plain' function always replace the human instruction with '\<image>'

lixin4ever commented 1 month ago

It is not a bug. Pretraining stage is all about image/video-text alignment (rather than instruction following) and this is how it works.

Weili-NLP commented 1 month ago

Got it. Thanks for your reply.