PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0
2.88k stars 207 forks

Is it possible to train with languages other than English, and are the 8 frames sampled uniformly across different video lengths? #177

Open YoungjaeDev opened 2 months ago

YoungjaeDev commented 2 months ago

Hello,

I am interested in training the model on a language other than English (Korean). Is this feasible? According to the paper, the word embedding layer is frozen during training. Does this limit the model's ability to learn a different language?

Additionally, I am curious about the 8-frame sampling process. Are the 8 frames sampled uniformly across the whole video regardless of its length (short, medium, or long)?
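
To make the second question concrete, here is a minimal sketch of what I mean by uniform sampling. It is my own illustration, not code from the Video-LLaVA repository; the function name `uniform_frame_indices` and its parameters are hypothetical, and the actual loader may pick indices differently.

```python
def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Return num_frames indices spread evenly over [0, total_frames - 1].

    Hypothetical helper for illustration only; not the repository's sampler.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        return [0]
    # Spacing between consecutive samples, as a float so that the first
    # index is 0 and the last index is total_frames - 1.
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


# The same number of frames is returned for a short clip and a long video;
# only the gap between sampled frames changes.
short_clip = uniform_frame_indices(240)     # e.g. about 8 s at 30 fps
long_video = uniform_frame_indices(108000)  # e.g. about 1 h at 30 fps
```

If sampling works this way, a one-hour video would have gaps of several minutes between sampled frames, which is why I am asking whether video length is taken into account.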

Thank you for your assistance!