PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0
3.04k stars 220 forks

When training a new model, I got stuck in the middle of training #107

Open sunwhw opened 9 months ago

sunwhw commented 9 months ago

Hi, have you ever run into training that always gets stuck partway through after modifying the model to support different sizes or different numbers of frames? I checked the logs, and it looks like there was a communication problem in DeepSpeed during the gradient reduce step.
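For a hang like this, more verbose logging from the communication layer can help show which rank stops at which collective. Below is a minimal sketch, not something confirmed in this thread: `NCCL_DEBUG` and `TORCH_DISTRIBUTED_DEBUG` are existing NCCL and PyTorch environment variables, while the helper name `enable_distributed_debug` is hypothetical. It assumes the job uses the NCCL backend, and the variables have to be set before the process group is initialized.

```python
import os


def enable_distributed_debug(env=None):
    """Turn on verbose NCCL / torch.distributed logging (hypothetical helper).

    Call this at the very top of the training script, before
    torch.distributed or DeepSpeed initializes the process group.
    Values that are already set are left untouched.
    """
    env = os.environ if env is None else env
    # NCCL prints initialization and per-communicator information at INFO level.
    env.setdefault("NCCL_DEBUG", "INFO")
    # PyTorch adds extra logging and consistency checks on collectives at DETAIL level.
    env.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")
    return {key: env[key] for key in ("NCCL_DEBUG", "TORCH_DISTRIBUTED_DEBUG")}


if __name__ == "__main__":
    print(enable_distributed_debug())
```

The same variables can also be exported in the launch script. Either way, they only add diagnostics and do not change training behavior.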

LinB203 commented 9 months ago

What dataset are you using? Is it a custom dataset?

sunwhw commented 9 months ago

Data: yes, it is a custom dataset, and the same data trained normally before I modified the model to support different numbers of frames. Model: I also cut the data length to 8, and training ran normally and finished successfully. So I think both the data and the model are fine, which leaves me unsure how to fix this bug.
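One way to bisect on this observation is to force every clip to a fixed number of frames and then raise that number step by step until the hang reappears. The sketch below assumes that "length 8" refers to frames per clip, which the thread does not state explicitly. The helper `fix_frame_count` is hypothetical and not part of Video-LLaVA. It does not explain the hang. It only reproduces the configuration reported to work.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fix_frame_count(frames: Sequence[T], num_frames: int = 8) -> List[T]:
    """Return exactly `num_frames` frames from a clip (hypothetical helper).

    Longer clips are sampled at evenly spaced positions; shorter clips
    are padded by repeating the last frame.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not frames:
        raise ValueError("clip has no frames")
    if len(frames) >= num_frames:
        # Evenly spaced sampling over the whole clip.
        step = len(frames) / num_frames
        return [frames[int(i * step)] for i in range(num_frames)]
    # Pad short clips by repeating the final frame.
    return list(frames) + [frames[-1]] * (num_frames - len(frames))


if __name__ == "__main__":
    print(len(fix_frame_count(list(range(30)))))  # always num_frames items
```

If training completes with a fixed count but stalls once the count varies between samples, that points to the variable-frame code path and not to the data itself.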

ciroimmobile commented 5 months ago

Have you fixed this problem?