Open sunwhw opened 9 months ago
What dataset are you using? Is it a custom dataset?
Data: yes, it is a custom dataset, and it trained normally before I modified the model to support different frame counts. Model: I also cut the length of the data to 8, and training ran normally and completed successfully. So I think both the data and the model are fine, which leaves me unsure how to track down the bug.
Have you managed to fix this problem?
Hi, have you ever encountered a problem where training a model to support different sizes or different frame counts always gets stuck in the middle of training? I checked the logs, and it looks like there was a communication problem with DeepSpeed during gradient reduce.
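The thread does not establish a cause, so the following is only one hypothesis that would fit the symptoms (fixed-length data trains fine, variable frame counts hang during gradient reduce): if samples are grouped into batches by frame count, different ranks can end up with a different number of batches per epoch, and a rank that runs an extra step would wait in a collective the other ranks never enter. Below is a minimal pure-Python sketch for checking that offline. The round-robin sharding, the bucketing by frame count, and the function names `batches_per_rank` and `ranks_in_sync` are all assumptions for illustration, not DeepSpeed APIs or the poster's actual data pipeline.

```python
from collections import defaultdict


def batches_per_rank(frame_counts, world_size, batch_size):
    """Count batches each rank would produce under an ASSUMED sampler:
    samples are sharded round-robin across ranks, then each rank groups
    its samples by frame count so every batch has a single shape."""
    per_rank = []
    for rank in range(world_size):
        shard = frame_counts[rank::world_size]  # round-robin shard (assumption)
        buckets = defaultdict(int)
        for n_frames in shard:
            buckets[n_frames] += 1
        # Ceil division per bucket, i.e. the last partial batch is kept.
        per_rank.append(sum(-(-count // batch_size) for count in buckets.values()))
    return per_rank


def ranks_in_sync(frame_counts, world_size, batch_size):
    """True if every rank would run the same number of steps per epoch."""
    return len(set(batches_per_rank(frame_counts, world_size, batch_size))) == 1


if __name__ == "__main__":
    fixed = [8] * 16                       # every sample cut to 8 frames
    mixed = [8, 16, 8, 8, 8, 16, 8, 8]     # hypothetical mixed frame counts
    print(batches_per_rank(fixed, world_size=2, batch_size=4))
    print(batches_per_rank(mixed, world_size=2, batch_size=4))
```

If the counts differ between ranks for your real frame-count list, padding or dropping batches so that every rank runs the same number of steps would be worth trying before looking deeper into the communication layer.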