Outputs are driven to zero when there's a strong imbalance

Hi,

I recently upgraded to PyTorch 2.x and am using the latest code from the repository. While training a Named Entity Recognition (NER) token-classification model, I've noticed that when most tokens belong to a single class (e.g., class 0), the model converges to predicting only that majority class. This happens regardless of the batch size or learning rate I choose.
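To put numbers on the imbalance and the collapse, a quick check like the following may help. It is a minimal sketch that assumes gold labels and predictions are flat lists of integer class IDs. It also assumes that `-100` marks positions excluded from the loss (padding or non-first subwords), which is a common convention but may not match this repository's setup. The helper names are hypothetical.

```python
from collections import Counter

# Assumption: -100 marks ignored positions (padding / non-first subwords).
IGNORE_INDEX = -100


def label_distribution(labels):
    """Fraction of scored tokens per class, skipping ignored positions."""
    counts = Counter(l for l in labels if l != IGNORE_INDEX)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


def majority_collapse_rate(labels, preds):
    """Fraction of scored tokens whose prediction equals the majority gold class.

    A value near 1.0 means the model is predicting almost nothing but the
    majority class.
    """
    scored = [(l, p) for l, p in zip(labels, preds) if l != IGNORE_INDEX]
    majority = Counter(l for l, _ in scored).most_common(1)[0][0]
    return sum(p == majority for _, p in scored) / len(scored)


# Toy example: class 0 dominates and the model predicts 0 everywhere.
labels = [0, 0, 0, 0, 1, 0, 2, 0, -100, 0]
preds = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(label_distribution(labels))
print(majority_collapse_rate(labels, preds))
```

Running this on the same evaluation set under both PyTorch versions would show how large the difference in majority-class predictions actually is.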

Interestingly, this issue did not occur when using PyTorch 1.8. Has anyone else encountered this problem? Any insights or solutions would be greatly appreciated!

Thanks in advance for your help!

NormXU / ERNIE-Layout-Pytorch

Outputs are driven to zero when there's a strong imbalance #24