NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Class imbalance for LiLT #314

Open Altimis opened 1 year ago

Altimis commented 1 year ago

Hi @NielsRogge, thanks again for your work.

I'm facing class imbalance while training the LiLT model. The imbalance is at the level of the sub-classes B-, I-, E-, S-. I have around 9 classes, and each class has 4 sub-classes, which makes 36 classes in total. Some of these are highly represented in the data (for example, S-CLASS_A), while others are rare (for example, I-CLASS_B). How should I handle this issue? Thanks in advance.

NielsRogge commented 1 year ago

I'd recommend using the `weight` argument of the cross-entropy loss: https://naadispeaks.blog/2021/07/31/handling-imbalanced-classes-with-weighted-loss-in-pytorch/
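For readers following along, here is a minimal sketch of what that could look like for token classification in PyTorch. It is not taken from the linked blog or from the LiLT notebooks: the label count, the weight values, and the random `logits` tensor (standing in for the model's output logits) are all made up for illustration. The idea is to compute the loss yourself from the logits instead of relying on the unweighted loss the model returns.

```python
import torch
import torch.nn as nn

# Hypothetical setup: 4 labels instead of the 36 in the question,
# with made-up weights (larger weight = rarer class).
num_labels = 4
class_weights = torch.tensor([0.2, 1.0, 2.0, 4.0])

# ignore_index=-100 skips padding / special tokens, following the usual
# Hugging Face labeling convention.
loss_fct = nn.CrossEntropyLoss(weight=class_weights, ignore_index=-100)

# Stand-in for the model's logits: (batch_size, seq_len, num_labels).
torch.manual_seed(0)
logits = torch.randn(2, 5, num_labels)
labels = torch.tensor([[0, 0, 1, -100, 3],
                       [2, 0, 0, 0, -100]])

# Flatten so that each token is one classification example.
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))
print(loss.item())
```

With the default `reduction="mean"`, the result is a weighted average: each token's loss is multiplied by the weight of its true class, and the sum is divided by the sum of those weights.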

Altimis commented 1 year ago

Thank you @NielsRogge for your response. Wouldn't the 'other' class cause an issue with this weighting, since it's dominant compared with the other classes?

NielsRogge commented 1 year ago

You need to set the weights according to the frequencies of the classes (rarer classes get larger weights, so the dominant 'other' class is down-weighted), as explained in the blog post above.
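As an illustration of frequency-based weights, the sketch below uses one common heuristic, `total / (num_labels * count)`, which gives the most frequent class the smallest weight. This is an assumption on my part and not necessarily the exact formula from the blog; the helper name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
import torch

def inverse_frequency_weights(labels, num_labels, ignore_index=-100):
    """Return one weight per class, inversely proportional to its token count."""
    flat = labels.reshape(-1)
    flat = flat[flat != ignore_index]          # drop padding / special tokens
    counts = torch.bincount(flat, minlength=num_labels).float()
    total = counts.sum()
    counts = counts.clamp(min=1.0)             # avoid division by zero for unseen classes
    return total / (num_labels * counts)

# Toy labels: class 0 plays the role of the dominant 'other' class.
labels = torch.tensor([[0, 0, 0, 0, 1, -100],
                       [0, 0, 2, 0, 0, -100]])
weights = inverse_frequency_weights(labels, num_labels=3)
print(weights)
```

Here class 0 appears 8 times out of 10 labeled tokens and ends up with the smallest weight, while the two rare classes share the same, larger weight. In practice the counts would come from the whole training set rather than a single batch, and the resulting tensor can be passed as `weight` to `torch.nn.CrossEntropyLoss`.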