allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

One hot encoding classes #241

Open PersianSpock opened 2 years ago


I am using Longformer for text classification on a dataset with 46 classes. The 46th class is so small that, when I split the data, all of its examples end up in the test set and none in the training set. As a result, my OneHotEncoder only learns 45 classes, but I need to train the model with 46 classes. What can be done?
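A first step is to build the label vocabulary from the full dataset before splitting, so the class count (46 here) does not depend on which rows land in the training split. The sketch below is a minimal, dependency-free illustration; the helper names `build_label_vocab` and `missing_from_train` and the toy labels are hypothetical, not part of the original code.

```python
# Hypothetical helpers: build a fixed label vocabulary from the FULL dataset
# (before any train/test split) and report which labels the training split lacks.
def build_label_vocab(all_labels):
    """Map every distinct label to a stable integer id (sorted order)."""
    return {label: idx for idx, label in enumerate(sorted(set(all_labels)))}


def missing_from_train(vocab, train_labels):
    """Labels present in the vocabulary that never appear in the training split."""
    seen = set(train_labels)
    return [label for label in vocab if label not in seen]


all_labels = ["A", "B", "C", "A", "B"]  # toy stand-in for the full 'Label' column
train_labels = ["A", "B", "A"]          # toy training split; "C" ended up only in test

vocab = build_label_vocab(all_labels)
print(len(vocab))                                # 3, i.e. the num_labels to use
print(missing_from_train(vocab, train_labels))   # ['C']
```

On the real data, `len(vocab)` would be 46 and the missing list would name the class that was pushed entirely into the test split.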

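from sklearn.preprocessing import OneHotEncoder
num_labels = 46

# create an instance of the one-hot encoder
encoder = OneHotEncoder(handle_unknown='ignore')

# one-hot encode the 'Label' column of the training DataFrame e_sh (defined elsewhere);
# fitting on e_sh only learns the classes present in it, hence 45 columns instead of 46
e_sh['one_hot_labels'] = list(encoder.fit_transform(e_sh[['Label']]).toarray())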
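One possible fix, sketched under assumptions: scikit-learn's `OneHotEncoder` accepts a `categories` argument, so the full class list can be passed explicitly instead of being inferred from the training rows. Here `df_all` is a hypothetical name for the complete dataset before splitting, and the toy labels stand in for the real 46 classes.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-ins (assumption): df_all is the full dataset, e_sh the training split.
df_all = pd.DataFrame({"Label": ["A", "B", "C", "A", "B"]})
e_sh = pd.DataFrame({"Label": ["A", "B", "A"]})  # class "C" is absent from training

# Take the category list from the FULL dataset so the encoder always emits one
# column per class, even for classes that never occur in the training split.
# sorted() also satisfies scikit-learn's requirement that numeric categories be sorted.
all_classes = sorted(df_all["Label"].unique())
encoder = OneHotEncoder(categories=[all_classes], handle_unknown="ignore")

e_sh["one_hot_labels"] = list(encoder.fit_transform(e_sh[["Label"]]).toarray())

print(len(all_classes))               # would be 46 on the real data
print(e_sh["one_hot_labels"].iloc[0])
```

With this change the label vectors have one column per class, matching `num_labels = 46`. Note that the model still never sees a training example of the missing class, so it cannot learn to predict it; moving at least one of its examples into the training split is the only way to give it any signal.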