Fine-tune it for multiclass or multilabel text classification

I have medical reports, and I am trying to predict the disease associated with each report.

1) Both the medical reports and the disease labels are written by humans, so they contain mistakes and inconsistent label names (the same disease written in different ways, and vice versa).
2) Should I use RobertaForSequenceClassification or AutoModelForSequenceClassification?
3) What's the best way to handle imperfect labels? Should I embed them in the same RoBERTa token space and predict the mean of the vector, or predict the whole vector (which then becomes a multilabel task)?

Best
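To make point 3 concrete, here is a minimal sketch of one possible preprocessing step: collapsing spelling variants of a label into a single class before fine-tuning. It is only an illustration, not a recommended method. The function names (`normalize`, `canonicalize`), the example labels, and the `0.85` similarity cutoff are all assumptions, and fuzzy matching like this can wrongly merge two distinct diseases with similar names, so the resulting mapping would need manual review. It also does not address the reverse problem (one label name used for different diseases).

```python
import difflib
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip accents, and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def canonicalize(labels, cutoff=0.85):
    """Map each raw label to the first-seen spelling that is similar enough.

    `cutoff` is a hypothetical threshold on difflib's similarity ratio;
    it should be tuned (and the output reviewed) on real data.
    """
    canonical = []  # normalized class names, in order of first appearance
    mapping = {}    # raw label -> canonical class name
    for raw in labels:
        norm = normalize(raw)
        match = difflib.get_close_matches(norm, canonical, n=1, cutoff=cutoff)
        if match:
            mapping[raw] = match[0]
        else:
            canonical.append(norm)
            mapping[raw] = norm
    return mapping, canonical


# Made-up labels, for illustration only.
raw_labels = [
    "Diabetes mellitus",
    "Diabetes melitus",    # typo
    "Diabétes mellitus",   # stray accent
    "Neumonía",
    "neumonia",
    "Asma",
]
mapping, classes = canonicalize(raw_labels)
label2id = {name: i for i, name in enumerate(classes)}
```

The resulting `label2id` could then serve as the class index for a single-label (multiclass) setup; whether that is preferable to a multilabel formulation is exactly the open question here.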

PlanTL-GOB-ES / lm-biomedical-clinical-es
