Official source for Spanish pretrained biomedical and clinical language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
Apache License 2.0
Fine-tune it for multiclass or multilabel text classification #5
I have medical reports, and I am trying to predict the disease associated with each report:
1) Both the medical reports and the disease labels to predict are written by humans, so they contain mistakes and inconsistent label names (the same disease written in different ways, and vice versa).
2) Should I use RobertaForSequenceClassification or AutoModelForSequenceClassification?
3) What is the best way to handle imperfect labels? Embed them in the same RoBERTa token space and predict the mean of the vectors, or predict the whole vector (which then becomes a multilabel task)?
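To make the label-inconsistency problem in point 1 concrete, here is a minimal sketch of one possible preprocessing step: collapsing surface variants of a label (case, accents, punctuation, extra spaces) into one canonical name, then building multi-hot targets for the multilabel case. The function names and the example labels are hypothetical, and this only catches spelling-level variants, not true synonyms or mislabeled reports.

```python
import re
import unicodedata


def normalize_label(raw: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def build_label_index(raw_labels):
    """Map each distinct canonical label to a stable integer id."""
    canonical = sorted({normalize_label(label) for label in raw_labels} - {""})
    return {name: idx for idx, name in enumerate(canonical)}


def multi_hot(report_labels, label_index):
    """Encode the labels of one report as a float multi-hot vector."""
    vector = [0.0] * len(label_index)
    for raw in report_labels:
        key = normalize_label(raw)
        if key in label_index:  # silently skip labels never seen before
            vector[label_index[key]] = 1.0
    return vector


# Hypothetical hand-written labels, for illustration only.
raw_labels = ["Diabetes mellitus tipo 2", "diabetes  mellitus, tipo 2", "Neumonía", "neumonia"]
label_index = build_label_index(raw_labels)
example_target = multi_hot(["Neumonía"], label_index)
```

With a single disease per report, the integer id from `label_index` could serve directly as a multiclass target; the float multi-hot vector is meant for the multilabel variant, on the assumption that the loss is a per-label binary cross-entropy.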
best