This pull request integrates token classification models for the Turna encoder and BERTurk.
Changes:
The NERDataset classes have been updated to handle label outputs as well as text outputs.
T5ForSequenceClassification has been renamed to T5ForClassification so that it also covers token classification, and its forward function has been updated accordingly.
TrainerForClassification has been updated to reflect changes in T5ForClassification and AutoModelForTokenClassification.
DataCollatorForTokenClassification has been added to the trainer for token classification.
SeqEval has been updated to handle token classification outputs whose labels contain -100, the value that marks positions to be ignored.
Added ner_classification_base.py, an implementation of fine-tuning BERTurk on token classification using only the transformers library.
Added ner_classification.py, an implementation of fine-tuning BERTurk on token classification with the transformers trainer, our dataset, and evaluators.
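As a rough illustration of the -100 handling mentioned for SeqEval above, the sketch below drops ignored positions and maps label ids to tag strings before scoring. It is not the code from this pull request: the function name `strip_ignored`, the label map, and the sample batch are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Label value marking positions to skip (e.g. padding or special tokens).
IGNORE_INDEX = -100


def strip_ignored(
    predictions: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
    id2label: Dict[int, str],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Drop positions labelled IGNORE_INDEX and map ids to tag strings."""
    true_tags: List[List[str]] = []
    pred_tags: List[List[str]] = []
    for pred_row, label_row in zip(predictions, labels):
        # Keep only the (prediction, label) pairs whose label is a real tag.
        kept = [(p, l) for p, l in zip(pred_row, label_row) if l != IGNORE_INDEX]
        pred_tags.append([id2label[p] for p, _ in kept])
        true_tags.append([id2label[l] for _, l in kept])
    return true_tags, pred_tags


# Hypothetical label map and batch, for illustration only.
id2label = {0: "O", 1: "B-PER", 2: "I-PER"}
labels = [[-100, 1, 2, 0, -100], [-100, 0, 1, -100, -100]]
predictions = [[0, 1, 2, 0, 0], [0, 0, 0, 2, 1]]

true_tags, pred_tags = strip_ignored(predictions, labels, id2label)
```

The two returned values are lists of per-sentence tag lists, which is the input shape that seqeval metrics such as `f1_score(y_true, y_pred)` expect.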
The implementation was tested on BERTurk and mT5-small with the following commands: