Open zrjohnnyl opened 7 months ago
Hi @zrjohnnyl, the SpanClassifier is there to further classify existing spans. It cannot perform NER by itself; it refines existing named entities into finer-grained classes, e.g. classifying a person further into politician, musician, actor, ...
If you want to train NER models with overlapping entities, you can consider training one NER model per entity type. That way a person can overlap with an organization. However, this won't solve overlaps within the same entity type (a person cannot overlap with another person).
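As a rough illustration of the per-type idea, here is a minimal sketch in plain Python (not flair API). It assumes annotations are available as (start, end, label) tuples with an exclusive end offset; the helper names split_by_entity_type and has_overlap are hypothetical. It splits the spans into one layer per entity type and checks whether each layer is free of overlaps, which is the condition a per-type tagger would need.

```python
from collections import defaultdict


def split_by_entity_type(spans):
    """Group (start, end, label) spans into one list per label."""
    by_type = defaultdict(list)
    for start, end, label in spans:
        by_type[label].append((start, end, label))
    return dict(by_type)


def has_overlap(spans):
    """Return True if any two spans share at least one position.

    Spans use an exclusive end offset. After sorting by start, it is
    enough to compare each span with its direct successor.
    """
    ordered = sorted(spans)
    return any(prev[1] > cur[0] for prev, cur in zip(ordered, ordered[1:]))


# Hypothetical example: a PER span overlapping an ORG span.
spans = [(0, 2, "PER"), (1, 4, "ORG"), (5, 6, "PER")]
layers = split_by_entity_type(spans)

# The combined annotation overlaps, but each per-type layer does not.
combined_overlaps = has_overlap(spans)
per_type_overlaps = {label: has_overlap(layer) for label, layer in layers.items()}
```

If a layer still reports an overlap, that is the same-type case mentioned above, which the per-type approach does not cover.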
I guess that means SpanTagger will never make it to the main repo. I was hoping to avoid the two-model approach because my dataset is quite large and I don't want to duplicate my data to train two models. Is it allowed to pass multitask models into make_multitask_model_and_corpus? I ask because there are other tasks and datasets besides that one.
Can I do something like this?
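from flair.data import Corpus
from flair.models import MultitaskModel
from flair.nn.multitask import make_multitask_model_and_corpus

# Shared corpus for the first two models (parse_annotations is my own helper)
multitask_corpus = Corpus(train=[parse_annotations(annotation) for annotation in train_annotations], dev=..., test=...)
multitask_model = MultitaskModel([model_1, model_2], use_all_tasks=True)
# Nest the combined model next to a third model that has its own corpus
multitask_model, multicorpus = make_multitask_model_and_corpus([
    (multitask_model, multitask_corpus),
    (model_3, corpus_3),
])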
Question
Is the SpanClassifier the correct model for training a Named Entity Recognition (NER) model with overlapping entities? I trained a SpanClassifier using only NER labels, where some labels overlap within a single column. After one training epoch, the model achieved a micro average F1 score of over 95%. However, when attempting predictions, the model returns null.
I know the SpanClassifier is the replacement for the EntityLinkingModel, but I saw in another thread that you need to use a SpanTagger model, which inherits from EntityLinkingModel, for predicting overlapping entities. Below is the code I used for the SpanClassifier. Previously I was using SequenceTagger, but I had to filter out overlapping entities.
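For context, the overlap filtering needed for SequenceTagger can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from my project: it assumes spans are (start, end, label) tuples with an exclusive end offset, the name drop_overlapping is hypothetical, and preferring the longest span is just one possible policy.

```python
def drop_overlapping(spans):
    """Greedily keep the longest spans and drop any span overlapping a kept one.

    Spans are (start, end, label) tuples with an exclusive end offset.
    Ties in length are broken by the earlier start position.
    """
    kept = []
    by_length = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    for start, end, label in by_length:
        # A span is compatible if it lies entirely before or after every kept span.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


# Hypothetical example: two nested or overlapping pairs.
spans = [(0, 3, "ORG"), (1, 2, "PER"), (4, 6, "PER"), (5, 6, "LOC")]
flat_spans = drop_overlapping(spans)
```

The dropped spans are exactly the information a flat tagger cannot represent, which is why I am looking for a model that supports overlaps directly.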