flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

How to improve NER #837

Closed DecentMakeover closed 5 years ago

DecentMakeover commented 5 years ago

Hi

Any suggestions on how I could improve NER performance? I am training on custom data. Apart from trying out different embeddings, what else can I try?

Thanks in advance, and any suggestions would be helpful.

Thanks

bheinzerling commented 5 years ago

In order from highest to lowest potential for improvement:

  1. annotate more NER training data, especially if your custom data is only a few hundred or thousand instances
  2. find a larger annotated NER dataset that is similar to your custom data, train an NER model on that, then finetune the model on your custom data
  3. combine different types of embeddings (BERT, ELMo, Flair, BytePairEmbeddings) and input representations (words, subwords, chars, token shapes)
  4. data augmentation: add variations of the instances in your custom data. For example, introduce spelling mistakes, randomly add or remove words, or remove some clauses from sentences.
  5. train embeddings on texts that are similar to your custom data, then train an NER model using those embeddings (maybe combining with more embeddings as in 3.)
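
The data augmentation idea in point 4 could be sketched roughly as follows. This is only a minimal illustration in plain Python, not part of Flair: the function name `augment`, its parameters, and the example sentence are all made up here, and it assumes each sentence is stored as parallel lists of tokens and BIO tags. It only drops tokens tagged `O`, so entity spans stay intact, and it imitates spelling mistakes by swapping two adjacent characters.

```python
import random


def augment(tokens, tags, rng, typo_prob=0.1, drop_prob=0.1):
    """Return a noisy copy of one tagged sentence (hypothetical helper).

    tokens and tags are parallel lists; "O" marks tokens outside any entity.
    """
    new_tokens, new_tags = [], []
    for token, tag in zip(tokens, tags):
        # Only drop tokens outside entities so the entity spans stay intact.
        if tag == "O" and rng.random() < drop_prob:
            continue
        # Imitate a spelling mistake by swapping two adjacent characters.
        if len(token) > 3 and rng.random() < typo_prob:
            i = rng.randrange(len(token) - 1)
            token = token[:i] + token[i + 1] + token[i] + token[i + 2:]
        new_tokens.append(token)
        new_tags.append(tag)
    # Fall back to the original sentence if every token was dropped.
    if not new_tokens:
        return list(tokens), list(tags)
    return new_tokens, new_tags


# Made-up example sentence with BIO tags.
tokens = ["George", "Washington", "went", "to", "Washington", "yesterday"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]

rng = random.Random(0)  # fixed seed so runs are repeatable
variants = [augment(tokens, tags, rng) for _ in range(3)]
```

The resulting variants would be added to the training set alongside the original sentences. How much noise actually helps is something to check on a held-out dev set.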

DecentMakeover commented 5 years ago

Hey, thanks a lot. Let me look into this.

DecentMakeover commented 5 years ago

@bheinzerling just one more question:

Named Entity Recognition | English | Conll-03 | 93.18 (F1) | 92.22 (Peters et al., 2018)

The above is from the Flair accuracy benchmarks. Has the model that produced this result been released?

Thanks