Open ec-m opened 5 years ago
Thanks for writing this issue - Named Entity Recognition is definitely a big construction zone. It also fails mostly for `NAME`/`LOCATION`/`ORGANIZATION` if the input is not cased correctly. IMO this is also a big blocker for #96. So we should really fix this ASAP!
Fortunately, spaCy provides easy extension mechanisms, especially for named entity recognition. If we use the en_medium NLP model, spaCy provides word vectors, which we can match (with some tolerance) to named entities. For cardinals, we can just detect cardinal words - that one should be easy to implement!
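
To make the cardinal idea concrete, here is a rough sketch in plain Python, not this project's actual code and not spaCy's API. The names `CARDINAL_WORDS`, `TOKEN_RE` and `find_cardinals` are made up for illustration, and the word list is deliberately incomplete. It tokenizes with a regex so that punctuation and casing do not affect the match:

```python
import re

# Hypothetical, incomplete list of cardinal words (an assumption for this sketch).
CARDINAL_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "twenty", "thirty",
    "forty", "fifty", "hundred", "thousand", "million",
}

# A token is either a run of ASCII letters or a run of digits,
# so commas and other punctuation never end up inside a token.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+")


def find_cardinals(text):
    """Return a tuple of (token, 'CARDINAL') pairs found in text."""
    hits = []
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        # Compare case-insensitively so 'One' and 'ONE' are detected too.
        if token.isdigit() or token.lower() in CARDINAL_WORDS:
            hits.append((token, "CARDINAL"))
    return tuple(hits)


if __name__ == "__main__":
    print(find_cardinals("vanilla and chocolate one each"))
    print(find_cardinals("vanilla and chocolate, one each"))
```

With this kind of tokenization, the inputs with and without the comma produce the same result. Compound numbers such as "twenty-one" would come out as two separate hits, so a real implementation would need to merge adjacent matches.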
If I insert `vanilla and chocolate one each`, then `nlp.prop_ner` is filled correctly with `(('one', 'CARDINAL'),)`. However, if I instead write `vanilla and chocolate, one each` (i.e., simply adding punctuation to the sentence), `nlp.prop_ner` stays empty.