Roboy / ravestate

✨ Ravestate is Roboy's reactive dialogue state library.
http://roboy.github.io/ravestate
BSD 3-Clause "New" or "Revised" License

Correctly fill nlp.prop_ner even with punctuation #112

Open ec-m opened 5 years ago

ec-m commented 5 years ago

If I insert "vanilla and chocolate one each", then nlp.prop_ner is filled correctly with (('one', 'CARDINAL'),). However, if I instead write "vanilla and chocolate, one each" (i.e., simply adding punctuation to the sentence), nlp.prop_ner stays empty.

josephbirkner commented 5 years ago

Thanks for writing this issue - Named Entity Recognition is definitely a big construction zone. It also mostly fails for NAME/LOCATION/ORGANIZATION if the input is not cased correctly. IMO this is also a big blocker for #96, so we should really fix this ASAP!

josephbirkner commented 5 years ago

Fortunately, spaCy provides easy extension mechanisms, especially for named entity recognition. If we use the en_medium NLP model, spaCy provides word vectors, which we can match (with some tolerance) to named entities. For cardinals, we can just detect cardinal words, which should be easy to implement!
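
A minimal sketch of the cardinal-word idea, in plain Python and independent of spaCy: it tokenizes on letters and digits so that punctuation cannot affect the result, then tags any token found in a small word list. The names `CARDINAL_WORDS` and `find_cardinals` are hypothetical and not part of ravestate or spaCy, the word list is deliberately incomplete, and the output shape only mimics the tuple shown in the issue description.

```python
import re

# Hypothetical, intentionally small word list; a real one would be longer.
CARDINAL_WORDS = frozenset({
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "twenty", "hundred",
})


def find_cardinals(text):
    """Return (token, 'CARDINAL') pairs for cardinal words or digit runs.

    Tokens are runs of letters or runs of digits, so commas and other
    punctuation are dropped before matching.
    """
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    return tuple(
        (tok, "CARDINAL")
        for tok in tokens
        if tok.isdigit() or tok.lower() in CARDINAL_WORDS
    )


print(find_cardinals("vanilla and chocolate one each"))
print(find_cardinals("vanilla and chocolate, one each"))
```

Both calls yield the same result regardless of the comma. Note that a plain word list cannot tell the number "one" from the pronoun "one", so this would need part-of-speech context before it could replace the model's own prediction.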