Description of Problem:
Right now, custom entity extraction can only use pos features from spacy plus a handful of simple hard-coded features. This contrasts with the flexibility of the other pipeline components, which can take advantage of any combination of built-in and custom featurizers. Ideally, there would be a way to pass ner_features to the CRFEntityExtractor. In particular, this would let you train an NER model that uses word/token vectors straight from spacy (or other pretrained models).
Overview of the Solution:
CRFEntityExtractor needs to additionally check for ner_features on the message and, when present, add them to the feature dict it passes to sklearn_crfsuite.
NER featurizer classes need to be added to populate ner_features.
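To make the first point concrete, here is a minimal sketch of the merging step. It assumes ner_features holds one dense vector per token and that numeric values are acceptable in the per-token feature dicts handed to sklearn_crfsuite. The helper name merge_ner_features and the "ner:<index>" key format are hypothetical and are not taken from the PR.

```python
from typing import Dict, List, Optional, Sequence, Union

FeatureValue = Union[str, bool, float]
FeatureDict = Dict[str, FeatureValue]


def merge_ner_features(
    token_features: List[FeatureDict],
    ner_features: Optional[Sequence[Sequence[float]]],
    prefix: str = "ner",
) -> List[FeatureDict]:
    """Return copies of the per-token feature dicts with dense values added.

    Hypothetical helper: `ner_features` is assumed to hold one vector per token.
    """
    if ner_features is None:
        # No featurizer set ner_features, so keep the existing features as they are.
        return [dict(features) for features in token_features]
    if len(ner_features) != len(token_features):
        raise ValueError("ner_features must contain one vector per token")

    merged = []
    for features, vector in zip(token_features, ner_features):
        combined = dict(features)  # copy so the input dicts are not mutated
        for index, value in enumerate(vector):
            # One numeric feature per vector dimension, e.g. "ner:0", "ner:1".
            combined[f"{prefix}:{index}"] = float(value)
        merged.append(combined)
    return merged


# Two tokens with the existing hand-written features and made-up 2-d vectors.
sentence = [
    {"0:low": "berlin", "0:title": True},
    {"0:low": "today", "0:title": False},
]
vectors = [[0.1, -0.3], [0.7, 0.2]]
merged = merge_ner_features(sentence, vectors)
```

Copying each dict keeps the existing features intact, so a pipeline without any NER featurizer behaves as before.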
Examples (if relevant):
A skeleton of this (both adding a spacy-based featurizer and making CRFEntityExtractor use ner_features) is implemented in this PR:
https://github.com/RasaHQ/rasa/pull/4187
Please let me know if this looks like a useful feature and if this PR is heading in the right direction.
Still necessary:
Add tests
Extend Featurizer to also have _combine_with_existing_ner_features
Validate that having default spacy tokens noticeably improves NER for a sample task
Make it optional for spacy to add to ner_features
Replace the hard-coded lambda functions in CRFEntityExtractor with a simple Featurizer
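As a rough illustration of the last item, the sketch below shows one way a simple featurizer could replace hard-coded per-token functions with a configurable mapping. The class name SimpleTokenFeaturizer, its featurize method, and the feature names are all hypothetical. They are not the existing Featurizer interface or the functions currently in CRFEntityExtractor.

```python
from typing import Callable, Dict, List, Union

FeatureValue = Union[str, bool]


def suffix3(token: str) -> str:
    """Last three characters of the token."""
    return token[-3:]


class SimpleTokenFeaturizer:
    """Hypothetical featurizer that computes named features for each token."""

    def __init__(
        self, feature_functions: Dict[str, Callable[[str], FeatureValue]]
    ) -> None:
        # Maps a feature name to the function that computes it from token text.
        self.feature_functions = feature_functions

    def featurize(self, tokens: List[str]) -> List[Dict[str, FeatureValue]]:
        # One feature dict per token, with one entry per configured function.
        return [
            {name: function(token) for name, function in self.feature_functions.items()}
            for token in tokens
        ]


featurizer = SimpleTokenFeaturizer(
    {
        "low": str.lower,
        "title": str.istitle,
        "digit": str.isdigit,
        "suffix3": suffix3,
    }
)
features = featurizer.featurize(["Berlin", "2019"])
```

Keeping the functions in a mapping passed to the constructor means the feature set can be changed through configuration rather than by editing the extractor.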
Definition of Done: