Open claytongulick opened 4 years ago
Hello! In version 4, everything is split into plugins and pipelines. The entity extraction pipeline is defined here, and it can be modified through a configuration file: https://github.com/axa-group/nlp.js/blob/master/packages/ner/src/ner.js#L54
[
'.decideRules',
'extract-enum',
'extract-regex',
'extract-trim',
'extract-builtin',
],
As you can see, it executes those plugins in this order: extract-enum, extract-regex, extract-trim, and extract-builtin. For extract-builtin, you can currently choose to register either the plugin for Microsoft Recognizers or the one for Duckling, which are located in these packages:
https://github.com/axa-group/nlp.js/tree/master/packages/builtin-duckling https://github.com/axa-group/nlp.js/tree/master/packages/builtin-microsoft
What's more, you can decide per language which plugins or which pipelines to use. For a clean example of how to build an extractor plugin, take a look at the regex extractor: https://github.com/axa-group/nlp.js/blob/master/packages/ner/src/extractor-regex.js
That means that if you register a plugin with the same name, it will replace the existing one, so you can completely replace how the NER, regex, trim, and builtin steps work. It also means that you can modify the pipelines: remove the steps that you don't need and add new ones (put the name of the plugin to execute in the pipeline, and register the plugin).
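The idea of named plugins and pipelines can be sketched without the library. The following is a minimal illustration only, not the nlp.js container API: `register`, `runPipeline`, and the `extract-dosage` step are hypothetical names, and only the step name `extract-regex` is borrowed from the pipeline above.

```javascript
// Hypothetical, library-free sketch: this is NOT the nlp.js container API.
// A registry maps plugin names to functions; a pipeline is a list of names.
const registry = new Map();

// Registering under an existing name overwrites the previous plugin.
function register(name, plugin) {
  registry.set(name, plugin);
}

// Run each named step in order, threading the input object through.
function runPipeline(steps, input) {
  return steps.reduce((acc, name) => {
    const plugin = registry.get(name);
    if (!plugin) throw new Error(`Plugin not registered: ${name}`);
    return plugin(acc);
  }, input);
}

// Build an extractor step that appends every regex match as an entity.
function regexStep(entity, regex) {
  return (input) => ({
    ...input,
    entities: [
      ...input.entities,
      ...(input.text.match(regex) || []).map((m) => ({ entity, sourceText: m })),
    ],
  });
}

register('extract-regex', regexStep('email', /\S+@\S+\.\S+/g));
// A new custom step (hypothetical name) added to the pipeline.
register('extract-dosage', regexStep('dosage', /\d+\s?mg/g));

const pipeline = ['extract-regex', 'extract-dosage'];
const input = { text: 'Send 20 mg info to a@b.com', entities: [] };
const result = runPipeline(pipeline, input);

// Re-registering 'extract-regex' replaces the original step with a no-op.
register('extract-regex', (x) => x);
const replaced = runPipeline(pipeline, input);

console.log(result.entities.map((e) => e.entity));
console.log(replaced.entities.map((e) => e.entity));
```

In this sketch the pipeline itself never changes; only the function behind a name does, which is why registering a plugin under an existing name is enough to swap out a step.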
As for how to do that: because we want to stay backward compatible with version 3.x, which used the Microsoft or Duckling builtin based on a configuration passed to the NlpManager class, you can see how we did it in version 4 here: https://github.com/axa-group/nlp.js/blob/master/packages/node-nlp/src/nlp/nlp-manager.js#L49
Is it possible to train a custom NER? For example, suppose I want this question answered:
"Tell me about %attribute% of Tesla Model S."
%attribute% could be a long list of things such as [color, seats, weight, ...], and the list is not fixed in advance.
How do I create an entity extractor specifically for that and pass it down to the NlpManager?
In fact, you could use a trim rule where you define the words before/after your word, and the text between them is then trimmed out of the string. You can also try to use an enum entity AND a trim rule. The first should give better matching for "known" words, and the other would still allow "unknown" words.
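To make the enum-plus-trim idea concrete, here is a minimal library-free sketch. It does not call nlp.js; `extractEnum`, `extractBetween`, and `extractAttribute` are hypothetical helpers, and the marker words "about" and " of " are assumptions taken from the example sentence above.

```javascript
// Hypothetical helper names; this is not the nlp.js API.
const KNOWN_ATTRIBUTES = ['color', 'seats', 'weight'];

// Enum step: whole-word match against the list of known attributes.
function extractEnum(utterance) {
  const lower = utterance.toLowerCase();
  return KNOWN_ATTRIBUTES
    .filter((word) => new RegExp(`\\b${word}\\b`).test(lower))
    .map((word) => ({ entity: 'attribute', option: word, type: 'enum' }));
}

// Trim step: capture whatever sits between two marker phrases.
function extractBetween(utterance, left, right) {
  const lower = utterance.toLowerCase();
  const start = lower.indexOf(left.toLowerCase());
  if (start === -1) return [];
  const from = start + left.length;
  const end = lower.indexOf(right.toLowerCase(), from);
  if (end === -1) return [];
  const text = utterance.slice(from, end).trim();
  return text ? [{ entity: 'attribute', sourceText: text, type: 'trim' }] : [];
}

// Prefer the enum match for known words; fall back to the trim rule.
function extractAttribute(utterance) {
  const enumHits = extractEnum(utterance);
  if (enumHits.length > 0) return enumHits;
  return extractBetween(utterance, 'about', ' of ');
}

const known = extractAttribute('Tell me about weight of Tesla Model S.');
const unknown = extractAttribute('Tell me about towing capacity of Tesla Model S.');

console.log(known);
console.log(unknown);
```

The fallback order is a design choice in this sketch: a known word yields a normalized option, while anything else is still captured as raw text between the markers.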
I will add tests and check that once my PRs are merged
Is your feature request related to a problem? Please describe. I'm working on identifying certain medical terms and phrases which are very specific to the medical industry. I need to be able to create NER models.
Describe the solution you'd like nlp.js has some great built-in entity recognizers (date, email, etc.), but it's not clear (to me, as a beginner) how to build your own that will work with the framework. I'd like to see some clear examples of how to create these models, how to label documents to train the recognition engine, and how to use/save the trained models.