axa-group / nlp.js

An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identify, and so more
MIT License
6.28k stars 621 forks

Possible Issue with Dynamically Imported Corpus with NER #1136

Closed SteveRepp closed 2 years ago

SteveRepp commented 2 years ago

Describe the bug This bug appears only when a corpus is imported via addCorpus() with a linked NER JSON file. During walkMember in the evaluator, the following error is thrown:

TypeError: Cannot read properties of undefined (reading 'option')
    at Evaluator.walkMember (C:\Server\node_modules\@nlpjs\evaluator\src\evaluator.js:184:17)

To Reproduce Steps to reproduce the behavior: Add a new corpus file via the manager.addCorpus function as follows:

MyNLPManager.addCorpus('airtravel.json')

Next, define a corpus as follows (airtravel.json):

{
  "name": "Air Travel",
  "locale": "en",
  "contextData": "./city.json",
  "data": [
    {
      "intent": "user.wantstotravel",
      "utterances": [
        "I want to fly to @city",
        "I need to fly out of @city"
      ],
      "answers": [
        "{{ city }} has {{ _data[entities.city.option].airport }}"
      ]
    }
  ],
  "entities": {
    "city": {
      "options": {
        "orlando": ["orlando"],
        "tampa": ["tampa"],
        "atlanta": ["atlanta"]
      }
    }
  }
}

Then add the linked entity file as follows (city.json):

{
  "orlando": { "state": "Florida", "airport": "Orlando International", "tz": "EST" },
  "tampa": { "state": "Florida", "airport": "Tampa International", "tz": "EST" },
  "atlanta": { "state": "Georgia", "airport": "Atlanta International", "tz": "EST" }
}
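
Since the answer template indexes the linked file by the entity option (`_data[entities.city.option]`), every option key in the corpus needs a matching key in city.json. A minimal sketch of a sanity check for that, assuming the two files have already been parsed into plain objects; `missingContextKeys` is a hypothetical helper written for this illustration and is not part of nlp.js:

```javascript
// Inline copies of the relevant parts of airtravel.json and city.json.
const corpus = {
  entities: {
    city: {
      options: { orlando: ['orlando'], tampa: ['tampa'], atlanta: ['atlanta'] },
    },
  },
};
const cityData = {
  orlando: { state: 'Florida', airport: 'Orlando International', tz: 'EST' },
  tampa: { state: 'Florida', airport: 'Tampa International', tz: 'EST' },
  atlanta: { state: 'Georgia', airport: 'Atlanta International', tz: 'EST' },
};

// Hypothetical helper: list option keys that have no entry in the context data.
function missingContextKeys(corpusJson, entityName, contextData) {
  const options = corpusJson.entities?.[entityName]?.options ?? {};
  return Object.keys(options).filter((key) => !(key in contextData));
}

console.log(missingContextKeys(corpus, 'city', cityData)); // []
```

An empty result here only rules out a key mismatch between the two files; it says nothing about whether the entity is actually extracted at process time.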

Finally, attempt to train and process. The error appears even though the entity link is correctly defined. It finds the entity file, but loses something afterwards.

Expected behavior Corpus file ingested along with linked entity file.

Screenshots NA

Additional context After a bit of experimenting in evaluator.js, I see that when the walk reaches a MemberExpression node, node.name is undefined. This is the state of the node variable at line 346. Console output below:

Identifier | city
MemberExpression | undefined
MemberExpression | undefined
Identifier | entities
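
For context, in an ESTree-style AST only Identifier nodes carry a `name`; a MemberExpression holds `object` and `property` children instead, so `undefined` in the second column is expected for those nodes. The sketch below hand-builds the AST for `_data[entities.city.option].airport` and evaluates it with a tiny walker. It is an illustration only, not the @nlpjs/evaluator source, and the helper names (`id`, `member`, `evaluate`) are hypothetical. It shows one way the reported TypeError can arise: the walk itself is fine, but `entities.city` is missing from the context.

```javascript
// Hand-built ESTree-style AST for: _data[entities.city.option].airport
// (hypothetical helpers; this is not the @nlpjs/evaluator implementation).
const id = (name) => ({ type: 'Identifier', name });
const member = (object, property, computed = false) => ({
  type: 'MemberExpression',
  object,
  property,
  computed,
});

const ast = member(
  member(id('_data'), member(member(id('entities'), id('city')), id('option')), true),
  id('airport')
);

// Minimal evaluator: Identifiers are looked up in the context, while
// MemberExpressions are resolved through their object/property children.
function evaluate(node, context) {
  if (node.type === 'Identifier') {
    return context[node.name];
  }
  if (node.type === 'MemberExpression') {
    const object = evaluate(node.object, context);
    const key = node.computed ? evaluate(node.property, context) : node.property.name;
    return object[key]; // throws a TypeError when `object` is undefined
  }
  throw new Error(`Unsupported node type: ${node.type}`);
}

const data = { orlando: { airport: 'Orlando International' } };
const withEntity = { _data: data, entities: { city: { option: 'orlando' } } };
const withoutEntity = { _data: data, entities: {} };

console.log(ast.name); // undefined
console.log(evaluate(ast, withEntity)); // Orlando International

try {
  evaluate(ast, withoutEntity);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```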

jesus-seijas-sp commented 2 years ago

Hello!

The problem here is that the node-nlp library aims for backward compatibility, so entities are written as "%entity%": "I want to go to orlando" gets converted to "I want to go to %city%", not to "I want to go to @city". The utterances in the corpus, however, are trained using the new format with @ in front. So there are two solutions:

  1. Change the corpus to use the old format, %city%.
  2. Set the entityPreffix and entitySuffix settings of the NER to undefined so that they fall back to the defaults (prefix @, no suffix):

const { NlpManager } = require('node-nlp');
const corpus = require('./corpus.json');

(async () => {
  const nlpManager = new NlpManager({ languages: ['en'] });
  // Clear the legacy %entity% markers so the NER uses the default @entity format.
  nlpManager.nlp.ner.settings.entityPreffix = undefined;
  nlpManager.nlp.ner.settings.entitySuffix = undefined;
  // Load the corpus, train, and then process a test utterance.
  nlpManager.addCorpus(corpus);
  await nlpManager.train();
  const result = await nlpManager.process('I want to go to orlando');
  console.log(result);
})();

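
To make the mismatch concrete, here is a minimal sketch in plain JavaScript. `templateUtterance` is a hypothetical helper written for this illustration (it is not an nlp.js function); it replaces the matched entity text with a placeholder built from a prefix, the entity name, and a suffix.

```javascript
// Hypothetical helper: replace the matched entity text with a placeholder
// built from a prefix, the entity name and an optional suffix.
function templateUtterance(utterance, sourceText, entityName, prefix = '@', suffix = '') {
  return utterance.replace(sourceText, `${prefix}${entityName}${suffix}`);
}

const trained = 'I want to fly to @city';
const legacy = templateUtterance('I want to fly to orlando', 'orlando', 'city', '%', '%');
const current = templateUtterance('I want to fly to orlando', 'orlando', 'city');

console.log(legacy); // I want to fly to %city%
console.log(legacy === trained); // false
console.log(current === trained); // true
```

With the legacy markers the templated input no longer lines up with the trained utterance, which is why either changing the corpus or clearing the two settings brings the formats back in sync.
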
SteveRepp commented 2 years ago

Thanks Jesus! Either solution seems to kill the process(utterance) call when it executes after training. I am not finding anything about airport in the model at all, so I am wondering whether the entities are making it in. I see this at the bottom:

"slotManager": {
  "user.wantstotravel": {
    "city": {
      "intent": "user.wantstotravel",
      "entity": "city",
      "mandatory": false,
      "locales": {}
    }
  }
}

I am going to continue to look at this in deeper detail. Thanks!

jesus-seijas-sp commented 2 years ago

Well, that can happen if you're using a huge database of airports. By default nlp.js tolerates typos in user input (for example, "Bracelona" will match "Barcelona"), which means that each word of the utterance is compared with each city using Levenshtein distance.

If this is the case, you're lucky, because the examples include a huge-NER example without Levenshtein that uses airports: https://github.com/axa-group/nlp.js/tree/master/examples/06-huge-ner

Basically, the fix is to set the threshold value to 1 in the NER settings: https://github.com/axa-group/nlp.js/blob/master/examples/06-huge-ner/conf.json

Here is the full conversation of the issue that led to this example: https://github.com/axa-group/nlp.js/issues/778
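
As a rough illustration of why the threshold matters, the sketch below implements plain Levenshtein distance and two matching strategies. The `similarity` normalisation and the function names are assumptions made for this example; nlp.js may score matches differently internally.

```javascript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
function levenshtein(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i += 1) {
    const current = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost);
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical similarity score in [0, 1]; nlp.js may normalise differently.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Fuzzy matching: every word is compared against every option.
function fuzzyMatch(words, options, threshold) {
  const matches = [];
  for (const word of words) {
    for (const option of options) {
      if (similarity(word, option) >= threshold) matches.push({ word, option });
    }
  }
  return matches;
}

// With a threshold of 1 only exact matches survive, so a Set lookup suffices.
function exactMatch(words, options) {
  const known = new Set(options);
  return words.filter((word) => known.has(word)).map((word) => ({ word, option: word }));
}

console.log(levenshtein('bracelona', 'barcelona')); // 2
console.log(fuzzyMatch(['bracelona'], ['barcelona', 'tampa'], 0.75).length); // 1
console.log(fuzzyMatch(['bracelona'], ['barcelona', 'tampa'], 1).length); // 0
```

The fuzzy path does work proportional to words × options × string length, which adds up quickly with thousands of airports, whereas the exact path is a single hash lookup per word.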

SteveRepp commented 2 years ago

Thanks for the assistance. Closing issue.