Closed qwertyuu closed 4 years ago
@qwertyuu I ran your case against 4.0.0-rc19. The results are better.
const nlp = require('node-nlp');
const locale = "fr";
const entityText = "des bizarre de problemes";
const entityName = "event";
const nlpManager = new nlp.NlpManager({ languages: [locale] });
nlpManager.addNamedEntityText(
entityName,
"problems",
[locale],
[entityText],
);
nlpManager.addDocument(locale, `J'ai ${entityText}`, "problems");
nlpManager.train().then(() => {
nlpManager.save('reproduce-model.nlp', false);
nlpManager.process(locale, entityName)
.then((result) => console.log(result));
// { intent: 'None', score: 1 }
// Good 👍
nlpManager.process(locale, `J'ai ${entityName}`)
.then((result) => console.log(result));
// [ { intent: 'problems', score: 0.6354737923397036 },{ intent: 'None', score: 0.3645262076602964 } ]
// This looks okay, since the corpus is very small
nlpManager.process(locale, `J'ai ${entityText}`)
.then((result) => console.log(result));
// { intent: 'problems', score: 1 }
// Exactly what is expected. Perfect. 👍
nlpManager.process(locale, entityText)
.then((result) => console.log(result));
// { intent: 'None', score: 1 },
// Did not match the intent at all, but matched the entity correctly. This seems acceptable, although maybe we would want to have the intents containing this entity get a certain weight.
nlpManager.process(locale, "J'ai hello")
.then((result) => console.log(result));
// [ { intent: 'problems', score: 0.6354737923397036 }, { intent: 'None', score: 0.3645262076602964 } ]
// Looks okay
});
Also, the stemDict found in the model is sane:
"stemDict": {
"ai,bizarr,de,j,problem": {
"intent": "problems",
"domain": "default"
}
@jesus-seijas-sp Was there a fix in between 3.10 and 4.0 that explains this?
Hello, is not that between 3.10 and 4.0 is there a fix, is more that 4.0 was totally rebuilt, so many things changes. Right now seems to be stable and released as 4.0.1, right now I'm working on the documentation and examples, you can find here a QuickStart guide of the 4.x: https://github.com/axa-group/nlp.js/blob/master/docs/v4/quickstart.md The QuickStart is more a "tutorial" of the different features, and includes links to the code of every example. Tomorrow I will write a guide but about how to compile it for web and react native (the bundle size was reduced from 3MB at version 3.x down to 125KB at version 4.x). After this guide I will revise the issues, this one included.
I confirm that in v4 this problem is pretty much inexistant. I think (I have not tested it) there might still be a vulnerability if a user purposely types the full entity name in the utterance as-is (ex.: %event%
) but this is a little far-fetched and can be either ignored or sanitized in some way.
Thanks for the nice library Jesús, very nice work again.
Describe the bug
If you write the name of an entity in the utterance, you actually score almost perfect score in an intent that uses it in its source utterances.
To Reproduce Example file to reproduce. I have commented near each scenario what is expected when the result is bad.
We can also see, in the resulting
model.nlp
, that the stemDict is not sane at all:the
event
stem should not really be there at all. In no way I want 'event' to trigger any stemming. 'event' also comes around in many other places in the resulting model.nlp where it does not belong (outside of the namedEntities zone)Expected behavior
I think one part of the expected behavior is that the stemDict should contain something like:
event
should appear nowhere but in the "namedEntities" zone of the model. Only examples with actual source utterance entity text should be used to generate features and stems.Desktop (please complete the following information):
Additional context This causes many high level problems, because in the
process
mechanism, nlp.js callsgenerateEntityUtterance
which adds placeholder text for entity names in theoptionalUtterance
const. This then triggers some unwanted intents since the generated utterance now contains the entity name, which triggers this bug above and wrecks the outcome totally in some cases.