axa-group / nlp.js

An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more
MIT License

Intent recognition in hebrew #700

Closed benag closed 1 year ago

benag commented 4 years ago

I can't get the framework to identify intents in Hebrew. Here is my code:

this.manager.addDocument('he', 'האם אני מכוסה', 'coverage.cover');
this.manager.addDocument('he', 'למה אני מכוסה', 'coverage.cover');

await this.manager.train();
this.manager.save();
const response = await this.manager.process('he', text);

Can you kindly show me how this can be done?

Thank you,

jesus-seijas-sp commented 4 years ago

Hello,

Hebrew has no native support, but even so it should work.

const { NlpManager } = require('node-nlp');

(async() => {
  const manager = new NlpManager({ languages: ['he'] });
  manager.addDocument('he', 'האם אני מכוסה', 'coverage.cover');
  manager.addDocument('he', 'למה אני מכוסה', 'coverage.cover');

  await manager.train();
  const response = await manager.process('he', 'מכוסה');
  console.log(response);
})();

This is working.
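To see which intent was matched, you can inspect the response object. A minimal sketch, assuming the response carries `intent` and `score` fields as the result of `process()` does in node-nlp; the `pickIntent` helper, the 0.5 threshold, and the sample object are hypothetical and only for illustration.

```javascript
// Hypothetical helper: return the intent name when its score clears a threshold.
// Assumes `response` has an `intent` (string) and a `score` (number) field.
function pickIntent(response, threshold = 0.5) {
  if (!response || typeof response.score !== 'number') {
    return null;
  }
  return response.score >= threshold ? response.intent : null;
}

// Hand-written sample shaped like a process() result (the values are made up).
const sampleResponse = {
  locale: 'he',
  utterance: 'מכוסה',
  intent: 'coverage.cover',
  score: 0.9,
};

console.log(pickIntent(sampleResponse));
```

In real use you would pass the object returned by `await manager.process('he', text)` instead of the sample, and tune the threshold to your corpus.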

I also built a corpus using Google Translate. You can find it here: https://github.com/axa-group/nlp.js/blob/master/examples/13-languages/corpora/corpus-he.json

The code for measuring is here: https://github.com/axa-group/nlp.js/blob/master/examples/13-languages/hebrew/12-benchmark.js

Because Hebrew has no native support in nlp.js, the precision does not reach 80% or higher. These are the results of the measurement:

[image: benchmark results]

So, a precision of 71.48%.
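For context, a figure like this can be read as the share of test utterances whose predicted intent matches the expected one. A minimal sketch, assuming that definition; the `precisionPercent` helper and the sample data are hypothetical, and the linked benchmark script may compute its number differently.

```javascript
// Hypothetical helper: percentage of results where the predicted intent
// equals the expected intent.
function precisionPercent(results) {
  if (results.length === 0) {
    return 0;
  }
  const correct = results.filter((r) => r.predicted === r.expected).length;
  return (correct / results.length) * 100;
}

// Made-up sample: three of the four predictions are correct.
const sampleResults = [
  { expected: 'coverage.cover', predicted: 'coverage.cover' },
  { expected: 'coverage.cover', predicted: 'coverage.cover' },
  { expected: 'coverage.cover', predicted: 'None' },
  { expected: 'coverage.cover', predicted: 'coverage.cover' },
];

console.log(`${precisionPercent(sampleResults).toFixed(2)}%`); // prints "75.00%"
```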

benag commented 4 years ago

Thanks. Does this code use BERT?

jesus-seijas-sp commented 4 years ago

No, this is not using BERT; it tokenizes by word. Using BERT is more complex: you need a vocab file from a Hebrew training, and you have two options. You can connect to a BERT server written in Python, or use the BERT tokenizer included in JavaScript, which is not documented right now.
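To illustrate what "tokenizing by word" means here, a minimal sketch that splits a sentence on anything that is not a letter, combining mark, or digit. This is not the tokenizer nlp.js ships; the `tokenizeByWord` name and the regular expression are assumptions made for illustration.

```javascript
// Hypothetical word tokenizer: split on runs of characters that are not
// letters (\p{L}), combining marks (\p{M}, e.g. Hebrew niqqud) or digits (\p{N}).
function tokenizeByWord(text) {
  return text.split(/[^\p{L}\p{M}\p{N}]+/u).filter(Boolean);
}

const tokens = tokenizeByWord('האם אני מכוסה?');
console.log(tokens); // three tokens: 'האם', 'אני', 'מכוסה'
```

A BERT-style tokenizer would instead break words into subword pieces taken from the vocab file, which is why that route needs a vocabulary trained on Hebrew.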

aigloss commented 1 year ago

Closing due to inactivity. Please re-open if you think the topic is still alive.