Closed mmayla closed 4 years ago
Hello! By default, useNoneFeature is false for Arabic: https://github.com/axa-group/nlp.js/blob/master/lib/nlp/nlp-util.js#L428
You can activate it by putting
NlpUtil.useNoneFeature.ar = true
at the beginning of your code.
It is deactivated because I don't have a good, large dataset in Arabic to test with, so I didn't feel confident enough.
@jesus-seijas-sp Thank you for this, you helped me a lot :smiley:
Do you have a format for the dataset needed for testing? I may be able to help you with that.
Hello! Here is an example: https://github.com/axa-group/nlp.js/tree/master/examples/benchmark
The JSON contains the intents and, for each intent, the utterances to train and the utterances to test. This corpus is an example in English; with nlp.js the accuracy is >98%, in other providers... well.. better check ;)
The None intent is special: it does not have data to train, only to test, and it is the place to put sentences that could generate a false positive.
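To make the corpus layout described above concrete, here is a minimal sketch in plain JavaScript. The field names (data, intent, utterances, tests), the sample sentences, and the helpers splitCorpus and accuracy are assumptions made for illustration, not the confirmed schema or API of nlp.js; check the linked benchmark folder for the real file. The classifier is passed in as a function, so nothing here depends on the library.

```javascript
// Hypothetical corpus: field names are assumptions, not the confirmed nlp.js schema.
const corpus = {
  name: 'Sample corpus',
  locale: 'en',
  data: [
    { intent: 'greeting', utterances: ['hello', 'good morning'], tests: ['hello there'] },
    { intent: 'farewell', utterances: ['goodbye', 'see you later'], tests: ['goodbye for now'] },
    // The None intent has nothing to train on, only sentences that should NOT match.
    { intent: 'None', utterances: [], tests: ['what is the price of gold'] },
  ],
};

// Flatten the corpus into (utterance, intent) pairs for training and for testing.
function splitCorpus(c) {
  const train = [];
  const test = [];
  for (const { intent, utterances = [], tests = [] } of c.data) {
    for (const utterance of utterances) train.push({ utterance, intent });
    for (const utterance of tests) test.push({ utterance, intent });
  }
  return { train, test };
}

// Fraction of test cases for which classify(utterance) returns the expected intent.
// classify is any function from string to intent name (hypothetical stand-in for a model).
function accuracy(testCases, classify) {
  if (testCases.length === 0) return 0;
  const hits = testCases.filter(({ utterance, intent }) => classify(utterance) === intent).length;
  return hits / testCases.length;
}
```

With this shape, a false positive shows up as a None test case that the classifier assigns to a real intent, which lowers the accuracy figure.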
@jesus-seijas-sp I have been using... other providers and nlp.js for a while now and we settled on nlp.js... I know how nlp.js rocks haha ;)
Most of my projects are in Arabic, and we now use nlp.js by default in every project, so I am interested in improving nlp.js Arabic support. I will create a corpus like the one you provided, in Standard Arabic. After I finish it, what should I do? Do I add it to examples/benchmark and create a pull request, or do you suggest another way?
Also, as you may know, there are more than 25 dialects of Arabic in use (Standard Arabic, Egyptian Arabic, Gulf Arabic, Tunisian Arabic, Levantine Arabic, ...). Standard Arabic is the least used in the real world, but it is the standard language that all people can understand. Egyptian Arabic and Gulf Arabic are the most used ones, especially digitally.
What do you suggest for adding such dialects to nlp.js?
Thanks for the help :smiley:
Hello @MMayla, for each language three things should be implemented:
Closing as it was solved
Describe the bug
I trained the model on a large number of intents (11,000+) in Arabic. Everything works great, except that the model produces false positives at a high rate, and with high confidence scores.
Utterances that share no words with any intent in the list are matched with a high score, especially short utterances.
Note: I made sure that useNoneFeature is true (left it at the default).