DeanCording / node-red-contrib-ecolect

A Node-RED node that wraps the Ecolect natural language matching library.
MIT License

Using non-english phrases #2

Open bartbutenaers opened 6 years ago

bartbutenaers commented 6 years ago

Hey Dean,

me again :-)

What I really like about this node is that it can extract dates, numbers, times and so on from a phrase. For example, here are some of the available possibilities for time:

[image: supported time expressions]

However, I speak Dutch at home, so I would like my Ecolect node to understand my weird native language. For example, instead of 'activate kitchen light in two days and four hours', I would like it to interpret 'activeer keuken licht in twee dagen en vier uren'.

I see in your code that you already added some locale related stuff:

const en = require('ecolect/language/en');
const any = require('ecolect/values/any');

Am I right that the following steps are needed to accomplish my goal?

  1. I have to implement a series of Dutch locale files, similar to the English files.
  2. I need to create a pull request for the ecolect project, to add the new locale files.
  3. Does the node-red-contrib-ecolect config screen need to be extended with a dropdown (that automatically displays all the available locales), or does the any statement in your code run through all the available locales?
  4. I assume your package.json file doesn't need to be updated for the new ecolect dependency, since you use the '*' version.
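
To make step 1 concrete, here is a rough, standalone sketch of the kind of keyword tables a Dutch locale would need for the example phrase. It is only an illustration: the names `DUTCH_NUMBERS`, `DUTCH_UNITS` and `parseDutchDuration` are made up, and it does not follow the real structure of ecolect's locale files.

```javascript
// Hypothetical keyword tables; ecolect's real locale files are structured differently.
const DUTCH_NUMBERS = new Map([
  ['een', 1], ['twee', 2], ['drie', 3], ['vier', 4], ['vijf', 5],
  ['zes', 6], ['zeven', 7], ['acht', 8], ['negen', 9], ['tien', 10],
]);

const DUTCH_UNITS = new Map([
  ['dag', 'days'], ['dagen', 'days'],
  ['uur', 'hours'], ['uren', 'hours'],
  ['minuut', 'minutes'], ['minuten', 'minutes'],
]);

// Scan a phrase for "<number word> <unit word>" pairs and total them per unit.
function parseDutchDuration(phrase) {
  const words = phrase.toLowerCase().split(/\s+/).filter(Boolean);
  const result = {};
  for (let i = 0; i + 1 < words.length; i++) {
    const amount = DUTCH_NUMBERS.get(words[i]);
    const unit = DUTCH_UNITS.get(words[i + 1]);
    if (amount !== undefined && unit !== undefined) {
      result[unit] = (result[unit] || 0) + amount;
    }
  }
  return result;
}

console.log(parseDutchDuration('activeer keuken licht in twee dagen en vier uren'));
```

A real locale would also have to cover larger numbers, dates and word-order differences, which this sketch ignores.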

Thanks ! Bart

DeanCording commented 6 years ago

Hey Bart,

Yes, you are essentially correct about the steps required. The matcher uses the Jaro-Winkler distance to score how closely words match. That metric only compares characters, so it is language agnostic. Ecolect also uses the Porter stemmer to find the root of each word, which handles variations in tense and the like. The stemmer implementation Ecolect uses supports English, French and German. However, there is a different implementation that supports Dutch.
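
To show why the word matching is language agnostic, here is a minimal, self-contained sketch of the textbook Jaro-Winkler similarity. It is not Ecolect's own code (Ecolect takes its implementation from a library), and the prefix scaling factor `p = 0.1` with a prefix cap of 4 is the conventional default, assumed here.

```javascript
// Jaro similarity: share of matching characters, penalised for transpositions.
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  // Characters only count as matching if they are at most `range` positions apart.
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk both match sequences in order and count positions that disagree.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;
  return (matches / a.length + matches / b.length +
          (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler: boost the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(a, b, p = 0.1) {
  const score = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return score + prefix * p * (1 - score);
}

// Works the same on Dutch and English words: only characters are compared.
console.log(jaroWinkler('licht', 'lichten'));
console.log(jaroWinkler('light', 'lights'));
```

Nothing in the function knows about vocabulary or grammar, which is why the stemmer is the only language-specific piece.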

I've been talking with the Ecolect developer about how to improve Ecolect and he is quite receptive to making changes. The changes to my node would be trivial.

Dean

bartbutenaers commented 6 years ago

Hey Dean,

It seems I misunderstood the language mechanism.

I thought that, for every English file in ecolect ([image: list of English files in ecolect]), I simply had to create a corresponding Dutch file (with Dutch keywords). But that is not correct, if I understand you correctly ...

So Ecolect can currently only offer the following languages? [image: supported languages] And the Ecolect developer would have to call another implementation (e.g. like this one) to be able to support other languages?

Didn't realize my question would require such a large change ;-(

Thanks for having (at least) a look at it !! Bart

DeanCording commented 6 years ago

Actually, the Ecolect developer has done a very good job of making it possible to support other languages. He has built a framework that ties together other language processing tools. The Talisman library he uses for English also has tools for French and German, but it is simple to add other tool libraries, such as Natural or lunr, to support other languages where such tools exist. Even the approach he uses for English can be changed for any language where it is not appropriate.
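
As a rough illustration of that idea (not Ecolect's actual API), the sketch below shows a registry in which each language plugs in its own stemmer. `registerLanguage` and `stemPhrase` are hypothetical names, and the two regex "stemmers" are crude stand-ins for what a real library such as Talisman or Natural would provide.

```javascript
// Hypothetical registry: each language supplies its own word-processing tools.
const languages = new Map();

function registerLanguage(code, tools) {
  languages.set(code, { stem: tools.stem });
}

// Split a phrase into words and stem each one with the language's own stemmer.
function stemPhrase(code, phrase) {
  const language = languages.get(code);
  if (!language) {
    throw new Error('No language registered for "' + code + '"');
  }
  return phrase
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => language.stem(word));
}

// Crude stand-in stemmers, for illustration only; real ones would be
// delegated to whichever library supports the language.
registerLanguage('en', { stem: (word) => word.replace(/(ing|ed|s)$/, '') });
registerLanguage('nl', { stem: (word) => word.replace(/en$/, '') });

console.log(stemPhrase('en', 'kitchen lights'));
console.log(stemPhrase('nl', 'lichten dagen'));
```

With this shape, supporting Dutch means registering one more entry rather than changing the matcher itself.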