MycroftAI / adapt

Adapt Intent Parser
Apache License 2.0

Tokenizer Internationalization - French #3

Open clusterfudge opened 8 years ago

clusterfudge commented 8 years ago

We should test whether the EnglishTokenizer implementation is sufficient for French and, if not, add an additional tokenizer. EnglishTokenizer is based on the Porter stemmer.
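One way to run that check is a small table-driven test: a few French inputs paired with the tokens we would want, run through whichever tokenizer is under test. The cases, the expected outputs (which treat an elided clitic such as "j'" as its own token), and the `naive_tokenize` stand-in are all hypothetical. The stand-in is a plain regex that keeps apostrophes inside words; it is not Adapt's actual EnglishTokenizer.

```python
import re

# Hypothetical French test cases: (input text, tokens we would want back).
FRENCH_CASES = [
    ("j'ajoute du lait", ["j'", "ajoute", "du", "lait"]),
    ("l'alarme", ["l'", "alarme"]),
    ("aujourd'hui", ["aujourd'hui"]),  # apostrophe belongs to the word
]


def naive_tokenize(text):
    """Stand-in tokenizer (assumption): apostrophes stay inside words."""
    return re.findall(r"[\w']+|[^\w\s]", text)


def failing_cases(tokenize, cases=FRENCH_CASES):
    """Return (text, actual, expected) for every case the tokenizer gets wrong."""
    failures = []
    for text, expected in cases:
        actual = tokenize(text)
        if actual != expected:
            failures.append((text, actual, expected))
    return failures
```

With the stand-in, the two elision cases fail and "aujourd'hui" passes. Any callable that takes a string and returns a list of tokens can be passed as `tokenize`.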

gcrieloue-main commented 8 years ago

For words such as "j'ajoute", I would like "ajoute" to be recognized as a word (a keyword, actually), but it isn't.

I think a French tokenizer would be very similar to the English one, except for this apostrophe rule (which has exceptions, such as "aujourd'hui").
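A minimal sketch of that rule, assuming the goal is only to split an elided clitic off the following word: any token with an internal apostrophe is split once after the apostrophe, unless it appears in an exception list. The function name `tokenize_fr` and the short `NO_SPLIT` list are hypothetical; a real French tokenizer would need a much fuller list of words that contain an apostrophe. The clitic is kept as its own token ("j'") rather than rewritten.

```python
import re

# Hypothetical, deliberately short list of words whose apostrophe belongs
# to the word itself and must not be split.
NO_SPLIT = {"aujourd'hui", "quelqu'un", "presqu'île", "prud'homme"}

# Words (possibly containing apostrophes) or single punctuation marks.
TOKEN = re.compile(r"[\w']+|[^\w\s]")

# An elided prefix ending in an apostrophe, followed by the rest of the word.
ELIDED = re.compile(r"^(\w+')(\w[\w']*)$")


def tokenize_fr(text):
    """Tokenize French text, splitting elided clitics such as "j'" off."""
    # Treat the typographic apostrophe (U+2019) like the ASCII one.
    text = text.replace("\u2019", "'")
    tokens = []
    for raw in TOKEN.findall(text):
        match = ELIDED.match(raw)
        if match and raw.lower() not in NO_SPLIT:
            tokens.extend(match.groups())  # "j'ajoute" -> "j'", "ajoute"
        else:
            tokens.append(raw)
    return tokens
```

For example, `tokenize_fr("Aujourd'hui, j'ajoute du lait.")` returns `["Aujourd'hui", ",", "j'", "ajoute", "du", "lait", "."]`, so "ajoute" is available as a keyword while "aujourd'hui" stays whole.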

penrods commented 6 years ago

I know this is really old, but I'm curious whether this fits into the "normalize()" approach I've implemented in English. Essentially, I do a pre-pass on the text that does things like converting "it's" to "it is", which simplifies parsing.

Does it make sense to write a French normalize() preprocessor that converts things like "j'amie" to "je amie"? It would live in: https://github.com/MycroftAI/mycroft-core/blob/dev/mycroft/util/lang/parse_fr.py#L1027
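The English pre-pass described above can be sketched as a lookup-and-replace over known contractions, run before tokenization. This is not the mycroft-core `normalize()` code; the `CONTRACTIONS` table and `normalize_en` are hypothetical, and real English needs more care (for example, "he's" can mean "he is" or "he has").

```python
import re

# Hypothetical contraction table; a real normalizer needs many more entries.
CONTRACTIONS = {
    "it's": "it is",
    "don't": "do not",
    "can't": "cannot",
    "i'm": "i am",
    "what's": "what is",
}

# Match any known contraction as a whole word, case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def normalize_en(text):
    """Expand known contractions; each expansion is returned in lowercase."""
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)
```

For example, `normalize_en("I'm sure it's fine")` returns `"i am sure it is fine"`, while a word like "its" is left alone because it has no apostrophe.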

gcrieloue-main commented 6 years ago

Hello,

While "it's" and "it is" are both valid in English, sadly "je aime" is not valid in French.

(And by the way, "amie" is not a verb; it means "friend".)

(Sent by email on Thu, 15 Mar 2018 at 18:07, in reply to Steve Penrod's comment above.)

penrods commented 6 years ago

C'est la vie! There is a reason I shouldn't be the one implementing the French parsers. :)