Primtek closed this 6 years ago
There are a few ways you can support a custom language.
I tried this with a non-Latin language written in Latin characters. The results were good for me, but it was just an experiment. https://medium.com/@souvikghosh_14630/building-a-rasa-chatbot-in-bengali-using-supervised-word-vectors-from-scratch-740ebede51ee
You can also use FastText from Facebook, which has support for over 170 languages. Follow the thread in #869 to see how you can add FastText to spaCy's language model and then use it with Rasa's spaCy pipeline.
If you have raw language data, you can also train your own vectors with spaCy and then incorporate them the same way as above. https://spacy.io/usage/training#section-basics
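For anyone going the FastText route: the pretrained vectors are distributed as a plain-text `.vec` file, with a header line giving the word count and dimension, then one word per line followed by its values. Here is a rough sketch of reading that format with the standard library only. The function name `read_vec` and the tiny sample are made up for illustration; this is not code from the #869 thread.

```python
import io


def read_vec(stream, limit=None):
    """Parse fastText's text .vec format into (dim, {word: [floats]}).

    `read_vec` is a hypothetical helper, not part of fastText or spaCy.
    """
    # Header line: "<number of words> <vector dimension>"
    header = stream.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in stream:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Made-up two-word sample standing in for a real .vec file.
sample = io.StringIO("2 3\nsaya 0.1 0.2 0.3\nmakan -0.5 0.0 0.25\n")
dim, vectors = read_vec(sample)
print(dim, sorted(vectors))
```

Each word and vector could then be added to a blank spaCy vocab with `nlp.vocab.set_vector(word, vector)` (passing the vector as a numpy array) before saving the model for Rasa's spaCy pipeline. I have not checked that this matches the exact steps in the thread, so treat it as one possible approach.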
To add to the above comment, there's also this PR which, once it gets merged, will give language-agnostic NER too (intent classification is already there with the TensorFlow embedding). You can then use the following pipeline for intent classification and entity recognition in any language:
pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
You can also try it before it gets merged. The code is in the lean-CRF branch. I tried it with a small Arabic dataset and it worked perfectly (identifying the correct intents and entities). It shouldn't be much of a problem to use it with Bahasa.
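To give a feel for why the pipeline above does not care which language it sees, here is a minimal sketch of whitespace tokenization followed by count vectors, using only the standard library. It is an illustration of the idea, not Rasa's actual implementation; the function names and the Bahasa sentences are made up, and the lowercasing step is an assumption.

```python
from collections import Counter


def whitespace_tokenize(text):
    # Split on whitespace only; no language-specific rules are involved.
    return text.lower().split()


def build_vocab(sentences):
    # Assign each new token the next free index, in order of first appearance.
    vocab = {}
    for sentence in sentences:
        for token in whitespace_tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def count_vector(text, vocab):
    # Bag-of-words counts over the known vocabulary; unseen tokens are ignored.
    counts = Counter(whitespace_tokenize(text))
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = n
    return vector


# Made-up training examples.
training = ["saya mau pesan pizza", "saya mau batal pesan"]
vocab = build_vocab(training)
print(count_vector("mau pesan pesan", vocab))
```

Nothing here relies on pretrained word vectors or a language model, so the features come only from your own training data. The flip side is that words missing from the training examples contribute nothing.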
thanks everyone! .. will try