RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

How can we add Stanford CoreNLP Arabic language support to Rasa Core? Or is there any other way to add support for the Arabic language? #1074

Closed nileshgarg closed 6 years ago

nileshgarg commented 6 years ago
**Rasa NLU version**: master branch

**Operating system** (windows, osx, ...): windows

**Content of model configuration file**:

```yml
```

**Issue**:

akelad commented 6 years ago

You can try the tensorflow embedding pipeline. It is language independent, so in theory it should also work for Arabic intent classification. I'm not super familiar with Stanford CoreNLP, but you'd have to build a custom component based on the existing examples, depending on whether you want intent classification ( https://github.com/RasaHQ/rasa_nlu/tree/master/rasa_nlu/classifiers ) or entity extraction ( https://github.com/RasaHQ/rasa_nlu/tree/master/rasa_nlu/extractors ).

91ns commented 6 years ago

We would also like to contribute to Arabic language support. If anyone is up for it, please message me.

andreasstuber commented 6 years ago

I'm also interested in Arabic NLU. Contact me if you're interested in a joint effort. @akelad, in https://nlu.rasa.com/languages.html you mention that fastText vectors can be added to spaCy. Would this enable spaCy to do Arabic NER?

akelad commented 6 years ago

So the problem is that, at the moment, you need a POS tagger to extract entities, and spaCy does not have one for Arabic.

andreasstuber commented 6 years ago

@akelad there were half a dozen independent posts on Arabic NER on https://gitter.im/RasaHQ/rasa_nlu , and also in this repository. It might be good to coordinate efforts to find an Arabic POS tagger. I've found these: http://www.arabicnlp.pro/ , https://goo.gl/4Rx1bD , https://goo.gl/RkHu4r

akelad commented 6 years ago

Yeah sure, if you or someone from the community is willing to put the effort into it, that'd be great :) We've had contributions like this for Chinese, for example. We're actually looking into language agnostic NER as well, since this is obviously a problem for lots of languages.

akelad commented 6 years ago

Actually, here is the PR related to language agnostic NER: https://github.com/RasaHQ/rasa_nlu/pull/1095 . Once it's merged, it'd be great if you could try it out and let us know how it works for you.

andreasstuber commented 6 years ago

Language agnostic NER #1095 sounds very exciting! We're actually focusing on the African continent, so Arabic is just the main and most obvious missing language. But there are also Lingala, Swahili, Yoruba, Ga, Berber, and Ebo, which our clients would love to have. Is there any way we can contribute or support this, or would it primarily be by testing? Is there an initial design doc available so we can get up to speed more rapidly?

akelad commented 6 years ago

Well, once this PR is merged, in theory both intent classification and NER should work for all these languages. So you'd just have to try out the tensorflow_embedding pipeline on your datasets and see how well it performs. The docs on the parameters for the pipeline can be found here: https://nlu.rasa.com/pipeline.html#intent-classifier-tensorflow-embedding and https://nlu.rasa.com/pipeline.html#intent-featurizer-count-vectors . And of course the docs will be updated as soon as the NER PR is merged as well.

It'd be great if you could keep us up to date on how well this pipeline works for those languages, since practice is always different to theory :D

andreasstuber commented 6 years ago

@akelad, we just tested the latest PR with Arabic, both for intent and entity recognition. It seems to work just fine in a small test (not yet representative). We'll do some more exhaustive testing with some other languages (Urdu, Jeriza, Yoruba) and larger samples. This is really exciting!

reza-ebrahimi commented 6 years ago

@andreasstuber Could you please share your data and config file for the Arabic test?

91ns commented 6 years ago

@reza-ebrahimi

If you pull the latest version of Rasa NLU and use this pipeline it will work for Arabic Intent Classification + NER. I tried this configuration and it's very good.

language: "ar"

pipeline:

sunil3590 commented 6 years ago

@91ns does it now work for Arabic because of https://github.com/RasaHQ/rasa_nlu/pull/1095 ?

dmytropanontko commented 6 years ago

Hi @91ns) How do you download the ar module?

akelad commented 6 years ago

@sunil3590 yes it does! @dmytropanontko you don't. The language value is just a placeholder so you know what language your bot is in -- the tensorflow embedding pipeline is language agnostic: http://rasa.com/docs/nlu/pipeline/#tensorflow-embedding
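
To make the placeholder point concrete, here is a minimal config sketch. It assumes the `tensorflow_embedding` template shorthand from the Rasa NLU docs of that period; it is not the configuration posted earlier in this thread, and template and component names changed in later releases, so check the docs for your installed version.

```yml
# Sketch only, assuming the Rasa NLU template name of that period.
# "ar" is a label: no Arabic language model is downloaded for this pipeline.
language: "ar"

# Shorthand for the language-agnostic template (assumed name).
pipeline: "tensorflow_embedding"
```

Since nothing in this pipeline is pretrained, the quality depends entirely on the Arabic examples in your own training data.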

Also, I'm going to close this issue now since it's been inactive for a while and it's more of a discussion anyway. Feel free to discuss more on our forum.

zahreva commented 5 years ago

Hi, can anyone help me set up Rasa NLU with the Arabic language? I got a bit lost in this thread :/

akelad commented 5 years ago

have you tried the tensorflow embedding pipeline?

ahlam1234 commented 4 years ago

I tried classifying intents in Arabic and it worked fine, but now I want to extract entities in Arabic. What pipeline should I write?

akelad commented 4 years ago

have you tried the ner_crf?
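
As a rough sketch of what that could look like, the config below lists the components explicitly so that `ner_crf` runs alongside the intent classifier. The component names are assumed from the Rasa NLU docs of that period (later versions renamed them), and no feature settings are given here because the right ones depend on your data.

```yml
# Sketch only: component names assumed from older Rasa NLU releases.
language: "ar"   # a label only, no language model is loaded

pipeline:
- name: "tokenizer_whitespace"                    # splits text on whitespace, no language model needed
- name: "ner_crf"                                 # CRF entity extractor trained on your annotated examples
- name: "intent_featurizer_count_vectors"         # bag-of-words features for intents
- name: "intent_classifier_tensorflow_embedding"  # language-agnostic intent classifier
```

The entity extractor only learns from entities annotated in the training data, so the Arabic examples need labelled entity spans for this to return anything.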

IsraaMohamedHamid commented 3 years ago

@91ns may I ask what libraries you imported for it to work?