asmehra95 / wiseowl

This is a fact-based question answering system. It uses Apache Solr as its backend search engine, Wikipedia dumps as its information source, and Apache Velocity, HTML, and CSS for the web interface. The project also uses Linux bash scripts for its various functions, such as start, stop, training, and indexing.
MIT License
25 stars, 9 forks

Arabic support #3

Open: mzeidhassan opened this issue 7 years ago

mzeidhassan commented 7 years ago

First, let me thank you for such a great project. It really looks promising.

I see that you use StanfordCoreNLP, which apparently supports Arabic. Does this mean that WiseOwl can handle Arabic Q&A? If so, could you please let me know what is needed to make it work for Arabic?

Thanks again!

asmehra95 commented 7 years ago

I am glad that you liked it.

We are currently using only the Stanford English models. It is possible to port WiseOwl to Arabic, but a few models would need to be replaced or trained. For the Stanford CoreNLP parts, you can use the Arabic models provided by Stanford. You will also have to train a model for answer type classification using Apache OpenNLP (MaxEnt). I am not sure whether Solr is able to index Arabic text.
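To make "answer type classification with MaxEnt" concrete: a maximum entropy classifier is multinomial logistic regression over indicator features, here the words of the question. The sketch below is a toy, stdlib-only illustration and not OpenNLP's API. It is trained with plain stochastic gradient ascent, whereas OpenNLP's MaxEnt trainer uses its own iterative-scaling procedure. The class name, the answer-type labels (PERSON, LOCATION, DATE), and the training questions are all hypothetical.

```java
import java.util.*;

/**
 * Toy maximum-entropy (multinomial logistic regression) answer-type classifier
 * over bag-of-words features. ILLUSTRATIVE ONLY: not OpenNLP's API; the labels
 * and questions are made up.
 */
public class ToyAnswerTypeMaxEnt {
    private final List<String> labels;
    private final Map<String, double[]> weights = new HashMap<>(); // feature -> one weight per label

    ToyAnswerTypeMaxEnt(List<String> labels) { this.labels = labels; }

    // Features: a bias feature plus one indicator per lowercased whitespace token.
    static List<String> features(String question) {
        List<String> f = new ArrayList<>();
        f.add("BIAS");
        for (String tok : question.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!tok.isEmpty()) f.add("w=" + tok);
        }
        return f;
    }

    // p(label | question): softmax over summed feature weights (max-subtracted for stability).
    double[] probabilities(List<String> feats) {
        double[] scores = new double[labels.size()];
        for (String f : feats) {
            double[] w = weights.get(f);
            if (w == null) continue; // unseen feature contributes nothing
            for (int k = 0; k < scores.length; k++) scores[k] += w[k];
        }
        double max = Arrays.stream(scores).max().orElse(0.0);
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) { scores[k] = Math.exp(scores[k] - max); sum += scores[k]; }
        for (int k = 0; k < scores.length; k++) scores[k] /= sum;
        return scores;
    }

    // Stochastic gradient ascent on the conditional log-likelihood:
    // d/dw[f][k] = (1 if k is the gold label else 0) - p[k], for each active feature f.
    void train(List<String[]> data, int epochs, double learningRate) {
        for (int e = 0; e < epochs; e++) {
            for (String[] ex : data) { // ex[0] = answer type, ex[1] = question
                int gold = labels.indexOf(ex[0]);
                List<String> feats = features(ex[1]);
                double[] p = probabilities(feats);
                for (String f : feats) {
                    double[] w = weights.computeIfAbsent(f, key -> new double[labels.size()]);
                    for (int k = 0; k < w.length; k++) {
                        double observed = (k == gold) ? 1.0 : 0.0;
                        w[k] += learningRate * (observed - p[k]);
                    }
                }
            }
        }
    }

    // Returns the most probable answer type.
    String classify(String question) {
        double[] p = probabilities(features(question));
        int best = 0;
        for (int k = 1; k < p.length; k++) if (p[k] > p[best]) best = k;
        return labels.get(best);
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
            new String[]{"PERSON", "who wrote hamlet"},
            new String[]{"PERSON", "who invented the telephone"},
            new String[]{"LOCATION", "where is the eiffel tower"},
            new String[]{"LOCATION", "where was mozart born"},
            new String[]{"DATE", "when did the war end"},
            new String[]{"DATE", "when was the telephone invented"});
        ToyAnswerTypeMaxEnt model = new ToyAnswerTypeMaxEnt(List.of("PERSON", "LOCATION", "DATE"));
        model.train(data, 50, 0.5);
        System.out.println(model.classify("who discovered penicillin")); // PERSON
        System.out.println(model.classify("where is cairo"));            // LOCATION
    }
}
```

For Arabic, the same idea applies unchanged. Only the training questions and, ideally, the tokenization (for example, output from the Stanford Arabic segmenter) would differ.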

mzeidhassan commented 7 years ago

Thank you so much for your reply. Solr can index Arabic without problems. Is there any guide or tutorial on how to train a model using OpenNLP? That would make things easier for me.

Thanks again for sharing your great project with us.

asmehra95 commented 7 years ago

I suggest you start with the OpenNLP manual (https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html) and focus on the training sections, which should be easy to follow. Your main task will be finding a dataset of questions labeled with their corresponding answer types. We used a very simple dataset taken from Taming Text; you can find out more about it in chapter 8 of Taming Text.
If you are not able to find such a corpus, you can build your own, but make sure it contains enough questions for the model to perform well.
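If you build your own corpus, it helps to check it before training. The sketch below assumes the OpenNLP document-categorizer training layout, with one sample per line: the category, then whitespace, then the text. The Taming Text corpus may use a different layout. The class name, the sample lines, and the minimum-per-type threshold are hypothetical. The sketch counts questions per answer type and flags types that have too few examples.

```java
import java.util.*;

/**
 * Sanity-checks an answer-type training corpus before training.
 * ASSUMED line format (OpenNLP document-categorizer style): "<ANSWER_TYPE> <question text>".
 * The minimum-per-type threshold is a hypothetical example value.
 */
public class AnswerTypeCorpusCheck {

    // Counts questions per answer type; skips blank lines and labels with no question text.
    static Map<String, Integer> countByType(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by answer type
        for (String line : lines) {
            String[] parts = line.strip().split("\\s+", 2);
            if (parts.length < 2) continue;
            counts.merge(parts[0], 1, Integer::sum);
        }
        return counts;
    }

    // Returns the answer types that have fewer than minPerType examples.
    static List<String> underrepresented(Map<String, Integer> counts, int minPerType) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < minPerType) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
            "PERSON who wrote hamlet",
            "PERSON who invented the telephone",
            "LOCATION where is the eiffel tower",
            "LOCATION أين يقع برج إيفل", // Arabic: "where is the Eiffel Tower located"
            "DATE when did the war end",
            "");
        Map<String, Integer> counts = countByType(corpus);
        System.out.println(counts);                        // {DATE=1, LOCATION=2, PERSON=2}
        System.out.println(underrepresented(counts, 2));   // [DATE]
    }
}
```

A real corpus would need far more than two questions per type. The threshold here only demonstrates the check.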

mzeidhassan commented 7 years ago

Thanks a million for your support and for guiding me to the right direction. I appreciate it.

I will try to get an Arabic dataset first and see how it goes.

Thanks again!

asmehra95 commented 7 years ago

You're welcome! Let me know if you need further help, and tell me whether you are able to obtain a dataset for it.

hohuynh commented 6 years ago

Hi @asmehra95: when I run this line:

this.numDocuments = (int) dfCounter.getCount(" all");

it always returns 0. Is that correct?
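The thread leaves this unanswered, so the following is only a hypothesis. The thread does not say what type `dfCounter` is. If it is a Stanford CoreNLP `Counter<String>` such as `ClassicCounter`, then `getCount` returns 0 by default for any key that was never set. In that case a 0 could mean that the document-frequency data was never loaded, or that the total was stored under a key spelled differently from `" all"` (note the leading space). The sketch below uses a hypothetical `SimpleCounter`, not the CoreNLP class, to show how a key mismatch silently yields 0 instead of an error.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Demonstrates how a counter that defaults to 0 hides a key mismatch.
 * SimpleCounter is a HYPOTHETICAL stand-in, not Stanford CoreNLP's ClassicCounter.
 */
public class CounterKeyCheck {
    static class SimpleCounter {
        private final Map<String, Double> counts = new HashMap<>();

        void setCount(String key, double value) { counts.put(key, value); }

        // Unseen keys silently return 0.0 instead of throwing.
        double getCount(String key) { return counts.getOrDefault(key, 0.0); }

        boolean containsKey(String key) { return counts.containsKey(key); }
    }

    public static void main(String[] args) {
        SimpleCounter dfCounter = new SimpleCounter();
        dfCounter.setCount("all", 12345);                      // suppose the total was stored without a leading space
        System.out.println((int) dfCounter.getCount(" all"));  // 0, because " all" is not "all"
        System.out.println((int) dfCounter.getCount("all"));   // 12345
        System.out.println(dfCounter.containsKey(" all"));     // false
    }
}
```

Checking `containsKey(" all")` (the CoreNLP `Counter` interface also has `containsKey`) separates "key missing" from "count really is 0".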

Thanks.