cristinae / WikiTailor

Your à-la-carte in-domain corpora extraction tool from Wikipedia

Gujarati language #28

Open cristinae opened 5 years ago

cristinae commented 5 years ago

Basic Wikipedia pre-processing for Gujarati has been added. A stemmer still needs to be included, though; this one can be adapted: https://sourceforge.net/projects/stemmergujarati/
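To give an idea of what the stemmer has to do, here is a minimal sketch of a longest-match suffix stripper. It is not the SourceForge stemmer and not WikiTailor code: the function name `stem`, the `min_stem_len` parameter, and the short suffix list (a few common Gujarati case and plural markers) are assumptions chosen for illustration only. A real adaptation would take its rules from the linked project.

```python
# Hypothetical sketch: a longest-match suffix stripper for Gujarati tokens.
# The suffix list below is illustrative only (an assumption), not the rule
# set of the SourceForge stemmer mentioned above.
SUFFIXES = ["માંથી", "માં", "નું", "ને", "થી", "નો", "ની", "ના", "ઓ"]


def stem(token, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix from token.

    The suffix is removed only if at least min_stem_len code points
    remain, so very short words are left untouched.
    """
    # Try longer suffixes first so "માંથી" wins over "થી".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    for word in ["ઘરમાં", "છોકરાઓ", "ઘર"]:
        print(word, "->", stem(word))
```

The sketch strips a single suffix in one pass and compares raw code points, so input should be Unicode-normalised (for example to NFC) before stemming.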

The IR part will be more difficult: Gujarati is not supported in Lucene, but it is available in Azure. Look into it!