Source of language datasets

feedbackmine / language_detector

Ruby language detection library using n-grams

http://HelpdeskOnTwitter.com


Source of language datasets #6

Open · DonaldTsang opened this issue 4 years ago

DonaldTsang commented 4 years ago

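Where does the source text used to build the n-gram profiles for the 96 languages come from? I would like to see whether it differs from the UDHR (Universal Declaration of Human Rights) corpus used by wooorm/franc (see wooorm/franc#78), and whether it gives more accurate detection.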
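The source corpus matters because an n-gram detector's per-language profiles are derived entirely from whatever training text it is fed, so a different corpus (for example, UDHR translations) produces different profiles and possibly different accuracy. As an illustration only, here is a minimal Ruby sketch of one common scheme, Cavnar and Trenkle's ranked character n-gram profiles with an "out-of-place" distance. It assumes that scheme; it is not taken from language_detector's code, and the method names `ngram_profile`, `out_of_place` and `detect` (and the `n:`/`top:` parameters) are hypothetical.

```ruby
# Hypothetical sketch of ranked character n-gram profiles (Cavnar & Trenkle
# "out-of-place" measure). NOT language_detector's actual implementation.

# Build a ranked list of the +top+ most frequent character n-grams in +text+.
def ngram_profile(text, n: 3, top: 300)
  counts = Hash.new(0)
  # Pad each word with spaces so word boundaries become part of the n-grams.
  text.downcase.scan(/[[:alpha:]]+/).each do |word|
    padded = " #{word} "
    (0..padded.length - n).each { |i| counts[padded[i, n]] += 1 }
  end
  # Sort by descending frequency; break ties alphabetically for determinism.
  counts.sort_by { |gram, count| [-count, gram] }.first(top).map(&:first)
end

# Sum of rank differences between the two profiles; an n-gram missing from
# the language profile receives the maximum penalty (the profile's length).
def out_of_place(doc_profile, lang_profile)
  rank = lang_profile.each_with_index.to_h
  max_penalty = lang_profile.length
  doc_profile.each_with_index.sum { |gram, i| rank.key?(gram) ? (rank[gram] - i).abs : max_penalty }
end

# Return the language key whose training profile is closest to +text+.
def detect(text, profiles)
  doc = ngram_profile(text)
  profiles.min_by { |_lang, profile| out_of_place(doc, profile) }.first
end
```

Building one set of profiles from each candidate corpus and running `detect` over the same held-out sentences is one way to compare their accuracy, as the question asks.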