Source of language datasets

FGRibreau / node-language-detect

🇫🇷 NodeJS language detection library using n-gram

http://blog.fgribreau.com/2011/07/week-end-project-nodejs-language.html

MIT License

397 stars, 45 forks

Source of language datasets #37

DonaldTsang closed this issue 4 years ago

DonaldTsang commented 4 years ago

Where is the source text dataset for the n-grams of those 52 languages? I would like to see whether it differs from the UDHR text that franc uses (https://github.com/wooorm/franc/issues/78), and whether it is more accurate.

mahnunchik commented 4 years ago

Ping @FGRibreau

FGRibreau commented 4 years ago

As said in the README, the whole database came from https://pear.php.net/package/Text_LanguageDetect :)
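
For readers wondering what a source text dataset feeds into: an n-gram database of this kind is typically built by counting character trigrams in sample text for each language and keeping the most frequent ones by rank. The sketch below illustrates that general idea only. It is not the code of this library or of Text_LanguageDetect; the function name `buildTrigramProfile`, the `maxSize` default of 300, and the normalization steps are all assumptions made for illustration.

```javascript
// Hypothetical helper, NOT part of node-language-detect or Text_LanguageDetect.
// Builds a rank-ordered character-trigram profile from a sample text.
function buildTrigramProfile(text, maxSize = 300) {
  // Normalize (assumed steps): lowercase, collapse whitespace runs,
  // and pad with one space on each side so word boundaries are counted.
  const normalized = ' ' + text.toLowerCase().replace(/\s+/g, ' ').trim() + ' ';

  // Count every run of three consecutive characters.
  const counts = new Map();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    const gram = normalized.slice(i, i + 3);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }

  // Sort by descending frequency; break ties by code-unit order
  // so the result is deterministic.
  const ranked = [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, maxSize);

  // Map each trigram to its rank (0 = most frequent).
  const profile = {};
  ranked.forEach(([gram], rank) => {
    profile[gram] = rank;
  });
  return profile;
}

// Usage: the source text determines the profile, which is why the
// choice of corpus (for example UDHR versus another corpus) matters.
const profile = buildTrigramProfile('the the');
console.log(profile);
```

Running the same function over two different corpora for the same language and comparing the resulting ranks is one simple way to see how much the choice of source text changes the profile.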