FGRibreau / node-language-detect

🇫🇷 NodeJS language detection library using n-gram
http://blog.fgribreau.com/2011/07/week-end-project-nodejs-language.html
MIT License

Add support for Japanese #39

Open KevinDanikowski opened 3 years ago

KevinDanikowski commented 3 years ago

There is no support for Japanese. It's a popular enough language that I think it should be supported.

The current behavior is to guess English for Japanese text, presumably because the Japanese characters are not recognized, since Japanese uses its own scripts.

Sample: "シャーロック・ホームズ (Sherlock Holmes) は、19世紀後半に活躍したイギリスの小説家・アーサー・コナン・ドイルの創作した[1]、シャーロック・ホームズシリーズの主人公である、架空の探偵"

Result:

[
  [ 'english', 0.030795454545454626 ],
  [ 'somali', 0.026553030303030245 ],
  [ 'estonian', 0.021590909090909105 ],
  [ 'hungarian', 0.021098484848484755 ],
  [ 'danish', 0.019962121212121264 ],
  [ 'albanian', 0.019053030303030183 ],
  [ 'hawaiian', 0.015946969696969737 ],
  [ 'french', 0.015643939393939377 ],
  [ 'latin', 0.015606060606060623 ],
  [ 'german', 0.015454545454545388 ],
  [ 'hausa', 0.01435606060606065 ],
  [ 'swedish', 0.012575757575757462 ],
  [ 'welsh', 0.011325757575757489 ],
  [ 'portuguese', 0.010909090909090868 ],
  [ 'czech', 0.010833333333333361 ],
  [ 'spanish', 0.010492424242424137 ],
  [ 'latvian', 0.01041666666666663 ],
  [ 'swahili', 0.010227272727272751 ],
  [ 'norwegian', 0.009356060606060645 ],
  [ 'pidgin', 0.00920454545454541 ],
  [ 'vietnamese', 0.007348484848484826 ],
  [ 'dutch', 0.006212121212121224 ],
  [ 'icelandic', 0.005113636363636487 ],
  [ 'indonesian', 0.003901515151515156 ],
  [ 'lithuanian', 0.0012499999999999734 ]
]
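
Until Japanese is supported, one possible workaround is a script-based pre-check that runs before the n-gram detector. The sketch below is only an illustration under stated assumptions: `kanaRatio` and `looksJapanese` are hypothetical helpers, not part of this library, and the `0.2` threshold is an arbitrary guess. It counts only hiragana and katakana, because kanji (Han characters) are shared with Chinese and would not distinguish the two.

```javascript
// Hypothetical helpers, NOT part of languagedetect: a Unicode-script pre-check.
// Hiragana and Katakana are used as the Japanese signal; Han (kanji) is
// deliberately ignored because it is shared with Chinese.
const KANA = /[\p{Script=Hiragana}\p{Script=Katakana}]/gu;
const LETTER = /\p{L}/gu;

// Fraction of letters in `text` that are kana (0 when there are no letters).
function kanaRatio(text) {
  const letters = (text.match(LETTER) || []).length;
  if (letters === 0) return 0;
  const kana = (text.match(KANA) || []).length;
  return kana / letters;
}

// The threshold is an assumption chosen for illustration, not a tuned value.
function looksJapanese(text, threshold = 0.2) {
  return kanaRatio(text) >= threshold;
}
```

A caller could check `looksJapanese(text)` first and only pass the text to the detector's `detect` method when it returns `false`. This does not replace proper n-gram support for Japanese; it only avoids the misleading English guess for text like the sample above.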

FGRibreau commented 3 years ago

I will happily accept a PR for this along with some tests :+1:

yangsa666 commented 1 year ago

Any plan to support this?

FGRibreau commented 1 year ago

Not at the moment, but I accept PRs :)