Open KevinDanikowski opened 3 years ago
Japanese is not currently supported, but it's a popular enough language that I think it should be.
The current behavior is to guess English for Japanese text: the Japanese characters are not recognized because the language uses its own character set.
Sample: "シャーロック・ホームズ (Sherlock Holmes) は、19世紀後半に活躍したイギリスの小説家・アーサー・コナン・ドイルの創作した[1]、シャーロック・ホームズシリーズの主人公である、架空の探偵"
Result:
[ [ 'english', 0.030795454545454626 ], [ 'somali', 0.026553030303030245 ], [ 'estonian', 0.021590909090909105 ], [ 'hungarian', 0.021098484848484755 ], [ 'danish', 0.019962121212121264 ], [ 'albanian', 0.019053030303030183 ], [ 'hawaiian', 0.015946969696969737 ], [ 'french', 0.015643939393939377 ], [ 'latin', 0.015606060606060623 ], [ 'german', 0.015454545454545388 ], [ 'hausa', 0.01435606060606065 ], [ 'swedish', 0.012575757575757462 ], [ 'welsh', 0.011325757575757489 ], [ 'portuguese', 0.010909090909090868 ], [ 'czech', 0.010833333333333361 ], [ 'spanish', 0.010492424242424137 ], [ 'latvian', 0.01041666666666663 ], [ 'swahili', 0.010227272727272751 ], [ 'norwegian', 0.009356060606060645 ], [ 'pidgin', 0.00920454545454541 ], [ 'vietnamese', 0.007348484848484826 ], [ 'dutch', 0.006212121212121224 ], [ 'icelandic', 0.005113636363636487 ], [ 'indonesian', 0.003901515151515156 ], [ 'lithuanian', 0.0012499999999999734 ] ]
I will happily accept a PR for this along with some tests :+1:
Any plan to support this?
Not at the moment, but I accept PRs :)
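For anyone picking this up as a PR, one possible starting point is a script check that runs before the usual scoring. The sketch below is only an assumption about how that could look, not how the library works: `japaneseScriptRatio`, `looksJapanese`, and the 0.5 threshold are hypothetical names and values. It relies on standard Unicode property escapes in JavaScript regular expressions.

```javascript
// Hypothetical helpers, not part of the library's API.

// Fraction of the letters in `text` that belong to Japanese scripts
// (Hiragana, Katakana, or Han). Digits, punctuation, and spaces are ignored.
function japaneseScriptRatio(text) {
  const letters = text.match(/\p{L}/gu) || [];
  if (letters.length === 0) return 0;
  // Script_Extensions also covers shared marks such as the long-vowel sign "ー".
  const japanese = letters.filter((ch) =>
    /[\p{scx=Hiragana}\p{scx=Katakana}\p{scx=Han}]/u.test(ch)
  );
  return japanese.length / letters.length;
}

// Han characters alone cannot separate Japanese from Chinese, so this sketch
// also requires at least one kana character. The threshold is an assumed value.
function looksJapanese(text, threshold = 0.5) {
  const hasKana = /[\p{scx=Hiragana}\p{scx=Katakana}]/u.test(text);
  return hasKana && japaneseScriptRatio(text) >= threshold;
}

const sample =
  "シャーロック・ホームズ (Sherlock Holmes) は、19世紀後半に活躍したイギリスの小説家・アーサー・コナン・ドイルの創作した[1]、シャーロック・ホームズシリーズの主人公である、架空の探偵";

console.log(looksJapanese(sample));            // true
console.log(looksJapanese("Sherlock Holmes")); // false
```

A check like this only answers "is this mostly Japanese script?". It does not replace the existing scoring for other languages, and short or heavily mixed strings would need tests to settle on a sensible threshold.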