dachev / node-cld

Language detection for JavaScript (Node). Based on the CLD2 (Compact Language Detector) library from Google.
Apache License 2.0
316 stars, 55 forks

Multiple Language support #45

ccorcos closed 5 years ago

ccorcos commented 6 years ago

I noticed that the array of languages only ever returns one item. It would be nice if it detected multiple languages and returned all of them for mixed-language content.

elmigranto commented 6 years ago

I think that's already the case. I just tried it on cld@2.4.8 and got multiple languages:

{ reliable: true,
  textBytes: 484,
  languages: 
   [ { name: 'RUSSIAN', code: 'ru', percent: 62, score: 596 },
     { name: 'ENGLISH', code: 'en', percent: 37, score: 1386 } ],
  chunks: 
   [ { name: 'ENGLISH', code: 'en', offset: 0, bytes: 182 },
     { name: 'RUSSIAN', code: 'ru', offset: 182, bytes: 309 } ] }

ccorcos commented 6 years ago

Oh, you're right. I suppose it doesn't work as well with Spanish and English...

ccorcos commented 6 years ago

Both of the following inputs return only English:

Hola mi amigo, que pasa? Porque no hablas español? hello world. Does this work properly? I don't think so.

hello world. Does this work properly? I don't think so. Hola mi amigo, que pasa? Porque no hablas español?

{name: "ENGLISH", code: "en", percent: 99, score: 702}

And this returns Spanish:

Hola mi amigo, que pasa? Porque no hablas español?

{name: "SPANISH", code: "es", percent: 98, score: 742}

dachev commented 5 years ago

Actual results are generated by the underlying Google CLD2 library, which is outside the scope of this project.
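
Since detection itself happens inside CLD2, one possible caller-side workaround is to split the input into sentences and detect each one separately. The sketch below is only an illustration under stated assumptions: `splitSentences` and `detectPerSentence` are hypothetical helpers, and `detect` is a caller-supplied async function that resolves to a language code (with node-cld it could wrap the library's detect call, whose exact signature depends on the version). The stub detector is for demonstration only and stands in for the real library.

```javascript
// Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
// This is an assumption of the sketch, not something CLD2 does.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// `detect` is a caller-supplied async function (hypothetical here) that takes
// a string and resolves to a language code, or null when undetermined.
async function detectPerSentence(text, detect) {
  const found = new Set();
  for (const sentence of splitSentences(text)) {
    const code = await detect(sentence);
    if (code) found.add(code);
  }
  return [...found];
}

// Stub detector for demonstration only; a real one would wrap the library call.
const stubDetect = async (s) => (/hola|hablas|porque/i.test(s) ? 'es' : 'en');

detectPerSentence(
  "Hola mi amigo, que pasa? Porque no hablas español? hello world. Does this work properly? I don't think so.",
  stubDetect
).then((codes) => console.log(codes)); // [ 'es', 'en' ]
```

Very short sentences may be detected unreliably by a statistical detector, so this trades one kind of error for another and should be checked against real input.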