fabiospampinato / lande

A tiny neural network for natural language detection.
MIT License
62 stars 6 forks source link

It detects other language but it should be English #3

Open patrickReiis opened 1 week ago

patrickReiis commented 1 week ago

Hi, I have no experience in this kind of stuff, so I'll just ask about it to get more information.

The following text: It may die when I die, and that’s okay. It’s my earnings. Gets detected as African:

[ 
  [ "afr", 0.9883103966712952 ],
  [ "eng", 0.011473776772618294 ] 
]

Do you know why this happens? Shorter texts like Good morning my friends gets detected as English.

So I presume it's not the amount of words but the way they are written? Thanks.

fabiospampinato commented 1 week ago

I don't know exactly what function the model learned, in general the bigger the piece of text that you give it (assuming it's not just a single word repeated or something like that) the more accurate the detection should be.

The model only sees n-grams of the text, so I guess there are some similarities in there between english and afrikaans, maybe I'm pre-processing the sentences I'm trying the model on incorrectly or something 🤔

I've been meaning to publish a better v2 of this, but I lack the time at the moment.

patrickReiis commented 1 week ago

Got it, thanks for replying!

We use your library in our project for language detection, after we have the detected language we use another service to translate it, so your invention is very good!