axa-group / nlp.js

An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more
MIT License
6.3k stars 622 forks

Language guess mistakes english for catalan #1364

Open jcalve opened 11 months ago

jcalve commented 11 months ago

Describe the bug

The Language.guess() function mistakes a short English sentence for Catalan.

To Reproduce

1. Run this script:

import { Language } from "@nlpjs/language";

const lang = new Language();
const text = 'What is your name?';
// Restrict the candidates to Spanish, English, and Catalan.
console.log(text, lang.guess(text, ['es', 'en', 'ca']));

Output

What is your name? [
  { alpha3: 'cat', alpha2: 'ca', language: 'Catalan', score: 1 },
  {
    alpha3: 'eng',
    alpha2: 'en',
    language: 'English',
    score: 0.9702093397745571
  },
  {
    alpha3: 'spa',
    alpha2: 'es',
    language: 'Spanish',
    score: 0.7093397745571659
  }
]
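
In the output above, Catalan and English are separated by only about 0.03, so the ranking for such a short sentence is close to a tie. As a possible workaround (not a fix for the underlying scoring), a caller could refuse to trust a guess when the gap to the runner-up is small. The sketch below is only an illustration: `pickConfident` and the `0.05` default margin are hypothetical and not part of @nlpjs/language, and the input is the array reported above.

```javascript
// Hypothetical helper, not part of @nlpjs/language: returns the top guess
// only when it beats the runner-up by at least `minMargin`, otherwise null.
// The 0.05 default is an arbitrary assumption, not a recommended value.
function pickConfident(guesses, minMargin = 0.05) {
  if (guesses.length === 0) return null;
  // Sort a copy by descending score so the input array is not mutated.
  const sorted = [...guesses].sort((a, b) => b.score - a.score);
  if (sorted.length === 1) return sorted[0];
  const margin = sorted[0].score - sorted[1].score;
  return margin >= minMargin ? sorted[0] : null;
}

// Scores copied from the output reported above.
const reported = [
  { alpha3: 'cat', alpha2: 'ca', language: 'Catalan', score: 1 },
  { alpha3: 'eng', alpha2: 'en', language: 'English', score: 0.9702093397745571 },
  { alpha3: 'spa', alpha2: 'es', language: 'Spanish', score: 0.7093397745571659 },
];

console.log(pickConfident(reported));       // null: the gap is below 0.05
console.log(pickConfident(reported, 0.01)); // the Catalan entry
```

With a guard like this the caller would treat 'What is your name?' as undecided and could fall back to a default language or ask for more text.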


ackava commented 2 months ago

A few more examples:

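import { Language } from "@nlpjs/language";

const lang = new Language();
// Inputs taken from the three outputs shown after the script.
const texts = ['What is your name?', 'What is your name? My name is Akash.', 'Language guess mistakes english for catalan.'];
for (const text of texts) {
  console.log(text, lang.guess(text).slice(0, 3).map((x) => [x.language, x.score]));
}

Output
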
What is your name? [
  [ 'Catalan', 0.6079295154185023 ],
  [ 'English', 0.5898188937836515 ],
  [ 'Tagalog', 0.5824767498776309 ]
]
What is your name? My name is Akash. [
  [ 'Tagalog', 0.8929468157954805 ],
  [ 'Igbo', 0.8762839534352888 ],
  [ 'English', 0.8621319333485505 ]
]
Language guess mistakes english for catalan. [
  [ 'Catalan', 1 ],
  [ 'Javanese', 0.9550908467603703 ],
  [ 'English', 0.9403496743229345 ]
]
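
To make these examples easier to turn into a regression check, one could record where the expected language lands in each result. The sketch below is only an illustration: `rankOf` is a hypothetical helper, not part of @nlpjs/language, and the data is the top-three output copied from above.

```javascript
// Hypothetical helper: 1-based rank of `language` in a list of
// [language, score] pairs, or -1 if the language is absent.
function rankOf(pairs, language) {
  // Sort a copy by descending score so the input is left untouched.
  const sorted = [...pairs].sort((a, b) => b[1] - a[1]);
  const index = sorted.findIndex(([name]) => name === language);
  return index === -1 ? -1 : index + 1;
}

// Top-three results copied from the outputs above.
const examples = {
  'What is your name?': [
    ['Catalan', 0.6079295154185023],
    ['English', 0.5898188937836515],
    ['Tagalog', 0.5824767498776309],
  ],
  'What is your name? My name is Akash.': [
    ['Tagalog', 0.8929468157954805],
    ['Igbo', 0.8762839534352888],
    ['English', 0.8621319333485505],
  ],
  'Language guess mistakes english for catalan.': [
    ['Catalan', 1],
    ['Javanese', 0.9550908467603703],
    ['English', 0.9403496743229345],
  ],
};

// Print each input with the rank English reached in its result.
for (const [text, pairs] of Object.entries(examples)) {
  console.log(text, rankOf(pairs, 'English'));
}
```

For the three inputs above, English ranks second, third, and third respectively, never first.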