Language detection - Githubissues

eonum / medtextcollector

Scripts for the collection of online medical texts and definitions

MIT License


Language detection #5

Closed asittampalam closed 7 years ago

asittampalam commented 7 years ago

Some English medical documents are currently also classified as positive, probably because of the shared Latin vocabulary. So we need either a more complex model that is capable of differentiating between "English medical" and "German medical", or a pre-classification stage for language discrimination.
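As a rough illustration of the second option, a pre-classification stage could look something like the sketch below. It is not the project's actual implementation: the stopword lists, the function names `guess_language` and `is_german_medical`, and the `medical_classifier` callable are all hypothetical, and a real setup would probably use a trained language identifier rather than a handful of function words. The idea is that function words such as "der" or "the" are not shared between the two languages, unlike the Latin-derived medical terms.

```python
import re

# Tiny illustrative stopword lists (assumption: a real system would use
# larger lists or a trained language identifier). Words that are common
# in both languages, such as "in", are deliberately left out.
GERMAN_STOPWORDS = {
    "der", "die", "das", "und", "ist", "nicht", "mit", "ein", "eine",
    "von", "bei", "für", "den", "des", "wird", "oder", "auch",
}
ENGLISH_STOPWORDS = {
    "the", "and", "is", "of", "with", "a", "for", "to", "or", "not",
    "by", "are", "that", "this",
}


def guess_language(text):
    """Return 'de', 'en', or 'unknown' based on stopword counts."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    german_hits = sum(1 for t in tokens if t in GERMAN_STOPWORDS)
    english_hits = sum(1 for t in tokens if t in ENGLISH_STOPWORDS)
    if german_hits == english_hits:
        return "unknown"
    return "de" if german_hits > english_hits else "en"


def is_german_medical(text, medical_classifier):
    """Run the (hypothetical) medical classifier only on German text."""
    if guess_language(text) != "de":
        return False
    return bool(medical_classifier(text))
```

With this in front of the existing model, an English document would be rejected before the medical classifier ever sees it, so the shared Latin vocabulary can no longer produce a false positive.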

asittampalam commented 7 years ago

bb829b7fe47f8975eb5b53f5254c92c351ae164b