Add a language detector first

ArchivesPortalEuropeFoundation / Topic-Detection

Using machine learning approaches for automatic topic detection in a multilingual environment


Add a language detector first #92

Open · fedenanni opened 2 years ago

fedenanni commented 2 years ago

First split the text into sentences, then detect the language of each sentence, then run the tool accordingly.

fedenanni commented 2 years ago

Currently, the tool expects its input to be in one of the supported languages (en, de, etc.). We could build a sentence tokeniser and language detection into the tool itself, but it would be simpler to do this before passing the text to the tool. So the steps would be:

split the text into sentences and detect the language of each one
pass the tool only sentences in a single language (one batch per language)
aggregate the results across languages
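A minimal sketch of this preprocessing pipeline, for illustration only. It assumes a naive regex sentence splitter and a toy stopword-based language detector; `STOPWORDS`, `TOPIC_KEYWORDS` and `detect_topics` are hypothetical stand-ins, not the repository's actual code. A real setup would use a trained language identifier and call the project's topic-detection tool, whose interface is not shown in this issue.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword lists for a toy language detector (assumption:
# a real pipeline would use a trained language identification model).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "a", "was"},
    "de": {"der", "die", "das", "und", "ist", "von", "zu", "ein"},
}
SUPPORTED = set(STOPWORDS)

# Hypothetical per-language keyword lexicon standing in for the real tool.
TOPIC_KEYWORDS = {
    "en": {"war": "conflict", "church": "religion"},
    "de": {"krieg": "conflict", "kirche": "religion"},
}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def detect_language(sentence):
    # Score each language by how many of its stopwords occur in the sentence;
    # return None when no stopword matches at all.
    words = re.findall(r"\w+", sentence.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def detect_topics(sentences, lang):
    # Hypothetical stand-in for the topic-detection tool: it only ever sees
    # sentences in a single language and returns topic counts.
    counts = Counter()
    for sentence in sentences:
        for word in re.findall(r"\w+", sentence.lower()):
            topic = TOPIC_KEYWORDS[lang].get(word)
            if topic:
                counts[topic] += 1
    return counts


def run_pipeline(text, tool=detect_topics):
    # Step 1: split into sentences and group them by detected language.
    by_lang = defaultdict(list)
    for sentence in split_sentences(text):
        lang = detect_language(sentence)
        if lang in SUPPORTED:  # sentences in unsupported languages are skipped
            by_lang[lang].append(sentence)
    # Step 2: run the tool once per single-language batch.
    # Step 3: aggregate the per-language results.
    total = Counter()
    for lang, sentences in by_lang.items():
        total.update(tool(sentences, lang))
    return dict(by_lang), total
```

For example, `run_pipeline("The war ended in 1945. Die Kirche ist alt. The church was rebuilt.")` groups the two English sentences and the German one separately and sums their topic counts. Grouping by language before calling the tool keeps the tool itself monolingual, which is the point of doing the detection up front.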