curiosity-ai / catalyst

๐Ÿš€ Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the box support for training word and document embeddings, and flexible entity recognition models.
MIT License

LanguageDetector.FromStoreAsync(): Why can't we pass an Array of Language? #87

Status: Open · schittli opened this issue 1 year ago

schittli commented 1 year ago

Good evening, and thank you very much for sharing your great work!

When testing LanguageDetector, I often find that Catalyst detects languages that we already know in advance cannot occur in our texts.

It would therefore be very useful if we could pass LanguageDetector a List<Language> that specifies which languages are possible at all.

Is your feature request related to a problem? Please describe.
For example, if we only process English, German, and French texts, LanguageDetector often detects Norwegian.

Describe the solution you'd like
I am fairly sure that if we could tell LanguageDetector that the text can only be in one of three languages, it would pick the right language much more reliably 😃.
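A minimal sketch of the filtering step I have in mind, assuming the detector can expose a score for each candidate language (I have not confirmed that Catalyst's LanguageDetector does this). The names `CandidateFilter`, `PickAmong`, and `Lang` are hypothetical; `Lang` only stands in for Catalyst's `Language` enum so the snippet compiles without the package. It simply takes the highest-scoring language among the allowed ones:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Catalyst's Language enum, so the sketch compiles on its own.
enum Lang { English, German, French, Norwegian }

static class CandidateFilter
{
    // Hypothetical helper (not part of Catalyst): returns the highest-scoring language
    // among the allowed candidates, or null if none of them received a score.
    public static TLang? PickAmong<TLang>(
        IReadOnlyDictionary<TLang, double> scores,
        IEnumerable<TLang> allowed) where TLang : struct
    {
        var allowedSet = new HashSet<TLang>(allowed);
        TLang? best = null;
        double bestScore = double.NegativeInfinity;

        foreach (var kv in scores)
        {
            // Ignore languages that were ruled out in advance.
            if (!allowedSet.Contains(kv.Key)) continue;
            if (kv.Value > bestScore)
            {
                bestScore = kv.Value;
                best = kv.Key;
            }
        }
        return best;
    }
}

static class Program
{
    static void Main()
    {
        // Made-up scores for illustration only; assumes some detector can produce them.
        var scores = new Dictionary<Lang, double>
        {
            [Lang.Norwegian] = 0.41,
            [Lang.German] = 0.38,
            [Lang.English] = 0.12,
            [Lang.French] = 0.09,
        };
        var allowed = new[] { Lang.English, Lang.German, Lang.French };
        Console.WriteLine(CandidateFilter.PickAmong(scores, allowed)); // German
    }
}
```

With these made-up scores, an unrestricted pick would return Norwegian, while restricting the candidates to English, German, and French returns German.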

Describe alternatives you've considered
I tried downloading only the NuGet language models for English, German, and French, but LanguageDetector still detected Norwegian, which is very strange. It looks like LanguageDetector automatically downloads language models (great!), but the code comments say nothing about this behavior 😢

Thanks a lot, kind regards, Thomas