Open gabriele-tomassetti opened 4 years ago
@gabriele-tomassetti @ftomassetti I'm the maintainer of an open-source C# NLP library that has two models for language detection: https://github.com/curiosity-ai/catalyst/ If you want I can either port the code from there, or add as a dependency to cover this need.
Thanks for your offer to help on this issue, too. Honestly, I was mostly looking at this issue as an excuse to work on a NLP library, but if your library can do it better and sooner, I see no reason not to use it.
I think we only have two requirements:
We could add it as a callback you need to provide, and just add an example on the Wiki of how to use it with Catalyst for example
That's a really smart idea. I will work on it.
Fasttext will allow to implement effective language identification, with little space and resources required. This can be useful for a lot of content that has no language and also for content that contains multiple languages.