Apache OpenNLP has added support for language detection. We've briefly discussed filtering by language as a potential temporary solution for unknown URLs where content (text) classification can't be done because we lack data for that specific language.
That idea isn't the best one, but even if we don't go with it, language detection is a quick and easy route to determining context dynamically (for example, "I've recognized that this page is German, so load the German text classification models").
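A minimal sketch of that "detect, then load the matching models" step, written in Java since that is the API IKVM exposes. Everything here is an assumption for illustration: the `ModelSelector` class, the model paths, the ISO 639-3 style codes, and the 0.5 confidence cutoff are hypothetical and not taken from the project. In OpenNLP itself, the language code and confidence would presumably come from the language detector's prediction (I believe `LanguageDetectorME.predictLanguage` returns a `Language` carrying both, but check the API docs).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps a detected language code to a text classification model path.
final class ModelSelector {
    // Assumed language codes and placeholder paths; not real project files.
    private static final Map<String, String> MODEL_PATHS = Map.of(
            "eng", "models/eng-text-classifier.bin",
            "deu", "models/deu-text-classifier.bin");

    // Detections below this confidence are treated as unknown (arbitrary value).
    private static final double MIN_CONFIDENCE = 0.5;

    // Returns the model path for the detected language, or empty if the
    // detection is missing, too uncertain, or no model exists for that language.
    static Optional<String> modelFor(String langCode, double confidence) {
        if (langCode == null || confidence < MIN_CONFIDENCE) {
            return Optional.empty();
        }
        return Optional.ofNullable(MODEL_PATHS.get(langCode));
    }

    public static void main(String[] args) {
        // In practice the code and confidence would come from the language detector.
        String fallback = "no model available, fall back to language-based filtering";
        System.out.println(modelFor("deu", 0.92).orElse(fallback));
        System.out.println(modelFor("fra", 0.88).orElse(fallback));
    }
}
```

The empty result is where the temporary language-based filtering idea could plug in: a page in a language we have no classification data for gets handled by whatever fallback policy we choose.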
The first step here would be to update my IKVM-based version on NuGet to the latest release; then we'd have support for this that we may or may not build on in the future.
https://opennlp.apache.org/news/release-181.html
http://people.apache.org/~colen/models/langdetect-183/rc1/