cloudveiltech / Filter-Windows

HTTP/S Content Filter for Windows 7 and newer
Mozilla Public License 2.0

Add language detection support for potential language based filtering #84

Open TechnikEmpire opened 6 years ago

TechnikEmpire commented 6 years ago

Apache OpenNLP has added support for language detection. We've briefly discussed filtering based on language as a potential temporary solution for unknown URLs where content (text) classification can't be done because we lack data for the specific language.

https://opennlp.apache.org/news/release-181.html

http://people.apache.org/~colen/models/langdetect-183/rc1/

That idea isn't the best one, but even if we don't go with it, language detection is a quick and easy route to determining context dynamically (for example, "I've recognized that this page is German, so load the German text classification models").
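To make the "detect the language, then load the matching models" idea concrete, here is a minimal sketch in Java (OpenNLP's own language). Everything in it is an assumption for illustration: `Detection`, `LanguageDetector`, `TextClassifier`, and `LanguageRouter` are hypothetical names, not part of OpenNLP or this project, and the confidence threshold is arbitrary. A real detector, such as OpenNLP's, would be adapted behind the `LanguageDetector` interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical detection result: a language code (e.g. ISO 639-3 "deu") and a confidence. */
final class Detection {
    final String lang;
    final double confidence;

    Detection(String lang, double confidence) {
        this.lang = lang;
        this.confidence = confidence;
    }
}

/** Hypothetical abstraction over whatever language detector is actually used. */
interface LanguageDetector {
    Detection detect(String text);
}

/** Hypothetical per-language text classifier that returns a category label. */
interface TextClassifier {
    String classify(String text);
}

/** Routes page text to the classifier registered for its detected language. */
final class LanguageRouter {
    private final LanguageDetector detector;
    private final double minConfidence;
    private final Map<String, TextClassifier> classifiers = new HashMap<>();

    LanguageRouter(LanguageDetector detector, double minConfidence) {
        this.detector = detector;
        this.minConfidence = minConfidence;
    }

    /** Registers the classifier to use for one language code. */
    void register(String lang, TextClassifier classifier) {
        classifiers.put(lang, classifier);
    }

    /**
     * Returns the category chosen by the language-specific classifier, or an
     * empty Optional when the detection confidence is below the threshold or
     * no classifier is registered for the detected language.
     */
    Optional<String> classify(String text) {
        Detection detection = detector.detect(text);
        if (detection.confidence < minConfidence) {
            return Optional.empty();
        }
        TextClassifier classifier = classifiers.get(detection.lang);
        if (classifier == null) {
            return Optional.empty();
        }
        return Optional.of(classifier.classify(text));
    }
}
```

An empty result marks the case where no text classification is possible for the page's language, which is the point where the language-based filtering fallback discussed above could be applied if we decide to go that way.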

TechnikEmpire commented 6 years ago

The first step here would be to update my IKVM-based version on NuGet to the latest release. Then we'd have support for this, which we may or may not build on in the future.