e2guardian / e2guardian

E2guardian is a web content filter that can work in proxy, transparent, or ICAP server mode.
http://www.e2guardian.org
GNU General Public License v2.0

Decode Russian symbols. #204

Open Kenny690 opened 7 years ago

Kenny690 commented 7 years ago

Hi. DansGuardian has this problem too: I can't read search requests in access.log when they are in Russian. Instead I see something like this: http://search.skydns.ru/search/?r=1&query=%D0%AD%D0%BB%D0%B5%D0%BA%D1%82%D1%80%D0%BE%D0%BD%D0%BD%D0%B0%D1%8F+%D1%82%D0%B5%D1%82%D1%80%D0%B0%D0%B4%D1%8C+%D0%BF%D0%BE+%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%BC%D1%83+%D1%8F%D0%B7%D1%8B%D0%BA%D1%83+%E2%84%962+1+%D0%BA%D0%BB%D0%B0%D1%81%D1%81+21+%D0%B2%D0%B5%D0%BA

Instead of this: http://search.skydns.ru/search/?r=1&query=Электронная+тетрадь+по+русскому+языку+№2+1+класс+21+век (the query means "Electronic workbook for Russian language No. 2, grade 1, 21st century").

Is there a way to decode such URLs before they are written to the log file?

philipianpearce commented 7 years ago

There are a number of character-encoding issues that need addressing, including regexp handling and logging. I will look at this for the next version (v4.2/3).

VasiliyF commented 7 years ago

Also, search term blocking does not work when the URL contains a percent-encoded UTF-8 string (%xx). However, bannedregexpurllist works well with UTF-8, for example: yandex.ru\/search\/.*text=хер

Kenny690 commented 6 years ago

I have a little suggestion. I don't know if it's any good though, because I'm a little dum-dum. :D What about just using ch_isiphost.comp(",[a-z|A-Z|а-я|А-Я].");

instead of ch_isiphost.comp(",[a-z|A-Z].");

in https://github.com/e2guardian/e2guardian/blob/v5.1/src/NaughtyFilter.cpp?

sv-bio commented 3 years ago

I found a solution that works for Russian character sets: see issue #591.