amate / Proxydomo

A proxy filtering program that runs locally.
http://www31.atwiki.jp/lafe/pages/37.html
GNU General Public License v2.0

Content-Type application/xhtml+xml not filtered #66

Open WRFan opened 4 years ago

WRFan commented 4 years ago

If you remove the User-Agent from the request headers, or use a User-Agent that Google doesn't recognize, Google sends a stripped-down mobile page to the browser, which Proxydomo fails to filter. If I enable Web Filter Debug in the Log window and load the page, Proxydomo just displays binary (gzipped?) output. Could you please look into this issue?


Request:

Request sent to website
GET /search?hl=en&nfpr=1&prmd=u&q=a HTTP/1.1
Accept: text/html, application/xhtml+xml, image/jxr, */*
Accept-Language: en-GB,en-US,en,de-DE,ru-RU
Accept-Encoding: gzip, deflate
User-Agent: AdsBot-Google
Host: www.google.de
DNT: 1
Connection: Keep-Alive
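For anyone who wants to reproduce this outside the browser, here is a minimal Python sketch that rebuilds the logged request. The URL is assembled from the Host header and the request line (the https scheme is an assumption), `*/*` is assumed for the final Accept entry, and `build_request` and `fetch_response_headers` are hypothetical helper names, not part of Proxydomo.

```python
import urllib.request

# URL assembled from the logged Host header and request line.
# The https scheme is an assumption; the log does not show it.
URL = "https://www.google.de/search?hl=en&nfpr=1&prmd=u&q=a"

# Headers copied from the request log. "*/*" is assumed for the last
# Accept entry. Host and Connection are left to urllib.
HEADERS = {
    "Accept": "text/html, application/xhtml+xml, image/jxr, */*",
    "Accept-Language": "en-GB,en-US,en,de-DE,ru-RU",
    "Accept-Encoding": "gzip, deflate",
    "User-Agent": "AdsBot-Google",
    "DNT": "1",
}


def build_request(url=URL, headers=HEADERS):
    """Return a urllib Request carrying the logged headers (nothing is sent)."""
    return urllib.request.Request(url, headers=dict(headers), method="GET")


def fetch_response_headers(proxy=None):
    """Send the request and return (status, Content-Type, Content-Encoding).

    `proxy` is an optional proxy URL, for example the address your local
    filtering proxy listens on (hypothetical; not taken from the thread).
    """
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(build_request(), timeout=10) as resp:
        return (
            resp.status,
            resp.headers.get("Content-Type"),
            resp.headers.get("Content-Encoding"),
        )
```

Calling `fetch_response_headers()` once directly and once through the proxy should show whether the `application/xhtml+xml` response arrives the same way in both cases. What Google actually returns for a given User-Agent may have changed since this report.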

Response:

Response sent to browser
HTTP/1.1 200 OK
Content-Type: application/xhtml+xml; charset=ISO-8859-1
Date: Sat, 30 Nov 2019 23:41:58 GMT
Content-Encoding: gzip
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
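One plausible reading of these headers (a guess, not confirmed against Proxydomo's source) is that a filter which only recognises `text/html` would pass an `application/xhtml+xml` body through untouched and still gzipped, which would match the binary output in the debug log. The Python sketch below shows the kind of check involved. The set of filterable types and all function names are assumptions made for illustration.

```python
import gzip

# Media types treated as filterable markup in this sketch. This is an
# assumption for illustration, not a list taken from Proxydomo.
FILTERABLE_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type):
    """Strip parameters such as charset and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def charset_of(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, if present."""
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip('"')
    return default


def decode_body(raw, content_type, content_encoding=""):
    """Return decoded text if the body should be filtered, else None.

    `raw` is assumed to be already de-chunked; only gzip is handled here.
    """
    if media_type(content_type) not in FILTERABLE_TYPES:
        return None  # left alone, e.g. images
    if content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset_of(content_type), errors="replace")
```

With the headers from the response above, `decode_body(body, "application/xhtml+xml; charset=ISO-8859-1", "gzip")` returns text that a web page filter could match against. If `application/xhtml+xml` were missing from the set, the same call would return `None` and the body would stay compressed.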

nhantrn commented 4 years ago

What are you trying to do after removing the UA? Leaving the UA blank probably tripped some Google rule, and they switched to their minimal mobile page.

It still got filtered fine on my end with this:

[Patterns]
Name = "google test"
Version = ""
Author = ""
Comment = ""
Active = TRUE
Multi = FALSE
URL = "www.google.com/search"
Bounds = ""
Limit = 2048
Match = "\<header\<\/header>"
Replace = "\<h1>TEST\<\/h1>"

WRFan commented 4 years ago

The filter you tested with is a web page filter. I'm talking about the User-Agent request header, which is an "outgoing header" filter in Proxydomo:

[HTTP headers]
Key = "User-Agent: User-Agent Debug (Out)"
In = FALSE
Out = TRUE
Version = ""
Author = ""
Comment = ""
Active = TRUE
Multi = FALSE
URL = ""
Bounds = ""
Limit = 256
Match = "$URL(http(s|)://(.|)google./search\?)"
Replace = "\0"

This isn't really about Google. The question is whether it's the only site on the internet that causes this problem. If it is, I can live with it, but there may be more pages like this.

Btw, it's interesting which User-Agent strings Google expects before it sends the standard non-mobile page. I tested it a little and found that the Google servers expect one of the following User-Agents:

(MSIE 6; trident/6)

Trident/7 "7" matters

Firefox/7 "gecko/" before the string is ok

(windows) applewebkit/ Edge (windows) applewebkit/ Chrome/5 Safari

Applewebkit/537 Version/09 Safari/ Applewebkit/600

-------------------------------------------------------------------------------

Google Images:

(MSIE 1 Applewebkit/1

With anything else, or if the User-Agent is missing entirely (as with the filter above), Google sends the mobile page that Proxydomo ignores.