Open olidietzel opened 6 years ago
This minimal patch in app.py was good enough for a poc. :)
if not parsed_url.hostname:
    url = request.url
    # return response.text('Bad Request', status=400)
I haven't used Squid with chrome-prerender before, so I'm not sure what's wrong. Could you elaborate?
I wanted to see the render quality of the Chrome engine with my own eyes in a browser, and to do a test spider run on an existing Angular SPA with an old-school tool, httrack, to check whether the whole Angular app is crawlable.
For both I needed a regular proxy API interface, so I put a Squid proxy in front of prerender. Squid is configured to serve all static file requests directly and to send the remaining requests to its chrome-prerender parent proxy.
As a proxy client, Squid sends differently formed requests to its "parent proxy" (in this case prerender) than prerender expects, so I had to make prerender understand them.
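For context (my own summary of the difference, not taken from the prerender code): a direct client embeds the target URL in the request path, while a proxy client like Squid uses the absolute-form request line, so the path alone carries no hostname. A minimal illustration:

```python
from urllib.parse import urlparse

# Direct prerender call: GET /http://www.nytimes.com/ HTTP/1.1
# After stripping the leading slash, the path is a full URL with a hostname.
direct = urlparse('/http://www.nytimes.com/'.lstrip('/'))
print(direct.hostname)  # www.nytimes.com

# Squid talking to a parent proxy: GET http://www.nytimes.com/ HTTP/1.1
# The path component the app sees is just '/', which has no hostname,
# so the old code hit the 'Bad Request' branch.
proxied = urlparse('/')
print(proxied.hostname)  # None
```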
[2017-12-28 17:34:06 +0100] - (sanic.access)[INFO][1:2]: GET http://www.nytimes.com/ 400 11
Worked by replacing

if not parsed_url.hostname:
    return response.text('Bad Request', status=400)

with

if not parsed_url.hostname:
    url = request.url
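A slightly less crude way to express the same fallback might look like this. This is only a sketch; `resolve_target` and its arguments are hypothetical names, not prerender's actual API:

```python
from urllib.parse import urlparse

def resolve_target(path: str, request_url: str) -> str:
    """Accept both request styles (hypothetical helper, not prerender's API).

    Direct clients send GET /http://example.com/, so the target URL is in
    the path. Proxy clients like Squid send GET http://example.com/, so
    the path has no hostname and the full request URL is the target.
    """
    url = path.lstrip('/')
    if not urlparse(url).hostname:
        url = request_url
    return url
```

With this fallback, both the curl-style call and Squid's absolute-form request would resolve to the same target URL.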
Would you like to send a PR to fix it?
First: thanks a lot for this great piece of software!
Second: I'm just a dino admin, and this would be my first PR here. What I did was a crude hack job; it should be done properly by a coder more competent than me, to minimize potential side effects! :)
If someone wants to do this and needs a Squid proxy configured for testing, this is the squid.conf I used. The relevant parts are the cache_peer directive (Squid runs locally on the same VM as prerender) and the "direct" ACLs named "static" and "direct":
[root@prerender ~]# cat /etc/squid/squid.conf
###
### Recommended minimum configuration:
###
cache_peer 127.0.0.1 parent 8000 0 no-query no-digest
### Example rule allowing access from your local networks.
### Adapt to list your (internal) IP networks from where browsing
### should be allowed
acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
acl static urlpath_regex \.(html|htm|css|ico|js|gif|jpg|jpeg|png|xml|json|woff|JPG|JPEG|woff2|ttf|eot|svg)(\?.*)?$
acl direct dstdomain fonts.googleapis.com
###
### Recommended minimum Access Permission configuration:
###
### Deny requests to certain unsafe ports
http_access deny !Safe_ports
### Deny CONNECT to other than secure SSL ports
http_access deny CONNECT !SSL_ports
### Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager
### We strongly recommend the following be uncommented to protect innocent
### web applications running on the proxy server who think the only
### one who can access services on "localhost" is a local user
### http_access deny to_localhost
###
### INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
###
http_access allow localnet
http_access allow localhost
### And finally deny all other access to this proxy
http_access deny all
### Squid normally listens to port 3128
http_port 3128
### Uncomment and adjust the following to add a disk cache directory.
### cache_dir ufs /var/spool/squid 100 16 256
### Leave coredumps in the first cache dir
coredump_dir /var/spool/squid
###
### Add any of your own refresh_pattern entries above these.
###
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
always_direct allow static
always_direct allow direct
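To sanity-check which URLs the "static" ACL above matches (i.e. which requests Squid fetches directly instead of forwarding to prerender), the urlpath_regex can be translated to Python. The sample paths here are my own, not from the original setup:

```python
import re

# Python translation of the squid.conf "static" urlpath_regex above.
STATIC = re.compile(
    r'\.(html|htm|css|ico|js|gif|jpg|jpeg|png|xml|json|woff'
    r'|JPG|JPEG|woff2|ttf|eot|svg)(\?.*)?$'
)

print(bool(STATIC.search('/assets/app.js')))       # static asset: fetched directly
print(bool(STATIC.search('/img/logo.png?v=2')))    # static, query string allowed
print(bool(STATIC.search('/some/angular/route')))  # no extension: goes to prerender
```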
I would like to use chrome-prerender in a proxy sandwich configuration (caching as much as possible), but Squid as a proxy client sends differently formed GET requests. Any ideas what to configure where?
Curling works fine:

[2017-12-28 17:33:27 +0100] - (sanic.access)[INFO][1:2]: GET http://127.0.0.1:3000/http://www.nytimes.com/ 200 446977
2017-12-28 17:33:27,944 INFO sanic.access.log_response:325

Squid fails:

[2017-12-28 17:34:06 +0100] - (sanic.access)[INFO][1:2]: GET http://www.nytimes.com/ 400 11
2017-12-28 17:34:06,510 INFO sanic.access.log_response:325
[2017-12-28 17:34:11 +0100] [23436] [INFO] KeepAlive Timeout. Closing connection.
2017-12-28 17:34:11,510 INFO root.keep_alive_timeout_callback:193 KeepAlive Timeout. Closing connection.