Florian95 closed this issue 8 years ago.
Hello,
Thanks for your feedback :)
This is normal behavior.
For HTTP/HTTPS, there are two proxy methods:
REQUEST mode for HTTP
CONNECT mode for HTTPS
=> With CONNECT, the proxy only sees a TCP tunnel. It cannot read the content passing through it.
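To make the difference concrete, here is a minimal sketch of the first bytes a client sends to the proxy in each mode. The helper `first_bytes_to_proxy` and the `my-crawler/1.0` User-Agent are hypothetical, for illustration only; real clients send more headers.

```python
from urllib.parse import urlsplit


def first_bytes_to_proxy(target_url: str, mode: str) -> bytes:
    """Hypothetical helper: what the proxy receives first in each mode."""
    parts = urlsplit(target_url)
    if mode == "REQUEST":
        # Absolute-form request line: the proxy sees the full URL and
        # every header, so it can read and modify them.
        return (
            f"GET {target_url} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"
            "User-Agent: my-crawler/1.0\r\n"  # hypothetical User-Agent
            "\r\n"
        ).encode("ascii")
    if mode == "CONNECT":
        # Authority-form request line: the proxy only learns host:port.
        # Afterwards it just relays opaque (TLS-encrypted) bytes.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        authority = f"{parts.hostname}:{port}"
        return (
            f"CONNECT {authority} HTTP/1.1\r\n"
            f"Host: {authority}\r\n"
            "\r\n"
        ).encode("ascii")
    raise ValueError(f"unknown mode: {mode}")


url = "https://github.com/fabienvauchelles/scrapoxy"
print(first_bytes_to_proxy(url, "REQUEST").decode())
print(first_bytes_to_proxy(url, "CONNECT").decode())
```

In CONNECT mode the path and the User-Agent never reach the proxy in clear text, which is why header rewriting is impossible there.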
This is a major problem, because Scrapoxy changes HTTP headers (such as the User-Agent) on the fly.
To work around this, Scrapoxy accepts only HTTP proxy requests. For an HTTPS request, you must send the HTTPS URL in REQUEST mode. See the tutorial.
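The kind of on-the-fly rewrite described above only works when the proxy can read the request head. Below is a minimal sketch of replacing the User-Agent in a plaintext request. It is not Scrapoxy's code; `rewrite_user_agent` is a hypothetical helper that assumes the whole request head has already been received.

```python
def rewrite_user_agent(raw_request: bytes, new_agent: str) -> bytes:
    """Hypothetical helper: replace the User-Agent of a plaintext HTTP request."""
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete request head")
    request_line, *headers = head.split(b"\r\n")
    # Header names are case-insensitive, so compare in lower case.
    kept = [h for h in headers if not h.lower().startswith(b"user-agent:")]
    kept.append(b"User-Agent: " + new_agent.encode("latin-1"))
    return b"\r\n".join([request_line, *kept]) + b"\r\n\r\n" + body


req = (
    b"GET https://example.com/ HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"user-agent: old/1.0\r\n\r\n"
)
print(rewrite_user_agent(req, "Mozilla/5.0").decode())
```

Inside a CONNECT tunnel the same bytes would be TLS records, so there would be no `User-Agent:` line for the proxy to find.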
Why doesn't cURL work? cURL cannot send HTTPS requests in REQUEST mode: when it detects an HTTPS URL, it asks for CONNECT mode. More information here and here.
The Scrapy framework, however, accepts HTTPS in REQUEST mode.
You must specify 'noconnect' in your proxy URL:
PROXY = 'http://127.0.0.1:8888/?noconnect'
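The choice between the two modes can be modelled roughly as follows: cURL behaves like the CONNECT branch for every HTTPS URL, while a client that honours `noconnect` falls back to REQUEST mode. This is a hypothetical sketch (`choose_proxy_mode` is not a Scrapy or cURL function), and real clients may detect the flag differently.

```python
from urllib.parse import parse_qs, urlsplit


def choose_proxy_mode(target_url: str, proxy_url: str) -> str:
    """Hypothetical model of how a client picks the proxy mode."""
    if urlsplit(target_url).scheme != "https":
        return "REQUEST"  # plain HTTP always goes through REQUEST mode
    # keep_blank_values=True so a bare "?noconnect" flag is not discarded.
    params = parse_qs(urlsplit(proxy_url).query, keep_blank_values=True)
    return "REQUEST" if "noconnect" in params else "CONNECT"


proxy = "http://127.0.0.1:8888/?noconnect"
print(choose_proxy_mode("https://github.com/fabienvauchelles/scrapoxy", proxy))
```

With the `noconnect` flag present, HTTPS URLs stay in REQUEST mode, which is the mode Scrapoxy can handle.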
Hope this helps, Fabien.
Hello,
Excellent work, but this does not seem to work with HTTP-to-HTTPS redirects, for example: http://github.com/fabienvauchelles/scrapoxy
HTTP:
HTTPS:
Regards,