Open Monkleys opened 5 years ago
Scrapoxy accepts requests over HTTP, but the HTTPS URL must be in the Location header. Source: http://docs.scrapoxy.io/en/master/advanced/understand/index.html#can-scrapoxy-relay-https-requests
go-colly doesn't seem to support this: if the URL is HTTPS and the only proxy available is HTTP, go-colly seems to skip the proxy entirely. I've tested it, and it works perfectly when the website is plain HTTP.
Would there be any way to get around this?
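One thing worth checking, offered as a sketch rather than a confirmed diagnosis: in Go's standard library, `http.ProxyFromEnvironment` picks the proxy from the environment variable that matches the request scheme, so a proxy set only through `HTTP_PROXY` is not applied to `https://` targets. A proxy set explicitly with `http.ProxyURL` is returned for every request regardless of scheme. The snippet below uses only `net/http` (not colly) to show which proxy an explicitly configured transport would select. The helper name `selectedProxy` and the address `http://localhost:8888` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// selectedProxy reports which proxy an http.Transport configured with
// http.ProxyURL would pick for the given target URL. No request is sent.
// (Hypothetical helper, not part of colly or Scrapoxy.)
func selectedProxy(proxyAddr, target string) (string, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return "", err
	}
	// http.ProxyURL returns a function that always yields the same proxy,
	// independent of the scheme of the request.
	transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}

	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	chosen, err := transport.Proxy(req)
	if err != nil {
		return "", err
	}
	if chosen == nil {
		return "", nil // no proxy would be used
	}
	return chosen.String(), nil
}

func main() {
	const proxyAddr = "http://localhost:8888" // assumed proxy address
	for _, target := range []string{"http://example.com/", "https://example.com/"} {
		p, err := selectedProxy(proxyAddr, target)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s -> proxy %s\n", target, p)
	}
}
```

With a transport like this, Go's client reaches an `https://` target through an HTTP proxy by opening a CONNECT tunnel. Whether the proxy accepts that tunnel depends on how the proxy is configured, which this sketch does not verify.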
Have you tried SOCKS4/5 to solve this problem?
If you are interested, Scrapoxy 4 is out:
Scrapoxy is an open-source proxy aggregator that lets you manage all proxies in one place, rather than spreading them across multiple scrapers.
Smartly designed for efficient traffic routing, Scrapoxy minimizes bans and boosts success rates.
The tech stack is built on the latest Node.js and TypeScript, using the NestJS and Angular frameworks.
Here are the key features:
Check out https://scrapoxy.io/