Closed raphCode closed 2 years ago
Reqwest seems to support this. https://docs.rs/reqwest/latest/reqwest/struct.Proxy.html
We could read the environment variable before creating the Downloader
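A minimal sketch of what reading the variable could look like, using only the standard library. The helper names `proxy_from` and `proxy_from_env` are made up for illustration and are not part of suckit; passing the result on to reqwest (for example via `reqwest::Proxy::all`) before the Downloader is built is left out here.

```rust
use std::env;

/// Look up a proxy address, trying the lower-case variable name first and
/// the upper-case one second. Empty or whitespace-only values count as unset.
/// `lookup` is injected so the logic can be tried without touching the
/// real process environment. (Hypothetical helper, not suckit code.)
fn proxy_from<F>(lookup: F, name: &str) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    [name.to_lowercase(), name.to_uppercase()]
        .iter()
        .filter_map(|key| lookup(key.as_str()))
        .map(|value| value.trim().to_string())
        .find(|value| !value.is_empty())
}

/// Convenience wrapper that reads the real environment.
fn proxy_from_env(name: &str) -> Option<String> {
    proxy_from(|key| env::var(key).ok(), name)
}
```

Calling `proxy_from_env("https_proxy")` would then return the configured address, or `None` if neither spelling is set.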
What I find strange is that the environment variable already seems to be read and processed somehow: running `strings` on the binary shows that it contains the strings `http_proxy` and `HTTP_PROXY`.
But the downloaded page does not show the proxy IP for the ifconfig.me website.
My knowledge of proxies is quite limited, but I agree with you: something is not right.
This works for me (with some warning and retry for the proxy connection):
https_proxy=147.135.134.57:9300 suckit https://ifconfig.me
https_proxy=147.135.134.57:9300 suckit http://ifconfig.me
The IP I get is not my real IP.
Note that I'm using https_proxy and not http_proxy. If I use your http_proxy together with a random https_proxy, it does not work. I think that for some reason we are making https requests even when http is specified.
Nice catch, I can confirm it works with `https_proxy` and `http_proxy`.
It seems suckit retrieves the content over https and makes additional http requests for something else.
If I had to guess, there is some code that resolves URLs, and it is responsible for the http requests. (I remember a unit test that tries to resolve an invalid lwn.net URL and looks for a redirect.)
My public server IP got blocked from scraping a particular website, so I can tell it needs both kinds of proxies to circumvent the block.
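To illustrate why both variables matter: by convention, the proxy variable that applies depends on the scheme of each request, so a run that mixes http and https requests consults both. A small sketch of that mapping follows; `proxy_var_for` is a hypothetical helper, not something taken from suckit or reqwest.

```rust
/// Name of the environment variable conventionally consulted for a URL,
/// based on its scheme. Returns None for URLs without a scheme or with a
/// scheme this sketch does not handle. (Hypothetical helper.)
fn proxy_var_for(url: &str) -> Option<&'static str> {
    let (scheme, _rest) = url.split_once("://")?;
    match scheme.to_ascii_lowercase().as_str() {
        "http" => Some("http_proxy"),
        "https" => Some("https_proxy"),
        _ => None,
    }
}
```

With only `http_proxy` set, an https request would find no matching proxy under this mapping, which would fit the behaviour observed above.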
For future readers:
For multithreaded downloading via proxies to work, the constants `MAX_EMPTY_RECEIVES` and `SLEEP_MILLIS` may need to be adjusted upwards, otherwise all worker threads exit prematurely: they receive no work within the time interval because of the increased proxy latency.
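A rough sketch of how such a polling worker can run out of patience. This is not suckit's actual worker code; `run_worker` and its parameters are assumptions that only mirror the two constants named above.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Hypothetical worker loop: polls for work and gives up after
/// `max_empty_receives` consecutive empty polls, sleeping `sleep_millis`
/// after each empty poll. Returns the number of items processed.
fn run_worker(rx: &Receiver<String>, max_empty_receives: u32, sleep_millis: u64) -> usize {
    let mut processed = 0;
    let mut empty_receives = 0;
    while empty_receives < max_empty_receives {
        match rx.try_recv() {
            Ok(_url) => {
                // A real worker would download `_url` here.
                processed += 1;
                empty_receives = 0; // work arrived, reset the idle counter
            }
            Err(TryRecvError::Empty) => {
                empty_receives += 1;
                thread::sleep(Duration::from_millis(sleep_millis));
            }
            Err(TryRecvError::Disconnected) => break, // no sender left
        }
    }
    processed
}
```

In this sketch the worker tolerates roughly `max_empty_receives * sleep_millis` milliseconds without work. If a slow proxy delays the first responses (and therefore the discovery of new URLs) beyond that budget, the worker exits even though more work is on its way, so raising either value widens the budget.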
> My public server IP got blocked from scraping a particular website
Typical SuckIT
Should we close this?
As far as I am concerned, yes. Unless you want to keep it open for the multiple-proxy feature. That was just an idea, not something I would contribute personally.
I tried using a proxy to download via another IP, but couldn't get it to work via the `http_proxy` or `HTTP_PROXY` environment variables:
First, check the "normal" public IP. Then set a proxy; I tried some from https://freeproxylists.net/ until one worked. Check that the proxy works with the curl command (it should return the proxy IP). Lastly, run suckit with the proxy and observe that the IP in the downloaded webpage is still the "normal" IP without the proxy.
Still, something is done with the proxy, since an invalid IP leads to a timeout or connection failure, and the latency is increased compared to a non-proxy run.
Besides this bug, a feature idea would be to offer multiple proxies to suckit and split the requests across them. This could further speed up downloading, since less traffic is issued from a single IP.
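One way the multiple-proxy idea could be sketched is a simple round-robin pool. Everything here is hypothetical: `ProxyPool` does not exist in suckit, and each worker would still have to build its requests with whichever proxy the pool hands out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pool that hands out proxies in round-robin order.
struct ProxyPool {
    proxies: Vec<String>,
    next: AtomicUsize,
}

impl ProxyPool {
    /// Returns None for an empty list, so a pool always has at least one proxy.
    fn new(proxies: Vec<String>) -> Option<Self> {
        if proxies.is_empty() {
            None
        } else {
            Some(Self { proxies, next: AtomicUsize::new(0) })
        }
    }

    /// Pick the proxy for the next request. Takes `&self`, so several
    /// worker threads can share one pool.
    fn next_proxy(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.proxies.len();
        &self.proxies[i]
    }
}
```

Round-robin is only the simplest split; weighting by proxy latency or dropping proxies that fail would be natural refinements, but they are beyond this sketch.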