Closed IvanSaldikov closed 1 month ago
Hi @IvanSaldikov,
Thank you for using Scrapoxy!
I understand your concern, but Scrapoxy does need to connect to the fingerprint URL to establish the link and check if the proxy is working properly. You can find more details about this here: https://scrapoxy.io/intro/qna#how-much-bandwith-does-the-fingerprint-use.
To help reduce bandwidth, you can set the proxy ping interval to the maximum (30 seconds). It might also help to use the warm/hot status mechanism, which is often a good practice with Scrapoxy.
Best regards, Fabien
I noticed the same thing; I had 10,000+ requests in one day. The default setting for the proxy timeout was 5 seconds, and I also have proxy auto-rotation set to 2-5 minutes.
Even the 30-second interval is, I think, quite heavy: many requests and a lot of bandwidth will still be consumed.
But for @IvanSaldikov, as I am also new to researching this project, the solution I think is as mentioned here: "Scrapoxy requires a minimum number of proxies to maintain a stable connection; otherwise, all requests will fail. This remaining connection is essential for detecting whether Scrapoxy is receiving any activity. If traffic is detected and Auto Scale Up is enabled, Scrapoxy will change the project's status from CALM to HOT.
If you prefer not to keep at least one proxy active, please disable Auto Scale Up and use the API to manually change the project's status."
My idea on that: turn Auto Scale Up off and set the project status to OFF instead of CALM. Then, through the API, switch the project status to CALM or HOT, wait until the connector is up (fetch api/scraper/project/connectors and check that the proxies array has a length above zero), and then send the requests to Scrapoxy. This way there are no fingerprint requests while the project is idle; see the sketch below.
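To make that flow concrete, here is a minimal Python sketch (using the requests library). Everything except the api/scraper/project/connectors path mentioned above is an assumption on my side: the commander URL, the project credentials, the /scraper/project/status endpoint, and the response shape should all be checked against the Scrapoxy API documentation for your version.

```python
# Minimal sketch of "wake up only while scraping" -- NOT a definitive client.
# Assumptions (verify against the Scrapoxy API docs for your version):
#   - COMMANDER_URL points at the Scrapoxy commander API, e.g. http://localhost:8890/api
#   - The scraper API uses Basic auth with the project's username:password token
#   - Project status is changed via POST /scraper/project/status with {"status": "..."}
#   - GET /scraper/project/connectors returns connectors, each with a "proxies" array
import time
import requests

COMMANDER_URL = "http://localhost:8890/api"       # assumed commander address
AUTH = ("project_username", "project_password")   # assumed project token credentials


def set_project_status(status: str) -> None:
    """Switch the project between OFF / CALM / HOT (assumed endpoint)."""
    r = requests.post(
        f"{COMMANDER_URL}/scraper/project/status",
        json={"status": status},
        auth=AUTH,
        timeout=10,
    )
    r.raise_for_status()


def wait_for_proxies(poll_seconds: float = 2.0, max_wait: float = 120.0) -> None:
    """Poll the connectors endpoint until at least one proxy is online."""
    deadline = time.time() + max_wait
    while time.time() < deadline:
        r = requests.get(
            f"{COMMANDER_URL}/scraper/project/connectors",
            auth=AUTH,
            timeout=10,
        )
        r.raise_for_status()
        data = r.json()
        # Assumed response shape: either a list of connectors, or an object
        # wrapping them under a "connectors" key.
        connectors = data if isinstance(data, list) else data.get("connectors", [])
        if any(len(c.get("proxies", [])) > 0 for c in connectors):
            return
        time.sleep(poll_seconds)
    raise TimeoutError("No proxy came online before the deadline")


# Usage: wake the project up only for the scraping window, then shut it down,
# so fingerprint traffic is limited to the time you are actually scraping.
set_project_status("HOT")
wait_for_proxies()
# ... run the scraping job through the Scrapoxy endpoint here ...
set_project_status("OFF")
```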
@pbrns Nicely put, that’s exactly the point.
Scrapoxy is indeed a proxy manager designed specifically for web scraping. In most web scraping cases, maintaining a connection (or VPS) without usage isn’t necessary, unless you’re planning to resell the connection like proxy vendors do, which isn't what Scrapoxy is built for.
While it's possible to extend the timeout to 60 seconds or more, doing so would compromise the circuit breaker functionality, which is something I’d prefer to avoid.
I highly recommend using the API to optimise fingerprint requests.
Current Behavior
When Scrapoxy reaches the FINGERPRINT_URL, it consumes the Zyte API request quota, which is not efficient.
Expected Behavior
When Scrapoxy reaches the FINGERPRINT_URL, it SHOULD NOT consume the Zyte API request quota. Maybe use a direct connection instead so that no quota is consumed?
Steps to Reproduce
Open the Stats -> Requests tab, choose Today, and see the requests to scrapoxy.io (see the screenshot).
Failure Logs
No response
Scrapoxy Version
4.16.0
Custom Version
Deployment
Operating System
Storage
Additional Information
No response