itsdarrylnorris opened this issue 4 years ago
I have had a similar experience, but I don't think it's an issue with this project, just the general unreliability of free proxy sources. The main benefit of this project is that you can collect thousands and thousands of proxies that might work and then, like you said, check whether or not they work and end up with some that do.
If you keep a database that is updated with a cron job, you should have enough to run some scraping projects, depending on their scale. The problem with scraping through proxies in general, though, is that they often get banned from the website you're trying to scrape fairly quickly.
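For what it's worth, here is a rough sketch of the kind of check I mean, not anything from this project itself. It assumes node-fetch@2, https-proxy-agent@5, Node 15+ (for the global AbortController), and that your database hands you an array of "ip:port" strings; the test URL is just a placeholder.

```js
const fetch = require('node-fetch');            // node-fetch v2 (CommonJS)
const HttpsProxyAgent = require('https-proxy-agent'); // v5-style constructor

const TEST_URL = 'https://httpbin.org/ip'; // any cheap HTTPS endpoint works
const TIMEOUT_MS = 5000;

// Returns true if the proxy answers the test request within the timeout.
async function isAlive(proxy) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
  try {
    const res = await fetch(TEST_URL, {
      agent: new HttpsProxyAgent(`http://${proxy}`),
      signal: controller.signal,
    });
    return res.ok;
  } catch (err) {
    return false; // timeout, refused connection, bad TLS, etc.
  } finally {
    clearTimeout(timer);
  }
}

// Keep only the proxies that pass the health check.
async function filterWorking(proxies) {
  const results = await Promise.all(proxies.map(isAlive));
  return proxies.filter((_, i) => results[i]);
}

// Example: filterWorking(['1.2.3.4:8080', '5.6.7.8:3128']).then(console.log);
```

Run something like that from your cron job and write the survivors back to the database, so scrapers only ever read the filtered list.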
> I have had a similar experience, but I don't think it's an issue with this project, just the general unreliability of free proxy sources.
I agree, the project is great.
> If you keep a database that is updated with a cron job, you should have enough to run some scraping projects, depending on their scale.
My cron job was running every 2 hours, and I was only trying to scrape a few pages a few times an hour or so, and even that did not work well for me.
I was considering making a free API that serves these IP addresses automatically, but if they are not reliable enough for my own use, it's not worth the effort. :(
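Roughly what I had in mind was something like this; Express and the getWorkingProxies() helper are just placeholders for the idea, not anything I have actually built.

```js
const express = require('express');
const app = express();

// Placeholder store; in practice the cron job would keep this refreshed
// with proxies that passed the health check.
async function getWorkingProxies() {
  return ['1.2.3.4:8080', '5.6.7.8:3128']; // dummy data
}

// Single endpoint that returns the current list of working proxies as JSON.
app.get('/proxies', async (req, res) => {
  res.json({ proxies: await getWorkingProxies() });
});

app.listen(3000, () => console.log('proxy API listening on :3000'));
```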
Around 5% of the proxies are fine; just test and verify them before using them.
new proxy list: http://pzzqz.com
I have this project running as a cron job every hour, collecting IP addresses and running tests against those IP addresses. I mostly get timeouts and unreliable IP addresses.
Even if I health-check them using node-fetch with a timeout and HTTP proxies, they are not reliable enough to be used for scraping multiple times. I am running these tests with both node-fetch and Puppeteer, and I am still getting the same results.
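The Puppeteer side of the check is roughly along these lines (the proxy address and target URL are placeholders, not my actual script):

```js
const puppeteer = require('puppeteer');

// Launch a browser that routes all traffic through the given proxy and
// report whether the target page loads successfully within the timeout.
async function testWithPuppeteer(proxy, url) {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy}`],
  });
  try {
    const page = await browser.newPage();
    const response = await page.goto(url, {
      timeout: 15000,
      waitUntil: 'domcontentloaded',
    });
    return response !== null && response.ok();
  } catch (err) {
    return false; // navigation timed out or the proxy dropped the connection
  } finally {
    await browser.close();
  }
}

// testWithPuppeteer('1.2.3.4:8080', 'https://example.com').then(console.log);
```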
Has anyone experienced this? I did not expect free proxies to be 100% reliable, but from my testing with HTTPS requests they are about 10% reliable at best.
Has anyone found reliable proxies from this project?