dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
https://changedetection.io
Apache License 2.0

[feature] Option to use undetected Chrome driver #1700

Closed: hexclann closed this 1 year ago

hexclann commented 1 year ago

Version and OS: Ubuntu 20.04 with Docker

Is your feature request related to a problem? Please describe. Most websites block me as soon as they see a Playwright or Selenium browser, and I get blocked instantly if the website uses Cloudflare. Changing the user agent does not help, since these bot checks look for specific features that are found only in automated testing software.

I tried using a proxy, but I still get blocked by Cloudflare.

I was able to bypass Cloudflare in another project using undetected chromedriver.

Describe the solution you'd like: An option to use undetected chromedriver as an alternative to Playwright and Selenium.

Describe the use-case and give concrete real-world examples: Undetected chromedriver is designed to solve exactly this problem by evading bot detectors and firewalls.

https://github.com/ultrafunkamsterdam/undetected-chromedriver

Additional context: N/A

dgtlmoon commented 1 year ago

https://github.com/ultrafunkamsterdam/undetected-chromedriver hmm interesting

jonoff commented 1 year ago

Not sure whether it was updating my version or something else, but I'm starting to see a lot of "403 (Access denied) received Try adding external proxies/locations" errors.

Any chance of this feature to use undetected chromedriver being worked on?

dgtlmoon commented 1 year ago

> Any chance of this feature to use undetected chromedriver being worked on?

Unfortunately no, not for now. I haven't seen anyone give any evidence that it works any better than using good proxies.

ekasprzak commented 1 year ago

Hi, I agree with @jonoff here. I have almost stopped using CD (changedetection.io), since I mostly use it as a price monitoring tool and right now CAPTCHAs are everywhere, so every day I disable more and more items.

Is there any good proxy that works with CD and isn't super expensive? I just checked again: the cheapest option of the recommended OxyLabs Web Unlocker is $75 + VAT per month, which is really a lot for someone who just wants to monitor a few pages.

dgtlmoon commented 1 year ago

@ekasprzak "undetected chromedriver" sounds like a nice idea, but I'm still waiting for anyone to give me proof that it beats CAPTCHAs without requiring you to change your IP.

hexclann commented 1 year ago

@dgtlmoon I use undetected chromedriver to create test accounts and change configuration on web apps built for my clients, and all of those sites sit behind Cloudflare. Some of them also use bot-detection software such as Imperva. I was able to get past all of them without using a proxy or changing my IP.

I have also used undetected chromedriver with a Python script to file complaints repeatedly on a government website.

When I used changedetection.io to monitor a ticket booking site (tickets for a movie at a specific theatre), I was immediately blocked by Cloudflare. Then I wrote a simple Python script that runs undetected chromedriver in headless mode, and I was able to get through the Cloudflare checks.

As @ekasprzak said, most of the good proxy services cost $50+ and come with very tight data-usage limits.

Edit: On a good IP, such as a regular residential IP (including CGNAT IPs), undetected chromedriver alone can beat most CAPTCHAs without requiring an expensive proxy configuration.
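For anyone who wants to reproduce the kind of standalone headless script @hexclann describes, here is a minimal sketch. It assumes the third-party `undetected-chromedriver` package (imported as `uc`) and a local Chrome install. The helper names `fetch_with_uc` and `looks_like_challenge`, the 10-second wait, and the marker strings used to spot a Cloudflare interstitial are hypothetical choices for illustration; none of this is part of changedetection.io.

```python
import time

# Substrings that typically appear on Cloudflare's "checking your browser" interstitial.
# This list is an illustrative assumption, not an official or exhaustive set.
CHALLENGE_MARKERS = (
    "just a moment...",
    "/cdn-cgi/challenge-platform/",
    "checking your browser",
)


def looks_like_challenge(html: str) -> bool:
    """Return True if the HTML looks like a bot-check page rather than real content."""
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def fetch_with_uc(url: str, wait_seconds: float = 10.0) -> str:
    """Load `url` in headless Chrome via undetected-chromedriver and return the page HTML."""
    # Imported lazily so looks_like_challenge() works without the package installed.
    # Third-party: pip install undetected-chromedriver (also needs a local Chrome).
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    driver = uc.Chrome(options=options, headless=True)
    try:
        driver.get(url)
        # Give any JavaScript challenge time to finish before reading the DOM.
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Usage would look like `html = fetch_with_uc("https://example.com")` followed by `looks_like_challenge(html)` to check whether real content came back. The fixed sleep is the simplest option; a Selenium `WebDriverWait` on an element you expect on the real page would be more robust. Whether this gets past a given site's checks still depends on the site and on your IP, which is the point under debate in this thread.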

IgitBuh commented 1 year ago

>> Any chance of this feature to use undetected chromedriver being worked on?
>
> Unfortunately no, not for now. I haven't seen anyone give any evidence that it works any better than using good proxies.

I'm sorry, but does it really make sense to close this valid request without any kind of solution? I don't know if undetected_chromedriver is THE solution, but according to what I have read, it might be a valid option.

In its current state, ChangeDetection.io is blocked completely by most of the web shops where I want to track prices, even without hammering them (e.g. with a check interval of 15 minutes).

I've been using it from time to time for about 2 years now, and it has never been as bad as it is now. There needs to be a solution, or it will become completely useless. Instead of being closed and ignored, this issue should be treated as a high priority.