dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Designed for simplicity: monitor which websites had a text change, for free. Also covers website defacement monitoring and price change notification.
https://changedetection.io
Apache License 2.0
17.3k stars, 965 forks

[feature] Add configurable delay to WebDriver interface to allow pages to fully load before extracting text #214

Closed: cedilla1312 closed this issue 2 years ago

cedilla1312 commented 3 years ago

Some sites take a while to load their JavaScript content (e.g. an e-shop whose content filtering runs in JS). Sometimes the check runs before the JS has finished and sometimes after, so I keep getting change detections that are false positives. Do I have to increase the wait time in the source and then build my own custom Docker image? Thank you so much (děkuji) for your help and for the tool.

dgtlmoon commented 3 years ago

I understand what you're saying, but are you asking how to do it, or are you asking whether it should be a feature? That part I don't understand.

cedilla1312 commented 3 years ago

I understand what you're saying, but are you asking how to do it, or are you asking whether it should be a feature? That part I don't understand.

It should definitely be a feature. [FEATURE REQUEST] Also, time scheduling (similar to Sken.io or Calendly) per watch URL would be beneficial, since no changes are made at night. I may try changing the wait constant to an environment variable instead of a hard-coded value, since my false positive rate keeps increasing. I knew it, you're either Czech or Slovak: I pedantically skimmed the whole Reddit thread and found a screenshot in a different language.

Do you know of any free cloud proxy service with a limited data allowance? Thank you.

dgtlmoon commented 3 years ago

I changed the title to something more specific that reflects the feature you are asking for, so it doesn't read as a support request or a question about whether it can be done :)

dgtlmoon commented 3 years ago

Also, time scheduling (similar to calendly.com) per watch URL would be beneficial, since no changes are made at night.

This is already covered in https://github.com/dgtlmoon/changedetection.io/issues/164 if you skim the open issues

dgtlmoon commented 3 years ago

Also, better than a fixed delay/wait would be something smarter, like using JS to wait until all DOM events have fired, or something similar.
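A minimal sketch of what "something smarter" could look like, assuming the fetcher holds a Selenium-style driver: poll `document.readyState` through `execute_script` until the page reports `complete`, and give up after a timeout. The function name `wait_for_document_ready` and its parameters are hypothetical and not part of changedetection.io.

```python
import time


def wait_for_document_ready(driver, timeout=10.0, poll_interval=0.25):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    `driver` is any object exposing Selenium's execute_script(script) method.
    Returns True if the page reported 'complete' in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        # readyState is 'loading', 'interactive' or 'complete'.
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Note that `complete` only means the initial document and its resources have loaded. Content injected later by scripts (for example an availability label fetched via XHR) can still arrive afterwards, so a fixed delay may be needed on top of this.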

IImtt commented 2 years ago

Also, better than a fixed delay/wait would be something smarter, like using JS to wait until all DOM events have fired, or something similar.

That sounds simple, clever and effective. +1

cedilla1312 commented 2 years ago

Also, better than a fixed delay/wait would be something smarter, like using JS to wait until all DOM events have fired, or something similar.

Yes, the source code also has a one-line comment about it, and @dgtlmoon mentioned it in another post as well.

For example, when I run change detection on this site, sometimes the DOM is fully loaded; then, minutes later, the app finds no .product-box__availability element on the site (the diff is blank), and after that it again detects what it detected at first. As a result, I keep receiving back-and-forth notifications about changes that are not relevant. Is there any way to prevent this? When I open the site manually there is no problem; it loads in less than 5 seconds (though this is how it works right now in changedetection.io). I think I have a limit of 4 concurrent WebDriver Chrome sessions and I have many watches; could this be the cause? This "bug" also happens on all the other sites I monitor.

To reproduce:
CD.io version: v0.39
Site: https://www.nay.sk/graficke-karty/velkost-pamate_8
CSS/JSON filter: .product-box__availability

Does anybody know how to solve this so that I don't get false positives? Should I manually increase the delay in the source code? It might not help.

dgtlmoon commented 2 years ago

[image attachment]

Maybe something like this? The JS options seem pretty tricky and unreliable at times. Maybe add something like a "domloaded" JS event that triggers it, or... but in the end, just a delay would help in most situations and be a simple solution.

dgtlmoon commented 2 years ago

If you get a blank entry, you can also use a text filter, so a change isn't detected until the regex filter finds a number.
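The idea behind that filter can be sketched as follows. This only illustrates the logic and is not how changedetection.io implements its filters: a freshly fetched snapshot is accepted only if a regex (here, "contains a number") matches it, otherwise the previous snapshot is kept, so a blank fetch never registers as a change. The helper names `has_expected_content` and `pick_snapshot` are hypothetical.

```python
import re


def has_expected_content(text, pattern=r"\d+"):
    """Return True if the fetched text contains a match for the pattern."""
    return re.search(pattern, text) is not None


def pick_snapshot(previous, fetched, pattern=r"\d+"):
    """Choose which snapshot to store for the next comparison.

    A fetched snapshot without a match (e.g. a blank result because the
    element had not rendered yet) is discarded and the previous one is kept.
    """
    if has_expected_content(fetched, pattern):
        return fetched
    return previous
```

With this gate, a blank fetch between two identical good fetches leaves the stored snapshot unchanged, so no back-and-forth notifications are produced for it.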

dgtlmoon commented 2 years ago

https://github.com/dgtlmoon/changedetection.io/blob/536948c8c689e0fdac748084f84efdb60a5acf35/changedetectionio/content_fetcher.py#L123

There's a new environment variable available (see the linked line).
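A rough sketch of how a fetcher might read such a setting, for readers who want to see the pattern. The variable name `WEBDRIVER_DELAY_BEFORE_CONTENT_READY` and the default of 5 seconds are assumptions and are not stated in this thread; the linked `content_fetcher.py` line is the authoritative source. `get_render_delay` is a hypothetical helper.

```python
import os

# Assumed name; confirm it against the linked content_fetcher.py line.
DELAY_ENV_VAR = "WEBDRIVER_DELAY_BEFORE_CONTENT_READY"


def get_render_delay(default=5):
    """Return the configured delay in whole seconds.

    Falls back to `default` (an assumed value) when the variable is unset,
    not an integer, or negative.
    """
    raw = os.getenv(DELAY_ENV_VAR, "")
    try:
        delay = int(raw)
    except ValueError:
        return default
    return delay if delay >= 0 else default
```

Reading the value from the environment means the delay can be raised in the container configuration instead of rebuilding a custom Docker image.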

dgtlmoon commented 2 years ago

Also handy: https://github.com/dgtlmoon/changedetection.io/pull/608