dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
https://changedetection.io
Apache License 2.0
16.84k stars, 940 forks

Change triggered/detected as Blank Diff on some sites when filter availability fluctuates #962

Closed yenba closed 1 year ago

yenba commented 1 year ago

Describe the bug: Occasionally a notification kicks off saying that there was a "change", but the diff is blank and the files are identical.

Version: v0.39.19.1, running in a Docker container on Ubuntu 22.04.1 LTS Server

To Reproduce: I'm not sure how to reproduce the behavior, as it seems inconsistent.

Share link: https://changedetection.io/share/ym-I7IBLMW4a

Expected behavior: The program should not trigger alerts if there are no changes in the diff.

Screenshots: No changes are detected in this diff comparison, yet it still triggered a change and a notification.

[image]

Here are the actual files compared in VS Code. Same thing, no difference between them.

[image: Screen Shot 2022-09-20 at 2.43.54 PM]


freddieleeman commented 1 year ago

I am experiencing the same issue and have been unable to reproduce it.

dgtlmoon commented 1 year ago

I've seen it also. I would love to add an ENV flag to save each HTML that was downloaded to try to isolate it; I have a feeling it might be something in the encoding changing or something.

there is this attempt https://github.com/dgtlmoon/changedetection.io/pull/925

but again, I can't be expected to do all the work here, would be awesomesauce if someone else would help a bit :(

bykidi commented 1 year ago

getting the same. i monitor a lot of github tags pages and suddenly, sometimes one, sometimes multiple pages trigger blank changes. sometimes it triggers blank changes on other sites too. i have no proper way to reproduce it. the only way is to add a lot of my monitors to yours and wait. [image]

"div.Box-row:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > h4:nth-child(1) > a:nth-child(1)"

dgtlmoon commented 1 year ago

@bykidi can you hit the 'share url' button and paste in the link it generates?

dgtlmoon commented 1 year ago

@bykidi btw... that dark mode looks awesome, how did you do it?

bykidi commented 1 year ago

What i've noticed: using a CSS selector picked manually in firefox (instead of the one offered by the built-in visual selector) reduces/eliminates false detections on github pages. but on some pages changedetection can't find my CSS selectors, which is why i'm forced to use the visual selector result. examples:

wireguard download page: https://download.wireguard.com/windows-client/
"body > ul:nth-child(7) > li:nth-child(1) > a:nth-child(1)"
results in: [image]

k-lite download pages, full and update packages:
https://codecguide.com/download_k-lite_codec_pack_mega.htm
https://codecguide.com/klcp_update.htm
".tdcontent > h4:nth-child(6)"
".tdcontent > h4:nth-child(5)"
results are: [image]

about those that sometimes trigger false changes... here is a bunch of watches, hope some of them can also trigger on your side:
https://changedetection.io/share/GBhNAeP6frca
https://changedetection.io/share/dK34ZnckcSka
https://changedetection.io/share/er2XApq63hQa
https://changedetection.io/share/CBYplonj1Jga
https://changedetection.io/share/4M-wjTz9Zswa
https://changedetection.io/share/epV2uEgU4QUa
https://changedetection.io/share/Uv534DVIV64a
https://changedetection.io/share/v-L2Ft3LyZsa
https://changedetection.io/share/swmRUdLLGJwa
https://changedetection.io/share/RFM-bO2lb0ca
https://changedetection.io/share/JYAoQc_nYZIa
https://changedetection.io/share/OAbC9G2Y4gka
https://changedetection.io/share/XPtjn5ICqvMa
https://changedetection.io/share/_MxYVtnU60Ya this one triggered today

i use dark reader globally on all sites with exceptions [image]

bykidi commented 1 year ago

linking this issue with mine #908

bykidi commented 1 year ago

Morning false positives on a bunch of github tags pages: [image] [image] [image]
https://changedetection.io/share/W52piCoIwz0a
https://changedetection.io/share/95fCdVFLbO8a
https://changedetection.io/share/95fCdVFLbO8a
https://changedetection.io/share/NPTWII0MvH4a
https://changedetection.io/share/NqUybbruJCMa

dgtlmoon commented 1 year ago

@bykidi does it only happen with watches that use chrome? or does it happen for all types of requests?

bykidi commented 1 year ago

@dgtlmoon actually, i have no idea... all of my fetches use the latest version of chrome.

yenba commented 1 year ago

> @bykidi does it only happen with watches that use chrome? or does it happen for all types of requests?

For me, it happens in all types of requests, not just Chrome.

dgtlmoon commented 1 year ago

I think I know - it's caused when you have a CSS/XPath filter applied but the filter cannot be found, and then it is found again on the next check

I'm betting that your watches all have filters
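A toy model of this hypothesis, not the project's actual code, can show how a blank diff would come out of it. It assumes two things that the thread does not confirm: that the stored checksum is overwritten on every check (including the one where the filter matched nothing), and that only non-empty results are written to history. `checksum` and `run_checks` are hypothetical names.

```python
import hashlib


def checksum(text: str) -> str:
    """Fingerprint of the filtered text, used to decide whether anything changed."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def run_checks(filtered_results):
    """Replay a series of checks.

    Each item is the text the filter extracted, or "" when the filter
    matched nothing. Returns (change_flags, saved_snapshots).
    """
    last_checksum = None
    snapshots = []
    change_flags = []
    for text in filtered_results:
        current = checksum(text)
        changed = last_checksum is not None and current != last_checksum
        change_flags.append(changed)
        # Assumption: the stored checksum is overwritten on every check,
        # including the one where the filter matched nothing.
        last_checksum = current
        # Assumption: only non-empty results are written to history.
        if text and (changed or not snapshots):
            snapshots.append(text)
    return change_flags, snapshots


flags, history = run_checks(["v1.0", "", "v1.0"])
print(flags)                       # [False, True, True]
print(history[-2] == history[-1])  # True: the two snapshots being diffed are identical
```

Under those assumptions the third check is flagged as a change even though the last two saved snapshots are byte-for-byte the same, which would match the blank diffs reported above.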

dgtlmoon commented 1 year ago

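Ok so I don't know how this is fixable yet, because some people have the scenario that

  • They add a CSS filter for an element that doesn't yet exist, but SHOULD in the future (like a cinema ticket goes on sale .on-sale)

  • They want changedetection.io to keep checking and notify them when a change/filter was detected

There is a test to make sure this works https://github.com/dgtlmoon/changedetection.io/blob/3ebb2ab9ba593bea346c8ca20364f8690568170b/changedetectionio/tests/test_filter_exist_changes.py#L45

But here on this issue it's like

  • Filter existed for a while

  • Something in the JS or Browser didn't work, so the page partly rendered but the filter was missing

  • Page rechecked, filter re-appeared

  • Notification was sent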

bykidi commented 1 year ago

i have filters on everything because sometimes there are a lot of changes that won't fit into telegram's message character limit, which is why i check for versions (mostly software) and then add the full diff link to it.

yenba commented 1 year ago

> Ok so I don't know how this is fixable yet, because some people have the scenario that
>
>   • They add a CSS filter for an element that doesn't yet exist, but SHOULD in the future (like a cinema ticket goes on sale .on-sale)
>
>   • They want changedetection.io to keep checking and notify them when a change/filter was detected
>
> There is a test to make sure this works https://github.com/dgtlmoon/changedetection.io/blob/3ebb2ab9ba593bea346c8ca20364f8690568170b/changedetectionio/tests/test_filter_exist_changes.py#L45
>
> But here on this issue it's like
>
>   • Filter existed for a while
>
>   • Something in the JS or Browser didn't work, so the page partly rendered but the filter was missing
>
>   • Page rechecked, filter re-appeared
>
>   • Notification was sent

Hmm. That does make sense. Thanks for taking a look at it!

Maybe instead of "fixing" it, there could be some kind of a workaround. Something like an option to not send notifications if the {diff} field is blank?

At least in my case that would solve the blank notifications!
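A minimal sketch of what such a workaround could look like, using only Python's standard `difflib`. `render_diff` and `should_notify` are hypothetical helpers for illustration and are not part of changedetection.io:

```python
import difflib


def render_diff(previous: str, current: str) -> str:
    """Unified diff of two snapshots; an empty string means no textual difference."""
    lines = difflib.unified_diff(
        previous.splitlines(), current.splitlines(), lineterm=""
    )
    return "\n".join(lines)


def should_notify(previous: str, current: str) -> bool:
    """Hypothetical guard: suppress the notification when {diff} would be blank."""
    return render_diff(previous, current).strip() != ""


print(should_notify("v1.0", "v1.0"))  # False
print(should_notify("v1.0", "v1.1"))  # True
```

This only hides the symptom: a check where the filter briefly disappeared would go unreported as well.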

bykidi commented 1 year ago

Can confirm. Early this morning the techpowerup site was down and my instance sent a 'filters not found 6 times' notification. Later it triggered blank changes on all of the watches that had previously been 'not found'. I think we need an 'only monitor for actual changes (ignore not found/found again)' option as the default, and those who monitor 'out of stock/back in stock' should set that option explicitly.

bykidi commented 1 year ago

[image] [image]

adamrgolf commented 1 year ago

I've been seeing this as well for the past few weeks. It will usually trigger multiple sites and push notifications even though there is no change/diff

dgtlmoon commented 1 year ago

I was thinking of a smarter way to deal with this, maybe storing a ratio between 0.0 and 1.0 that tells how often the filter was found in the last 10(?) attempts

if the success ratio < 0.5 then we can send some alert/notification such as "Looks like the filter is sometimes not available and may be sending false alerts, would you like to limit this watch to (insert solution here)"
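The ratio idea could be sketched roughly like this. The class name, the window of 10 and the 0.5 threshold are illustrative assumptions taken from the numbers floated above, not an existing changedetection.io API:

```python
from collections import deque


class FilterSuccessTracker:
    """Remembers whether the filter was found on each of the last N checks."""

    def __init__(self, window: int = 10, threshold: float = 0.5):
        # deque(maxlen=...) silently drops the oldest result once the window is full
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, found: bool) -> None:
        self.results.append(found)

    def ratio(self) -> float:
        """Share of recent checks where the filter was found (1.0 if no data yet)."""
        if not self.results:
            return 1.0
        return sum(self.results) / len(self.results)

    def is_flaky(self) -> bool:
        """True when the filter was found in fewer than `threshold` of recent checks."""
        return self.ratio() < self.threshold


tracker = FilterSuccessTracker()
for found in [True] * 4 + [False] * 6:
    tracker.record(found)
print(tracker.ratio())     # 0.4
print(tracker.is_flaky())  # True
```

A watch flagged as flaky could then show the suggested warning instead of sending a change notification.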

dgtlmoon commented 1 year ago

@bykidi

> Can confirm. Early this morning the techpowerup site was down and my instance sent a 'filters not found 6 times' notification. Later it triggered blank changes on all of the watches that had previously been 'not found'.

Thanks for that - that was exactly what I was thinking was happening

bykidi commented 1 year ago

it is happening right now [image] i have paused those watches. weird stuff happens - i get a red notification that my specific filter is not found, but i see that the page was rendered properly (proper page screenshot on the preview tab)

bykidi commented 1 year ago

There is indeed a change. Previously, i used this filter:

div.Box-row:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > h4:nth-child(1) > a:nth-child(1)

but when the filter is reported as 'not found', this exact field has to be selected like this:

div.Box-row:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > h2:nth-child(1) > a:nth-child(1)

bykidi commented 1 year ago

After testing with the browserless version specified in the installation guide, the problem is the same. sometimes (like right now) i still get false positives, mostly on github. [image]

NaruZosa commented 1 year ago

Imo an 'ignore blank diff' option would work fine

bykidi commented 1 year ago

what we probably need is a 'stock/out of stock' mode which should ignore that and only monitor for actual changes...

dgtlmoon commented 1 year ago

@bykidi so what's the solution in the case where the CSS filter doesn't exist any more? Say you set the CSS filter to point to something like .current-price, then .current-price disappears because it's sold out, then it comes back again, and you would never know

there is no easy answer

@NaruZosa but that would ignore when the filter was missing

bykidi commented 1 year ago

i mean, for those kinds of monitors there could be an additional checkbox. if it's enabled, changedetection acts like it does right now. but if it isn't, it should ignore filters that go missing and are then found again, and only notify if the filter is missing more than %filter_failure_notification_threshold_attempts% times
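One way to read this proposal, as a rough sketch: `WatchState`, `process_check` and the `legacy_mode` flag are hypothetical names, and `failure_threshold` stands in for the %filter_failure_notification_threshold_attempts% setting mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatchState:
    last_good: Optional[str] = None   # last text the filter successfully extracted
    consecutive_failures: int = 0     # checks in a row where the filter was missing


def process_check(state: WatchState, content: Optional[str], *,
                  legacy_mode: bool, failure_threshold: int) -> Optional[str]:
    """Return the kind of notification to send, or None.

    `content` is None when the filter was not found on the page.
    """
    if content is None:
        state.consecutive_failures += 1
        # Notify once, on the first check that exceeds the threshold.
        if state.consecutive_failures == failure_threshold + 1:
            return "filter-missing"
        return None

    reappeared = state.consecutive_failures > 0
    state.consecutive_failures = 0
    changed = state.last_good is not None and content != state.last_good
    state.last_good = content

    if changed:
        return "changed"
    # Checkbox enabled: behave as today, a reappearing filter counts as a change.
    if reappeared and legacy_mode:
        return "changed"
    return None


state = WatchState()
for content in ["v1.0", None, "v1.0"]:
    result = process_check(state, content, legacy_mode=False, failure_threshold=2)
print(result)  # None: 'missing, then found again' with identical text is ignored
```

With `legacy_mode=True` the same sequence ends in a "changed" notification, which is the behaviour restock-style watches rely on.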

dgtlmoon commented 1 year ago

@bykidi yeah almost.. I think we're coming up with solutions based on assumptions tho, the solution should be

dgtlmoon commented 1 year ago

Imagine we were looking at this issue from the very start, and we have that log available, where we could have seen what was going on

dgtlmoon commented 1 year ago

btw https://github.com/dgtlmoon/changedetection.io/blob/f86763dc7a27ca71bf432da3ec31a827f35b1648/changedetectionio/tests/test_filter_exist_changes.py#L44

wrobelda commented 1 year ago

I can reproduce this issue each time a check is met with a 429, followed by an OK result (200). The latter will trigger a false-positive notification with an empty diff. I presume the 429 pushes a record to the database, which the subsequent 200 result then gets compared with?

Interestingly, I did not have such issues before, and I believe this may be a result of some specific configuration I changed. In General settings, I have "Treat empty pages as a change?" disabled, "Extract from document and use as watch title" enabled, "Number of times the filter can be missing before sending a notification" set to 0. In Global Filters, "Ignore whitespace" enabled. The checks individually have "Send a notification when the filter can no longer be found on the page" disabled.

dgtlmoon commented 1 year ago

@wrobelda

> I can reproduce this issue each time a check is met with a 429, followed by an OK-result (400).

yeah that's super interesting, i'll write an automated test case and see if I can trigger that, maybe that's the main problem - non-200 replies are resetting the checksum

dgtlmoon commented 1 year ago

@bykidi @wrobelda

> I can reproduce this issue each time a check is met with a 429, followed by an OK-result (400).

but 400 means 'bad response', this shouldn't trigger it, OR it will trigger it when 'allow non 200 responses' option is checked

wrobelda commented 1 year ago

> @bykidi @wrobelda > I can reproduce this issue each time a check is met with a 429, followed by an OK-result (400).
>
> but 400 means 'bad response', this shouldn't trigger it, OR it will trigger it when 'allow non 200 responses' option is checked

My apologies, I meant 200, of course, not 400!🤦🏻‍♂️ I am getting 429 when I occasionally get throttled. Once the limitation is lifted, things get back to normal, except for that false-positive notification.

dgtlmoon commented 1 year ago

@wrobelda @bykidi https://github.com/dgtlmoon/changedetection.io/pull/1385 test passes... so it's setting a filter, grabs the content with that filter, then tries different return-codes to see if it triggers a change... everything works

maybe related to playwright? are you able to try to reproduce this with plain requests? can you try?

yenba commented 1 year ago

> @wrobelda @bykidi #1385 test passes... so it's setting a filter, grabs the content with that filter, then tries different return-codes to see if it triggers a change... everything works
>
> maybe related to playwright? are you able to try to reproduce this with plain requests? can you try?

I have had the "blank diff" issue on both playwright requests and plain requests.

Plain Request

https://changedetection.io/share/lC1Re_gJiWUa

[image: https://user-images.githubusercontent.com/17832515/216837599-c9687993-be0c-4ef8-8028-6529a9740935.png]

Playwright Request

https://changedetection.io/share/QeimEMmIqa0a

[image: https://user-images.githubusercontent.com/17832515/216837672-f24bff8d-40ee-4e9f-9853-e24bb07c0960.png]

dgtlmoon commented 1 year ago

I can't find a fault :-( but there must be something

wrobelda commented 1 year ago

> I can't find a fault :-( but there must be something

Is there some verbose logging that could be enabled to trace the steps leading to the notification?

dgtlmoon commented 1 year ago

You can track it with the step debugger, but more important than any of that is if you can find a way to reproduce it reliably

dgtlmoon commented 1 year ago

https://github.com/dgtlmoon/changedetection.io/issues/845

bykidi commented 1 year ago

dunno if that is related or not, but i had a lot of empty changes during extensive packet losses because of my ISP. can test it when i get back home by faking packet losses on my mikrotik (edit: was busy yesterday, hope to get to my pc asap)

dgtlmoon commented 1 year ago

@bykidi yeah let me know.. could be a relationship if there's filters enabled, and the page returns empty.. or..

dgtlmoon commented 1 year ago

Basically we'll keep trying to find the scenario, add a test that can prove it, then fix the code

dgtlmoon commented 1 year ago

@bykidi @wrobelda I'd really love it if you can find a way to reproduce this using the plain-requests method

wrobelda commented 1 year ago

> @bykidi @wrobelda I'd really love it if you can find a way to reproduce this using the plain-requests method

Actually, all of this was plain-request! I also get false-positives with Playwright for different checks, but I didn't notice any particular pattern there like I did with 429 -> 200.

bykidi commented 1 year ago

i've put in a firewall rule with a 75% drop rate for all packets and one page just gave me a false-positive [image: https://user-images.githubusercontent.com/1337953/217307773-fa79bc2e-a07d-43e1-bf04-e2ab878c5d2e.png]

dgtlmoon commented 1 year ago

@bykidi requests or playwright? which one?

bykidi commented 1 year ago

whoopsie... here goes my telegram token...

dgtlmoon commented 1 year ago

deleted it for you

bykidi commented 1 year ago

well, i guess, in my case, it's 'not found, found again', after all...

> Filter for ff6e155e-0166-47fc-bc8e-b7f734f2b796 not found, consecutive_filter_failures: 1
> Change detected in UUID ff6e155e-0166-47fc-bc8e-b7f734f2b796 - https://github.com/NickeManarin/ScreenToGif/tags
> Process Notification: AppRise notifying