dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monitor which websites had a text change for free. Free Open source web page change detection, Website defacement monitoring, Price change notification
https://changedetection.io
Apache License 2.0

EmptyReply / 404 for JSON not detected as a change. No notification was sent even though the `Treat empty page as a change` setting was enabled. #2528

Closed: DarkLordGMS closed this issue 2 months ago

DarkLordGMS commented 2 months ago

Here's an empty / 404 JSON URL that you can test with. It returns a 404 error and doesn't trigger any notification, even with the required settings enabled:

https://rdap.verisign.com/net/v1/domain/emptyjsonurl.net

Here's a working JSON URL to test with, in case you need one:

https://rdap.verisign.com/net/v1/domain/google.net

I was watching a JSON URL that had data and then went blank (404). I should have received a notification, but I didn't, even though all the settings shown in the picture below were enabled.

As you can see in the picture, the status shows the error "EmptyReply - try increasing 'Wait seconds before extracting text', Status Code 404" in the URL watch list, but no notification was sent.

Notifications are configured properly. They work on everything else. I'm using the latest version (0.46.02) on Debian 12.6. I also tested the previous release (0.46.01) but the same thing happened.

It looks like something similar happened with regular HTML pages in #2501, but the last reply went unanswered and the issue was closed.

[screenshot]

dgtlmoon commented 2 months ago

According to curl, their server is dropping the connection:

```
> GET /net/v1/domain/emptyjsonurl.net HTTP/1.1
> Host: rdap.verisign.com
> User-Agent: curl/7.81.0
> Accept: */*
> 
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Content-Type: application/rdap+json
< Access-Control-Allow-Origin: *
< Strict-Transport-Security: max-age=15768000; includeSubDomains; preload
* no chunk, no close, no size. Assume close to signal end
< 
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS alert, decode error (562):
* OpenSSL SSL_read: error:0A000126:SSL routines::unexpected eof while reading, errno 0
* Closing connection 0
curl: (56) OpenSSL SSL_read: error:0A000126:SSL routines::unexpected eof while reading, errno 0
```

so hmm, not sure what can be done here; it's basically like someone hanging up the phone, right?
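The key line in the trace is `no chunk, no close, no size. Assume close to signal end`: the 404 carries no `Transfer-Encoding: chunked`, no `Connection: close` and no `Content-Length`, so under HTTP/1.1 the body only ends when the server closes the connection. Here it closes without a clean TLS shutdown, which OpenSSL 3 reports as `unexpected eof while reading`. A minimal sketch of that framing decision, assuming a simplified subset of the HTTP/1.1 message-body-length rules for responses (not changedetection.io code; the function name is hypothetical):

```python
def response_body_framing(status: int, headers: dict[str, str]) -> str:
    """Classify how an HTTP/1.1 response body is delimited (simplified, GET responses only)."""
    h = {name.lower(): value for name, value in headers.items()}
    # 1xx, 204 and 304 responses never carry a body.
    if 100 <= status < 200 or status in (204, 304):
        return "no-body"
    te = h.get("transfer-encoding")
    if te is not None:
        codings = [c.strip().lower() for c in te.split(",")]
        # Chunked must be the final coding; otherwise the body runs until close.
        return "chunked" if codings[-1] == "chunked" else "read-until-close"
    if "content-length" in h:
        return "content-length"
    # No chunked coding and no Content-Length: only the server closing
    # the connection marks the end of the body.
    return "read-until-close"


# Headers copied from the curl trace of the 404 response above.
rdap_404_headers = {
    "Content-Type": "application/rdap+json",
    "Access-Control-Allow-Origin": "*",
    "Strict-Transport-Security": "max-age=15768000; includeSubDomains; preload",
}
print(response_body_framing(404, rdap_404_headers))  # read-until-close
```

Because the end of such a body is signalled only by the close, a client cannot tell "empty body" from "connection cut early", which is why the fetch surfaces as an error rather than as empty content.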

dgtlmoon commented 2 months ago

https://rdap.verisign.com/net/v1/domain/google.net should work fine tho?

DarkLordGMS commented 2 months ago

The first URL is working as intended. Let me try to explain. This is an RDAP server. It shows the WHOIS info for a domain. When the domain gets deleted, the WHOIS information is dropped and the server returns a 404 error. That means the domain is available. This is what I want to detect.

When the URL comes back blank with a 404 error, Change Detection shows the error in the URL list but doesn't send any notification. I enabled the settings shown in the screenshot, which should ignore status errors and/or detect empty pages, but it still doesn't work.

The second URL is completely irrelevant but I included it so you could see what information is shown when a domain still exists.
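Since the signal wanted here is the status code itself (404 = no RDAP record), a checker only needs the status line and headers, not the body that gets cut off. A minimal sketch using Python's standard `urllib`, assuming the Verisign `.net` base URL from this issue; `interpret_rdap_status` and `check_domain` are hypothetical helpers, not part of changedetection.io:

```python
import urllib.error
import urllib.request

# Base URL taken from the examples in this issue (Verisign's RDAP service for .net).
RDAP_NET_BASE = "https://rdap.verisign.com/net/v1/domain/"


def interpret_rdap_status(status: int) -> str:
    """Map an RDAP HTTP status to a domain state (hypothetical helper)."""
    if status == 200:
        return "registered"
    if status == 404:
        # RDAP answers 404 when it holds no record for the domain.
        return "not-found"
    return "unknown"


def check_domain(domain: str, timeout: float = 10.0) -> str:
    """Query RDAP and decide from the status code alone, never reading a 404 body."""
    req = urllib.request.Request(
        RDAP_NET_BASE + domain, headers={"Accept": "application/rdap+json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return interpret_rdap_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx after reading only the headers,
        # so a 404 whose body ends in an abrupt TLS EOF is still classified.
        return interpret_rdap_status(err.code)
```

Usage would be `check_domain("emptyjsonurl.net")`; while the server keeps answering 404 it should come back as `"not-found"`, and network or TLS failures other than an HTTP error status are deliberately left to propagate.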

DarkLordGMS commented 2 months ago

tl;dr I just want to get a notification when the URL comes with a 404

DarkLordGMS commented 2 months ago

Sorry about the close/open. I'm typing on my phone now and my fat fingers tapped the wrong place.

dgtlmoon commented 2 months ago

Hmm I guess you have this option enabled

[screenshot]

but it's still throwing that error, right?

DarkLordGMS commented 2 months ago

Yes, you can see it enabled in the screenshot above. Try it on your end: add that first URL and enable those options. The error will show up, but no notification will be sent.

dgtlmoon commented 2 months ago

Ok so it works if the HTML->Text conversion is empty, for example if the HTML contains this

```html
<html>
    <head><title>modified head title</title></head>
    <!-- like when some angular app was broken and doesn't render or whatever -->
    <body>
    </body>
</html>
```

but not yet if the response content has zero bytes/content
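The distinction being drawn is between a response that has bytes but no visible text (the HTML above) and one with zero bytes. A minimal sketch of that check using the standard library's `html.parser`; this is not the project's actual HTML-to-text path, and `BodyTextExtractor` / `classify_response` are hypothetical names. It only counts text inside `<body>`, so the `<title>` and the comment are ignored:

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects text that appears inside <body>, ignoring <head> and comments."""

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Comments go to handle_comment, so they never reach this list.
        if self.in_body:
            self.chunks.append(data)


def classify_response(raw: bytes) -> str:
    """Tell a zero-byte reply apart from HTML that has no visible text."""
    if len(raw) == 0:
        return "zero-bytes"
    parser = BodyTextExtractor()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    return "has-text" if "".join(parser.chunks).strip() else "empty-text"
```

With this split, the "empty page" path only ever sees the `"empty-text"` case; a `"zero-bytes"` reply needs its own handling, which is the gap described here.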

dgtlmoon commented 2 months ago

ok yeah so not quite a bug, more a misunderstanding/extra feature/not-a-bug-a-feature :)

dgtlmoon commented 2 months ago

try the :dev container in a few minutes (since you didn't say how you installed it, how you're running it, your OS, etc.)

DarkLordGMS commented 2 months ago

Good morning. I mentioned on my first comment that I'm using Debian 12.6. My installation method is Docker.

I pulled the :dev container as you said and now I'm getting a different error message.

Exception: No parsable JSON found in this document

To get this error message you have to use the same URL that I provided before:

https://rdap.verisign.com/net/v1/domain/emptyjsonurl.net

Also, on Filters & Triggers add this to CSS/JSONPath/JQ/XPath Filters:

`json:$.events[2].eventDate`

Still not getting notifications when going from an existing JSON to a blank JSON. You can simulate this by adding the working URL (https://rdap.verisign.com/net/v1/domain/google.net) and then, after it has been checked, editing it to the empty URL above.

Here is a screenshot of the error:

[screenshot]
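Why the filter turns an empty 404 into an error rather than a change: the JSON filter runs before any comparison, and an empty body is not JSON at all. A minimal sketch, assuming a hand-written equivalent of `$.events[2].eventDate` (no JSONPath library) and a hypothetical `FilterNotFound` exception whose message is borrowed from the error above; the RDAP sample data in the test is made up:

```python
import json


class FilterNotFound(Exception):
    """Hypothetical error raised when the body cannot be filtered."""


def extract_event_date(body: str, index: int = 2) -> str:
    """Hand-written equivalent of the filter json:$.events[2].eventDate."""
    try:
        document = json.loads(body)
    except json.JSONDecodeError as exc:
        # An empty or truncated 404 body fails here, before any comparison.
        raise FilterNotFound("No parsable JSON found in this document") from exc
    try:
        return document["events"][index]["eventDate"]
    except (KeyError, IndexError, TypeError) as exc:
        raise FilterNotFound(f"$.events[{index}].eventDate matched nothing") from exc
```

On the working URL the filter yields a date string; after switching to the empty URL, `json.loads("")` raises, so the check ends in an error state instead of producing "empty" content that could be diffed against the previous date.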

dgtlmoon commented 2 months ago

ok hmm, I think you've reached a side-case that can't be fixed; happy if you want to make a PR that includes a test though, thank you

dgtlmoon commented 2 months ago

yeah, looking into it, it really should have an extra option like "Skip filters on empty content", because there are a lot of people out there who actually want to know when that filter fails; that would be the main default behaviour, I think
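A rough sketch of how such an option could slot into a check, assuming a hypothetical `skip_filters_on_empty` flag and a caller-supplied `apply_filters` function (neither is an existing changedetection.io setting or API). With the flag off, filters run as today and a failure still surfaces as an error; with it on, an empty reply bypasses the filters and is compared as empty content:

```python
from typing import Callable


def detect_change(previous: str, raw_body: str,
                  apply_filters: Callable[[str], str],
                  skip_filters_on_empty: bool = False) -> bool:
    """Sketch of a hypothetical "Skip filters on empty content" option."""
    if skip_filters_on_empty and not raw_body.strip():
        # Empty reply: bypass the filters so "had data -> now empty" counts as a change.
        current = ""
    else:
        # Default: run the filters, which may raise (e.g. "No parsable JSON found").
        current = apply_filters(raw_body)
    return current != previous
```

Keeping the flag off by default preserves the behaviour of reporting filter failures as errors, which suits users who want to know when a filter stops matching.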

DarkLordGMS commented 2 months ago

I'm sorry but I don't understand what you mean. What's a PR?

Also, instead of trying to work around the error, would it be possible to add an option like "Send notification on watched URL error"?

For example, if you are checking a URL and it suddenly returns an error for any reason (e.g. filter error, 404, file not found, unauthorized, etc.), a notification would be sent when the error shows up. I think that should be possible.

Here's an idea that I just tried: I added http://127.0.0.1:5000 to the URL watch list and then added EmptyReply as a "Trigger/wait for text". It worked! But this is a weird way of watching for errors. It shouldn't be necessary if there were a checkbox you could enable so that a notification gets sent (once is fine) when any error with a URL occurs.
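The "once is fine" behaviour amounts to notifying on error transitions rather than on every failed check. A minimal sketch, assuming a hypothetical per-watch `WatchErrorState` record (not the implementation in any existing PR): it fires when an error first appears or changes text, stays quiet while the same error persists, and re-arms once a check succeeds:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatchErrorState:
    """Hypothetical per-watch state for a "notify on error" option."""
    last_error: Optional[str] = None

    def should_notify(self, current_error: Optional[str]) -> bool:
        # Notify only when an error appears or changes, so a persistent
        # error such as EmptyReply is reported once, not on every check.
        notify = current_error is not None and current_error != self.last_error
        self.last_error = current_error
        return notify
```

Inside the checker this replaces the workaround of watching the UI at http://127.0.0.1:5000 for the text "EmptyReply": the error string is already known at check time, so no second watch is needed.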

dgtlmoon commented 2 months ago

a PR is https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests

yeah hmm unsure sorry :( you're really running on an edge-case here

DarkLordGMS commented 2 months ago

> a PR is https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests
>
> yeah hmm unsure sorry :( you're really running on an edge-case here

Oh I see. A PR is a pull request. I'm not a very good developer. I've only done hello worlds. :/

DarkLordGMS commented 2 months ago

> yeah hmm unsure sorry :( you're really running on an edge-case here

Do you think that sending a notification when any error has occurred would not be useful?

DarkLordGMS commented 2 months ago

This other issue from last year is basically suggesting the same thing. Back then you agreed.

https://github.com/dgtlmoon/changedetection.io/issues/1678

dgtlmoon commented 2 months ago

You mean like this? https://github.com/dgtlmoon/changedetection.io/pull/1945 to send a notification on any error ?

DarkLordGMS commented 2 months ago

> You mean like this? #1945 to send a notification on any error ?

Oh! That's a good one. I'll check if I can help in any way. For now, adding http://127.0.0.1:5000 to the URL watch list is working, but a feature would be way cleaner.