MuckRock / Klaxon

This repository contains a DocumentCloud Add-On that replicates the behavior of Klaxon, which allows you to monitor web pages for changes on sections of the site that might be newsworthy.
BSD 3-Clause "New" or "Revised" License

Erroneous "New Site Archived" notifications #8

Closed by morisy 10 months ago

morisy commented 11 months ago

[Screenshot: 2024-01-08, 10:23:08 AM]

Not sure what else would be helpful in reporting, but I'm getting a fair number of erroneous notifications about new sites being archived, even for Klaxon runs that have been going for a while.

This one is set to hourly and most hours it appears to work correctly, but not all:

https://www.documentcloud.org/app?q=%2Buser%3Amichael-morisy-658%20#add-ons/MuckRock/Klaxon/831

duckduckgrayduck commented 11 months ago

Thanks @morisy, I noticed this today too. I think it has something to do with the responses I'm getting from archive.org via savepagenow: it is giving false negative hits, and so the email is created. I'll take a look at this later this week.

duckduckgrayduck commented 10 months ago

Fixed in the latest changes. Previously, a site was only added to site-data when something had changed. So if a site had not yet been entered into site-data and archive.org's timestamp check had an intermittent issue, the result was a false positive "New Site Archived" notification. The add-on now always adds the timestamp to site-data, even when the site is first seen, which mitigates the issue.
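
For anyone reading along, here is a rough sketch of the logic described above. It is not the add-on's actual code: `process_run`, its parameters, the `record_on_first_sight` flag, and the shape of `site_data` are all hypothetical names made up for illustration, and the archive.org lookup is reduced to a timestamp that is `None` when the check fails or returns nothing.

```python
from typing import Dict, Optional

NEW_SITE = "new_site_archived"  # hypothetical label for the "New Site Archived" email
CHANGED = "site_changed"        # hypothetical label for a normal change alert


def process_run(
    site_data: Dict[str, dict],
    url: str,
    archive_timestamp: Optional[str],
    current_timestamp: str,
    changed: bool,
    record_on_first_sight: bool = True,
) -> Optional[str]:
    """Decide which notification (if any) one run sends, updating site_data in place.

    archive_timestamp is None when the archive lookup fails or finds nothing
    (an assumption standing in for the intermittent archive.org issue).
    record_on_first_sight=False mimics the old behavior, where a site was only
    stored when something had changed.
    """
    previously_seen = url in site_data

    if not previously_seen and archive_timestamp is None:
        # No local record and no archive timestamp: looks like a brand-new site.
        notice = NEW_SITE
    elif changed:
        notice = CHANGED
    else:
        notice = None

    # The fix described above: store the timestamp even on the first sighting,
    # so a later failed lookup cannot make a known site look new again.
    if changed or (record_on_first_sight and not previously_seen):
        site_data[url] = {"timestamp": current_timestamp}

    return notice


if __name__ == "__main__":
    url = "https://example.com"

    fixed: Dict[str, dict] = {}
    print(process_run(fixed, url, None, "20240108100000", False))
    print(process_run(fixed, url, None, "20240108110000", False))

    old: Dict[str, dict] = {}
    print(process_run(old, url, None, "20240108100000", False, record_on_first_sight=False))
    print(process_run(old, url, None, "20240108110000", False, record_on_first_sight=False))
```

With the flag on, only the first run reports a new site. With it off, an unchanged site never gets stored, so every failed lookup reports it as new again, which matches the hourly runs that mostly worked but not always.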