fix(dataset): additionally check day-old flag files for update ids

chime-experiment / dias

A data integrity framework

https://dias.readthedocs.io/

GNU General Public License v3.0


fix(dataset): additionally check day-old flag files for update ids #197

Closed anjakefala closed 2 years ago

anjakefala commented 3 years ago

When flag updates are resent (which can happen during heat-related correlator shutdowns), a resent update that originated before the shutdown can end up in the chimestack file for the next UTC day.
This change additionally checks day-old flaginput files when validating flag update ids.
This change also adds a metric, updateid_not_found, that is incremented when update ids are not found in the available flaginput files.
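
As a rough illustration of the idea (not the actual dias code), the check could be sketched as below. The dict layout for flaginput files, the function names, and the 32-hour default lookback are all assumptions made for this example; the real analyzer reads archive files and exports a real metric.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookback window; the value is an assumption for this sketch.
LOOKBACK = timedelta(hours=32)


def select_flag_files(flag_files, stack_start, stack_end, lookback=LOOKBACK):
    """Pick flaginput files overlapping [stack_start - lookback, stack_end].

    Each entry in flag_files is a hypothetical dict with "start", "end"
    (timezone-aware datetimes) and "update_ids" (a set of strings).
    """
    window_start = stack_start - lookback
    return [
        f for f in flag_files
        if f["end"] >= window_start and f["start"] <= stack_end
    ]


def count_missing_update_ids(stack_update_ids, flag_files):
    """Count update ids seen in the chimestack file but in no flaginput file.

    The returned number stands in for the updateid_not_found metric.
    """
    known = set()
    for f in flag_files:
        known.update(f["update_ids"])
    return sum(1 for uid in set(stack_update_ids) if uid not in known)
```

With a zero lookback, an update id that only appears in the previous day's flaginput file is counted as missing; with the extended lookback it is found.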

Closes #196

anjakefala commented 3 years ago

I tested this change and can confirm that it addresses the issue.

jrs65 commented 3 years ago

Is this just extending the time it looks back to 32 hours before the current file's time?

jrs65 commented 3 years ago

I ask only because I'm trying to figure out whether this fixes the problem for good, or whether we would have this issue again on startup if we were down for two days.

anjakefala commented 3 years ago

@jrs65 Hmm, you are right. The better thing to do would be to get the time ranges from the update ids themselves (instead of from the ctime of the chimestack file).
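
A minimal sketch of that alternative, assuming a timestamp can be looked up for each update id (this thread does not say how). The `update_times` mapping, the function name, and the one-hour margin are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def window_from_updates(update_times, margin=timedelta(hours=1)):
    """Derive the flaginput search window from the updates themselves.

    update_times is a hypothetical mapping of update id -> timezone-aware
    datetime at which that update was originally issued.
    """
    if not update_times:
        raise ValueError("no update ids to derive a window from")
    times = list(update_times.values())
    # Pad both ends so files that straddle the boundary are still included.
    return min(times) - margin, max(times) + margin
```

Because the window follows the update times rather than a fixed offset from the file's ctime, it would also cover an update issued before a multi-day shutdown.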

jrs65 commented 3 years ago

I don't mean to suggest we should necessarily do anything more, just that we should understand the limitations. In this case, I think this is probably fine, particularly if we modify the text of any alert to add "...don't worry if this occurs after a long period of downtime."

anjakefala commented 3 years ago

Good point about an alert! I added a metric to track if this occurs, so we can add an alert for it.