Closed anjakefala closed 2 years ago
I tested this change and can confirm that it addresses the issue.
Is this just extending the time it looks back to 32 hours before the current file's time?
I ask only because I'm trying to figure out whether this fixes the problem for good, or whether we would hit this issue again on startup if we were down for two days.
@jrs65 Hmm, you are right. The better thing to do would be to get the time ranges from the update ids themselves (instead of from the ctime of the chimestack file).
I don't mean to suggest we should necessarily do anything more, just that we should understand the limitations. In this case, I think this is probably fine, particularly if we modify the text of any alert to add "...don't worry if this occurs after a long period of downtime."
Good point about an alert! I added a metric to track if this occurs, so we can add an alert for it.
When flag updates are resent (which can happen during heat-related correlator shutdowns), a flag update issued before the shutdown can end up in the chimestack file for the next UTC day.
This change additionally checks day-old flaginput files to validate flag update ids.
This change also adds a metric, `updateid_not_found`, that is incremented when update ids are not found in the available flaginput files.

Closes #196
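
For readers following the discussion, here is a minimal sketch of the lookback idea, not the code in this PR. It assumes flaginput update ids can be grouped by UTC day in a plain dict; `candidate_days`, `validate_update_ids`, `flaginput_by_day`, and `lookback_days` are hypothetical names, and a plain dict stands in for whatever metrics client is really used. Only the metric name `updateid_not_found` comes from the description above.

```python
from datetime import datetime, timedelta, timezone


def candidate_days(file_ctime, lookback_days=1):
    """Return the UTC dates whose flaginput files should be searched.

    file_ctime is the Unix ctime of the chimestack file (an assumption about
    how the file time is represented).
    """
    day = datetime.fromtimestamp(file_ctime, tz=timezone.utc).date()
    return [day - timedelta(days=n) for n in range(lookback_days + 1)]


def validate_update_ids(update_ids, file_ctime, flaginput_by_day, metrics,
                        lookback_days=1):
    """Return the update ids missing from the searched flaginput days.

    flaginput_by_day maps a UTC date to the update ids found in that day's
    flaginput files (hypothetical layout). metrics is a plain dict used here
    in place of a real metrics counter.
    """
    known = set()
    for day in candidate_days(file_ctime, lookback_days):
        known |= set(flaginput_by_day.get(day, ()))

    missing = [uid for uid in update_ids if uid not in known]
    # Count every update id that could not be matched, so an alert can be
    # built on top of this metric.
    metrics["updateid_not_found"] = (
        metrics.get("updateid_not_found", 0) + len(missing)
    )
    return missing
```

With `lookback_days=1`, an update id that last appeared two or more days before the chimestack file would still be reported as missing, which is the downtime limitation raised in the comments above.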