Closed da-kami closed 1 year ago
I suspect we are gonna drown in alerts, as we will be emitting this error log 8000 times per hour.
Perhaps there is a gentler way of handling this. Something like an alert that is triggered if 1 or more unique attestations cannot be fetched from the oracle.
But if you think this is the way I won't block this PR.
I think adding more elaborate logic will not help much it still results in the same problem. The current situation is only because the problem builds up over time because we try to fetch more and more missing attestations (as seen in the grafana query you linked). We have to detect the problem early, so we can react and don't run into the load.
I would get this into production and ignore alerts for specific even-ids where we know we cannot fetch them at the moment. This will be a bit cumbersome to set up, but I think this is the way to go. We could also handle this in rust or with e.g. an ignore file, but I don't think it's right to introduce code for this. In the future we will notice the problem early and can hopefully get it fixed before we see a lot of spam.
I suspect we are gonna drown in alerts, as we will be emitting this error log 8000 times per hour. Perhaps there is a gentler way of handling this. Something like an alert that is triggered if 1 or more unique attestations cannot be fetched from the oracle. But if you think this is the way I won't block this PR.
I think adding more elaborate logic will not help much it still results in the same problem. The current situation is only because the problem builds up over time because we try to fetch more and more missing attestations (as seen in the grafana query you linked). We have to detect the problem early, so we can react and don't run into the load.
I would get this into production and ignore alerts for specific even-ids where we know we cannot fetch them at the moment. This will be a bit cumbersome to set up, but I think this is the way to go. We could also handle this in rust or with e.g. an ignore file, but I don't think it's right to introduce code for this. In the future we will notice the problem early and can hopefully get it fixed before we see a lot of spam.
Isn't something like this sufficient?
This project is unmaintained now.
We only log it as error if the event timestamp is
10
minutes in the past to avoid false negative error reports when the oracle is slow to report the attestation.The
event_id
is added to the log statement for more context.I propose to merge this into
master
and then cherrypick this commit on a patch release on top of0.7.0
to get it into production. Is there anything else we would like to see in a patch release? I remember we wanted to release https://github.com/itchysats/itchysats/pull/3244 ?