Mean Time to Restore Prometheus rules can miss tickets in calculation

OpenShift version

Not related to OpenShift

Problem description

The Prometheus rule that calculates mean time to restore for individual issues sometimes skips an issue. I have not found the cause yet.

Here's a readout from my Prometheus, starting with the expression evaluated directly:

min by (issue_number, service) (min_over_time(failure_resolution_timestamp{app=~".*pelorus-api.*"}[2d] @ 1710475200)) - min by (issue_number, service) (min_over_time(failure_creation_timestamp{app=~".*pelorus-api.*"}[2d] @ 1710475200))

{issue_number="23", service="github-failure-exporter"}
525
{issue_number="24", service="github-failure-exporter"}
64822
{issue_number="25", service="github-failure-exporter"}
66518
{issue_number="29", service="github-failure-exporter"}
6315
{issue_number="22", service="github-failure-exporter"}
564

The recorded series over the same window:

sdp:time_to_restore:by_issue{app=~".*pelorus-api.*"}[2d] @ 1710475200

sdp:time_to_restore:by_issue{app="/pelorus-api/", container="github-failure-exporter", endpoint="http", instance="10.129.0.25:8080", issue_number="29", job="github-failure-exporter", namespace="pelorus", pod="github-failure-exporter-1-lnssw", service="github-failure-exporter"}
6315 @1710428928.453
6315 @1710428958.453
6315 @1710428988.453
6315 @1710429018.453
6315 @1710429048.453
6315 @1710429078.453
6315 @1710429108.453
6315 @1710429138.453
6315 @1710429168.453
6315 @1710429198.453

sdp:time_to_restore:by_issue{app="/pelorus-api/", container="github-failure-exporter", endpoint="http", instance="10.129.0.32:8080", issue_number="22", job="github-failure-exporter", namespace="pelorus", pod="github-failure-exporter-1-lnssw", service="github-failure-exporter"}
564 @1710340278.453
564 @1710340308.453
564 @1710340338.453
564 @1710340368.453
564 @1710340398.453
564 @1710340428.453
564 @1710340458.453
564 @1710340488.453
564 @1710340518.453

sdp:time_to_restore:by_issue{app="/pelorus-api/", container="github-failure-exporter", endpoint="http", instance="10.129.0.32:8080", issue_number="23", job="github-failure-exporter", namespace="pelorus", pod="github-failure-exporter-1-lnssw", service="github-failure-exporter"}
525 @1710355968.453
525 @1710355998.453
525 @1710356028.453
525 @1710356058.453
525 @1710356088.453
525 @1710356118.453
525 @1710356148.453

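These two queries should yield the same set of issues, but they do not: the first returns five issues (22, 23, 24, 25 and 29), while sdp:time_to_restore:by_issue only has series for three (22, 23 and 29). Issues 24 and 25 are missing.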
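To make the gap easy to spot on a larger data set, the two readouts can be diffed on the labels the first query groups by (issue_number and service). The snippet below is only a rough sketch: the helper names are hypothetical, and the label sets are typed in by hand from the readout above rather than fetched from Prometheus.

```python
# Hypothetical helpers (not part of Pelorus): diff the issues returned by the
# direct expression against the issues that have a recorded series.

def issue_keys(series_labels):
    """Return the set of (issue_number, service) pairs from a list of label dicts."""
    return {(labels["issue_number"], labels["service"]) for labels in series_labels}


def missing_from_rule(raw_series, rule_series):
    """Issues present in the direct query result but absent from the recorded series."""
    diff = issue_keys(raw_series) - issue_keys(rule_series)
    return sorted(diff, key=lambda key: int(key[0]))


# Label sets copied by hand from the readout above; the extra labels on the
# recorded series (pod, instance, ...) are left out because they are not compared.
SERVICE = "github-failure-exporter"
raw = [{"issue_number": n, "service": SERVICE} for n in ("23", "24", "25", "29", "22")]
rule = [{"issue_number": n, "service": SERVICE} for n in ("29", "22", "23")]

missing = missing_from_rule(raw, rule)
print(missing)
```

For the readout above this lists issues 24 and 25 as the ones without a recorded series.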

Steps to reproduce

Install Pelorus with github-failure-exporter
Open and close a bunch of GitHub issues
Run both queries above and compare the issue numbers they return

Current behavior

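sdp:time_to_restore:by_issue has no series for some issues that the first query above returns (24 and 25 in the readout).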

Expected behavior

Every issue returned by the first query above also has a series in sdp:time_to_restore:by_issue.

Code of Conduct

[X] I agree to follow Pelorus's Code of Conduct

dora-metrics / pelorus

Mean Time to Restore prometheus rules can miss tickets in calculation #1127