bcgov / DITP-DevOps

Digital Identity and Trust Program Team's DevOps Documentation Repository
Apache License 2.0
SLA uptime always red with IndyScan test outages #201

Open loneil opened 2 weeks ago

loneil commented 2 weeks ago

We have two uptime monitoring dashboards: https://ditp.uptime.vonx.io/?start=20240729&end=20240827 and https://ditp.sla.vonx.io/?start=20240729&end=20240827

The second one (SLA) is red every day because "BCovrin IndyScan Sync - Test" never gets above the threshold: https://ditp.sla.vonx.io/statuspage/ditp.sla/2714146?start=20240827&end=20240827

We should probably remove it from the SLA page, or adjust its threshold, if it's not important for an SLA to track. If it is important enough that it must not be down daily, it would need a fix.

WadeBarnes commented 2 weeks ago

I've adjusted the SLAs on BCovrin - Test and BCovrin IndyScan Sync - Test.

WadeBarnes commented 2 weeks ago

I've decided to remove BCovrin IndyScan Sync - Test from the SLA page completely. That monitor is really intended as an alert to indicate when IndyScan on BCovrin Test is out of sync with the ledger for an extended period of time. IndyScan will always fall out of sync with the ledger for a while when there's a high transaction volume, and this happens every evening for a few hours when the ACA-Py integration tests kick off.
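To illustrate why a monitor that is out of sync for a few hours every evening can never stay green on a daily SLA view, here is a minimal sketch of the arithmetic. The 99% threshold and the specific outage durations are assumptions for illustration only; the actual threshold configured on the SLA page is not stated in this thread, and the function names are hypothetical.

```python
HOURS_PER_DAY = 24.0

# Hypothetical threshold for illustration; the real SLA page value is not given here.
ASSUMED_THRESHOLD_PERCENT = 99.0


def daily_uptime_percent(downtime_hours: float) -> float:
    """Return the percentage of a 24-hour day the monitor reports as up."""
    if not 0 <= downtime_hours <= HOURS_PER_DAY:
        raise ValueError("downtime_hours must be between 0 and 24")
    return 100.0 * (HOURS_PER_DAY - downtime_hours) / HOURS_PER_DAY


def meets_sla(downtime_hours: float, threshold_percent: float) -> bool:
    """Return True when the day's uptime is at or above the threshold."""
    return daily_uptime_percent(downtime_hours) >= threshold_percent


if __name__ == "__main__":
    # Assumed nightly out-of-sync windows of one to three hours.
    for hours in (1, 2, 3):
        uptime = daily_uptime_percent(hours)
        status = "green" if meets_sla(hours, ASSUMED_THRESHOLD_PERCENT) else "red"
        print(f"{hours}h out of sync -> {uptime:.2f}% uptime -> {status}")
```

Under these assumptions, even a one-hour nightly window keeps the monitor below the threshold every day, which matches the "always red" behaviour described in this issue.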

Removing BCovrin IndyScan Sync - Test from the SLA page cleans things up significantly, providing a clearer picture.

WadeBarnes commented 2 weeks ago

@loneil, have a look and see what you think. We can further refine the SLAs of each of the monitors or the services listed on the page as needed.