SwiftPackageIndex / SwiftPackageIndex-Server

The Swift Package Index is the place to find Swift packages!
https://swiftpackageindex.com
Apache License 2.0

Investigate CHECK_MON_001 failures #3285

Closed finestructure closed 4 weeks ago

finestructure commented 1 month ago

We have 363 packages failing the CHECK_MON_001 alert, with the oldest updated_at being a week old.

All packages except one are sitting in the ingestion stage. They must be erroring out in analysis, preventing them from being updated.

finestructure commented 1 month ago

Packages can be analysed without errors locally on a db dump from today.

Manually analysing packages in prod also clears them from the CHECK_MON_001 list. Unclear what made them stick.

finestructure commented 1 month ago

Ok, mystery solved. There are two problems here. The alerting query is

            SELECT
              r.owner,
              r.name AS "repository",
              p.status,
              p.processing_stage,
              r.updated_at
            FROM
              repositories r
              JOIN packages p ON r.package_id = p.id
            WHERE
              r.updated_at < now() - INTERVAL \(literal: "\(timePeriod.hours) hours")
            ORDER BY
              updated_at

with timePeriod.hours set to 4. We're incorrectly looking at r.updated_at - the repositories.updated_at field - when we should be using p.updated_at. Not every analysis pass updates the repository (it only does when there are repo changes) but we explicitly update the package with the processing status.
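
A minimal sketch of what the corrected alerting query could look like, assuming the only change needed is to filter and sort on p.updated_at instead of r.updated_at. The interval is hard-coded to 6 hours here for readability (the checkMon001TimePeriod value mentioned later in this thread); in the server code it would presumably still be interpolated from timePeriod.hours.

```sql
-- Sketch only: same join as the original query, but staleness is
-- measured on packages.updated_at, which every analysis pass touches.
SELECT
  r.owner,
  r.name AS "repository",
  p.status,
  p.processing_stage,
  p.updated_at
FROM
  repositories r
  JOIN packages p ON r.package_id = p.id
WHERE
  p.updated_at < now() - INTERVAL '6 hours'
ORDER BY
  p.updated_at
```

Qualifying the ORDER BY column avoids any ambiguity, since both tables have an updated_at column.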

Also, 4 hours is a bit too short now. It takes ~4h 10min to go around once, so we'll want to bump that to 5h for the alert.

finestructure commented 1 month ago

Actually, checkMon001TimePeriod is already set to 6. I only had records showing up in my local testing because I incorrectly set it to 4 when running the equivalent query. The only issue is the updated_at field.
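
To see how much the threshold alone changes the result, a query along these lines could be run against a local dump. This is an illustrative sketch, not part of the server code: it counts packages that would be flagged at 4 hours versus 6 hours, using p.updated_at as the staleness timestamp.

```sql
-- Sketch only: compare how many packages each threshold would flag.
SELECT
  COUNT(*) FILTER (WHERE p.updated_at < now() - INTERVAL '4 hours') AS stale_at_4h,
  COUNT(*) FILTER (WHERE p.updated_at < now() - INTERVAL '6 hours') AS stale_at_6h
FROM
  repositories r
  JOIN packages p ON r.package_id = p.id
```

If a full pass takes a little over 4 hours, stale_at_4h would be expected to be non-zero at most times, while stale_at_6h should stay at zero when processing is healthy.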

finestructure commented 1 month ago

I accidentally committed this change to my local main, and we didn't have branch protection. I've enabled branch protection and will redo this change via a revert + change PR.

finestructure commented 1 month ago

Ok, branch protection doesn't seem to be working, not sure why.

finestructure commented 1 month ago

I've now also ticked the second box here, hoping that'll prevent pushes to main.

(Screenshot: CleanShot 2024-08-13 at 10 50 48@2x)