SwiftPackageIndex / SwiftPackageIndex-Server

The Swift Package Index is the place to find Swift packages!
https://swiftpackageindex.com
Apache License 2.0
538 stars 42 forks source link

Analysis/Ingestion taking a very long time for some packages #3290

Closed daveverwer closed 3 weeks ago

daveverwer commented 1 month ago

I pushed a change to Leftpad's main branch at around midday yesterday and as of 6pm today, 30 hours later, the main branch change still has not been picked up by analysis and the README change still has not been picked up by ingestion.

Screenshot 2024-08-13 at 18 41 45@2x

On staging, both changes were picked up in reasonable time.

daveverwer commented 1 month ago

I thought it might be something to do with #3285 at first, but it now looks unrelated.

daveverwer commented 1 month ago

In packages on prod:

Screenshot 2024-08-13 at 18 47 34@2x

daveverwer commented 1 month ago

and repositories on prod:

Screenshot 2024-08-13 at 18 51 41@2x

and on staging:

Screenshot 2024-08-13 at 18 52 44@2x

finestructure commented 4 weeks ago

Generally, we analyse packages every ~6h at the moment:

CleanShot 2024-08-15 at 11 04 04@2x

For some reason, LeftPad isn't being picked up for analysis:

CleanShot 2024-08-15 at 11 19 41@2x

It does show up in the candidate selection query and I saw its position move up. It was around 515 at 10:30 CET this morning and now it's back down the bottom > position 5000. Its packages.update_at has moved to 2024-08-15 08:37:48.362663+00, so it has been visited by the process. processing_stage is ingestion which it would not be after analysis has processed it. It should be analysis instead.

We couldn't be left in a state where the package's updated_at field gets updated while processing_stage is not. This happens in the same SQL query and the whole analysis step is also run in a transaction.

It looks like LeftPad might be losing its position in the analysis queue before it gets its turn. What we're observing could happen if ingestion ran on the package again before it was picked up by analysis. Or generally, some process bumped its packages.updated_at field before it gets its turn.

finestructure commented 4 weeks ago

I don't think it can be ingestion, because ingestion doesn't pick up packages that are in ingestion stage. There's no indication that it moved out of ingestion. If analysis had picked it up, we'd a) see the new version, b) see its line in the logs, and c) see the stage at analysis.

Reconciliation doesn't look at existing packages at all, it's entirely driven by the package list as input.

I can't think of another process that's updating the packages table off the top of my heads but there must be something touching it.

finestructure commented 4 weeks ago

I think I have a lead. Checking what are the latest updated records across the whole schema for LeftPad, I noticed that there are build records at exactly the same time the package was updated 2024-08-15 08:37:48.332068+00:

CleanShot 2024-08-15 at 12 34 47@2x

Notice the infrastructure error and that these are visionOS builds. I suspect we have two issues:

This would explain what's going on: The retried visionOS builds update the package and put it back at the end of the analysis queue before it gets a chance to be analysed.

finestructure commented 4 weeks ago

This would also explain why this is only happening now. At least I'm pretty sure we'd have seen reports for outdated packages if this had been going on for longer. It's likely related to our latest Swift 6 run which exposes the package updating issue on build reports.

This build report updating issue (while not ideal) is not a problem in general, because even though it'll bump a package back to the end of the queue this will only happen once when the build is done. Here though, because the build is being retried continuously on a 4h schedule, it happens over and over before analysis can kick in.

finestructure commented 4 weeks ago

Tracking infrastructure errors in new issue: #3294

finestructure commented 3 weeks ago

We are indeed updating the package when we're receiving a build report:

                try await Package.updatePlatformCompatibility(for: version.$package.id, on: req.db)

Will have to see how/if we can avoid the re-queueing from happening.

daveverwer commented 3 weeks ago

Will have to see how/if we can avoid the re-queueing from happening.

It might be time for explicit last_analysed_at and last_ingested_at fields.