SwiftPackageIndex / SwiftPackageIndex-Server

The Swift Package Index is the place to find Swift packages!
https://swiftpackageindex.com
Apache License 2.0

infrastructureErrors across > 1,700 builds #3294

Closed finestructure closed 2 months ago

finestructure commented 2 months ago

We're currently tracking more than 1,700 infrastructure errors in our builds table (that's 0.4% of builds). I don't have older numbers, but the vast majority of these errors are likely due to the recent builder change to detect unavailable destinations: #2780.

Pretty much all Swift versions and platforms seem to be affected, although Swift 6.0 dominates. Also, an unavailable destination isn't the only cause of infrastructure errors. (At the moment we have no way of telling how many errors have other causes, short of looking through the build logs.)

-- Count infrastructure errors per Swift version (e.g. "6.0") and platform
select (swift_version->>'major') || '.' || (swift_version->>'minor') as swift_version,
       platform,
       count(*) as count
from builds
where status = 'infrastructureError'
group by swift_version, platform
order by count desc
Swift version  Platform         Count
6.0            visionos           672
6.0            watchos            444
6.0            tvos               417
6.0            ios                 97
5.10           watchos             21
5.9            watchos             19
5.8            watchos             17
5.9            tvos                16
5.8            tvos                16
5.10           tvos                16
6.0            macos-xcodebuild    15
5.10           visionos            14
5.9            visionos            12
5.10           macos-xcodebuild     5
5.8            macos-xcodebuild     4
5.9            macos-xcodebuild     4
5.10           ios                  2
5.9            ios                  2
5.8            ios                  2
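To put a number on "Swift 6.0 dominates", here's a small sketch that re-aggregates the rows of the query result above by Swift version. The rows are copied verbatim from the table; the helper name `totals_by_version` is just for illustration.

```python
from collections import Counter

# (swift_version, platform, count) rows copied from the query result above
ROWS = [
    ("6.0", "visionos", 672), ("6.0", "watchos", 444), ("6.0", "tvos", 417),
    ("6.0", "ios", 97), ("5.10", "watchos", 21), ("5.9", "watchos", 19),
    ("5.8", "watchos", 17), ("5.9", "tvos", 16), ("5.8", "tvos", 16),
    ("5.10", "tvos", 16), ("6.0", "macos-xcodebuild", 15),
    ("5.10", "visionos", 14), ("5.9", "visionos", 12),
    ("5.10", "macos-xcodebuild", 5), ("5.8", "macos-xcodebuild", 4),
    ("5.9", "macos-xcodebuild", 4), ("5.10", "ios", 2), ("5.9", "ios", 2),
    ("5.8", "ios", 2),
]


def totals_by_version(rows):
    """Sum the infrastructureError counts per Swift version, across platforms."""
    totals = Counter()
    for version, _platform, count in rows:
        totals[version] += count
    return totals


totals = totals_by_version(ROWS)
grand_total = sum(totals.values())

# Print versions from most to least affected, with their share of all errors
for version, count in totals.most_common():
    print(f"{version}: {count} ({count / grand_total:.1%})")
```

On these rows this gives 1,795 errors in total, 1,645 of them (about 92%) for Swift 6.0.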
finestructure commented 2 months ago

Note that these "destination unavailable" errors aren't universal. For instance, our integration test that builds SemanticVersion for visionOS passes, and the latest Swift 6 builds in prod also succeed.

So when packages fail with this error, it's not because the SDK is unavailable altogether. Something about how these particular packages are configured must be causing destination selection to fail.

As far as I can tell, these infrastructure errors are a good thing: previously, these builds would have been run as the default macOS platform build instead, giving misleading results.

finestructure commented 2 months ago

To do:

finestructure commented 2 months ago

NB: this may also explain the processing slowdowns we've seen in the last two runs (#9 and the current one, #10). The infrastructure error change went live on July 22, just before we started run #9 on July 24. Since then, the retried builds have added a base load of packages that peaks towards the end of a run, when it takes up a good chunk of the queue.

When we started run #10, we removed all Swift 6 builds (including those with infrastructure errors) and thereby reduced that load by 1,400 packages, only for it to build up again over time as the Swift 6 builds were redone.

finestructure commented 2 months ago

As an interim measure, we may want to mark these builds as failed instead of infrastructureError so that they aren't retried continuously.

finestructure commented 2 months ago

I've prepared a fix that changes how we report "destination errors": https://gitlab.com/finestructure/swiftpackageindex-builder/-/merge_requests/308

These will now be reported as build failures so that they aren't retried.
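The interim fix boils down to a change in how a build outcome is classified and which statuses get retried. Here's a minimal sketch of that policy; it is not the builder's actual code (that's in the merge request). The status strings "failed" and "infrastructureError" come from this thread, while "ok", `classify`, `should_retry` and the `infrastructure_problem` flag are hypothetical names for illustration.

```python
from enum import Enum


class BuildStatus(Enum):
    OK = "ok"  # assumed name; only "failed" and "infrastructureError" appear in this thread
    FAILED = "failed"
    INFRASTRUCTURE_ERROR = "infrastructureError"


def classify(exit_code, destination_unavailable=False, infrastructure_problem=False):
    """Map a build outcome to a status (hypothetical helper, not the builder's code)."""
    if destination_unavailable:
        # Interim policy: destination errors are reported as build failures
        # instead of infrastructure errors, so they are not re-queued.
        return BuildStatus.FAILED
    if infrastructure_problem:
        # Other infrastructure problems (hypothetical flag) stay retryable.
        return BuildStatus.INFRASTRUCTURE_ERROR
    return BuildStatus.OK if exit_code == 0 else BuildStatus.FAILED


def should_retry(status):
    """Only infrastructure errors are put back on the queue."""
    return status is BuildStatus.INFRASTRUCTURE_ERROR


# A destination error is now final rather than retried on every run.
status = classify(exit_code=1, destination_unavailable=True)
print(status.value, should_retry(status))  # prints "failed False"
```

The trade-off is the one discussed above: failing these builds keeps them out of the retry load, at the cost of no longer flagging them as infrastructure problems.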

finestructure commented 2 months ago

Recording two build logs with destination errors here so we can investigate the underlying infrastructure errors (as they will disappear from the build logs once we deploy this builder change).

Ideally, we'd be able to identify these kinds of failures more easily in the future, but retrying them all the time is too disruptive to operations.

finestructure commented 2 months ago

All infrastructure errors have now cleared, i.e. they were all destination errors. "Cleared" in the sense that these builds have now been reported as failed and won't be retried.

finestructure commented 2 months ago

The problem likely lies with xcodeproj files included in the package repositories. For example, while SemanticVersion (which has no xcodeproj file in its repo) builds fine and lists the following schemes:

❯ xcodebuild -list -json
{
  "workspace" : {
    "name" : "SemanticVersion",
    "schemes" : [
      "SemanticVersion"
    ]
  }
}

we get the following for LeftPad, which includes an xcodeproj file:

❯ xcodebuild -list -json
{
  "project" : {
    "configurations" : [
      "Debug",
      "Release"
    ],
    "name" : "LeftPad",
    "schemes" : [
      "LeftPad-Package"
    ],
    "targets" : [
      "LeftPad",
      "LeftPadPackageDescription",
      "LeftPadPackageTests",
      "LeftPadTests"
    ]
  }
}

We then go on to build

env DEVELOPER_DIR=/Applications/Xcode-16.0.0-Beta.5.app xcrun xcodebuild -IDEClonedSourcePackagesDirPathOverride=$PWD/.dependencies -skipMacroValidation -skipPackagePluginValidation -derivedDataPath $PWD/.derivedData build -scheme LeftPad-Package -destination generic/platform=xrOS

which fails. It can be fixed by deleting the xcodeproj file and building the scheme that xcodebuild then discovers, LeftPad:

❯ xcodebuild -list -json
{
  "workspace" : {
    "name" : "LeftPad",
    "schemes" : [
      "LeftPad"
    ]
  }
}

The build then passes:

❯ env DEVELOPER_DIR=/Applications/Xcode-16.0.0-Beta.5.app xcrun xcodebuild -IDEClonedSourcePackagesDirPathOverride=$PWD/.dependencies -skipMacroValidation -skipPackagePluginValidation -derivedDataPath $PWD/.derivedData build -scheme LeftPad -destination generic/platform=xrOS
Command line invocation:
    /Applications/Xcode-16.0.0-Beta.5.app/Contents/Developer/usr/bin/xcodebuild -IDEClonedSourcePackagesDirPathOverride=/Users/sas/Downloads/LeftPad/.dependencies -skipMacroValidation -skipPackagePluginValidation -derivedDataPath /Users/sas/Downloads/LeftPad/.derivedData build -scheme LeftPad -destination generic/platform=xrOS

User defaults from command line:
    IDEClonedSourcePackagesDirPathOverride = /Users/sas/Downloads/LeftPad/.dependencies
    IDEDerivedDataPathOverride = /Users/sas/Downloads/LeftPad/.derivedData
    IDEPackageSupportUseBuiltinSCM = YES

Resolve Package Graph

...

RegisterExecutionPolicyException /Users/sas/Downloads/LeftPad/.derivedData/Build/Products/Debug-xros/LeftPad.o (in target 'LeftPad' from project 'LeftPad')
    cd /Users/sas/Downloads/LeftPad
    builtin-RegisterExecutionPolicyException /Users/sas/Downloads/LeftPad/.derivedData/Build/Products/Debug-xros/LeftPad.o

** BUILD SUCCEEDED ** [0.863 sec]
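One way to spot affected packages up front might be to inspect the `xcodebuild -list -json` output before building: SemanticVersion reports a top-level "workspace" key, while LeftPad, with its xcodeproj file, reports a "project" key whose only scheme is "LeftPad-Package". The sketch below encodes that observation as a heuristic. Whether a "project" key reliably signals the problem across all packages is an assumption, and `inspect_listing`, `uses_checked_in_xcodeproj` and `build_command` are hypothetical helpers, not part of the builder. The flags in `build_command` are copied from the xcodebuild invocations above.

```python
import json

# `xcodebuild -list -json` outputs copied from above
SEMANTIC_VERSION_LISTING = """
{ "workspace" : { "name" : "SemanticVersion", "schemes" : [ "SemanticVersion" ] } }
"""

LEFTPAD_LISTING = """
{
  "project" : {
    "configurations" : [ "Debug", "Release" ],
    "name" : "LeftPad",
    "schemes" : [ "LeftPad-Package" ],
    "targets" : [ "LeftPad", "LeftPadPackageDescription",
                  "LeftPadPackageTests", "LeftPadTests" ]
  }
}
"""


def inspect_listing(listing_json):
    """Return (kind, schemes), where kind is "workspace" or "project"."""
    listing = json.loads(listing_json)
    for kind in ("workspace", "project"):
        if kind in listing:
            return kind, listing[kind].get("schemes", [])
    raise ValueError("listing has neither a workspace nor a project")


def uses_checked_in_xcodeproj(listing_json):
    # Heuristic (assumption): a top-level "project" key means xcodebuild
    # picked up an xcodeproj file rather than the package itself.
    kind, _ = inspect_listing(listing_json)
    return kind == "project"


def build_command(scheme, destination, cwd):
    """Assemble the xcodebuild arguments used above.

    DEVELOPER_DIR is expected to be set in the environment separately.
    """
    return [
        "xcrun", "xcodebuild",
        f"-IDEClonedSourcePackagesDirPathOverride={cwd}/.dependencies",
        "-skipMacroValidation",
        "-skipPackagePluginValidation",
        "-derivedDataPath", f"{cwd}/.derivedData",
        "build",
        "-scheme", scheme,
        "-destination", destination,
    ]


for name, listing in [("SemanticVersion", SEMANTIC_VERSION_LISTING),
                      ("LeftPad", LEFTPAD_LISTING)]:
    kind, schemes = inspect_listing(listing)
    note = "  <- xcodeproj in use" if uses_checked_in_xcodeproj(listing) else ""
    print(f"{name}: {kind}, schemes={schemes}{note}")
```

A builder could then decide to report such packages differently, for example, instead of passing the "-Package" scheme straight to `build_command`.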
finestructure commented 2 months ago

Closing this, as failing these builds is the right thing to do.