Closed AliSoftware closed 3 months ago
Note for bookkeeping: that PR initially had issues with code-signing hostmgr that made the CI fail on Validate Release. Turns out the profiles expired, but `match` kept giving me issues when I tried to renew them as usual: `match` complained about a corrupt cer file in S3.
In the end, the error was due to a recent ASC API change between Wednesday and Friday 😞. I submitted a fix in fastlane core, after which I was finally (!) able to renew the profiles and make CI go green.
What / TL;DR
Fixes an issue with the latest version of `buildkite-agent` used in the `xcode-15.3` VM image (and later ones) that prevents jobs from being transferred from the host to the VM.

Why / Issue details
In the latest versions of `buildkite-agent`, the Job API experiment has been de-experimented and enabled by default.

As a result, `buildkite-agent bootstrap` now tries to create a Unix socket at the `BUILDKITE_SOCKETS_PATH` path, then exposes the created socket path and token as `BUILDKITE_AGENT_JOB_API_SOCKET` and `BUILDKITE_AGENT_JOB_API_TOKEN` env vars.

The issue is that the default value for this path (aka `--sockets-path` option of `buildkite-agent bootstrap`) is `$HOME/.buildkite-agent/sockets`, so when our `hostmgr generate buildkite-job` command generates the script to handle the job in the VM, it exports all `BUILDKITE_*` env vars in that script… including the `BUILDKITE_SOCKETS_PATH` which was resolved to `/Users/administrator/.buildkite-agent/sockets` on the host. This resulted in `buildkite-agent bootstrap` failing on `creating socket directory: mkdir /Users/administrator: permission denied` error.

How
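As a rough illustration of the two changes to the generated script listed in this section, the effect is similar to the shell sketch below. This is not the actual hostmgr output: hostmgr builds the script from Swift code, and the `sanitize_job_env` function and `VM_SOCKETS_PATH` variable are hypothetical names used only for this example.

```shell
#!/bin/bash
# Illustrative sketch only; the real script is generated by hostmgr.
# `sanitize_job_env` and `VM_SOCKETS_PATH` are hypothetical names.

# VM-local sockets directory, as named in this PR.
VM_SOCKETS_PATH="/opt/ci/var/tmp/sockets"

sanitize_job_env() {
  # Replace the host-resolved $HOME/.buildkite-agent/sockets value with a
  # path that is usable inside the VM.
  export BUILDKITE_SOCKETS_PATH="$VM_SOCKETS_PATH"

  # These two were set by the bootstrap that ran on the host, so they
  # presumably point at a host-side socket. Drop them so that the VM-side
  # `buildkite-agent bootstrap` starts from a clean state.
  unset BUILDKITE_AGENT_JOB_API_SOCKET
  unset BUILDKITE_AGENT_JOB_API_TOKEN
}
```

Calling `sanitize_job_env` before `buildkite-agent bootstrap` runs in the VM should let the agent create its socket directory under a VM-local path rather than under the host's `/Users/administrator`.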
- Set the `BUILDKITE_SOCKETS_PATH` env var in the generated script to `/opt/ci/var/tmp/sockets`
- Unset `BUILDKITE_AGENT_JOB_API_SOCKET` and `BUILDKITE_AGENT_JOB_API_TOKEN` in the generated script

I also took the occasion of this PR to:
- Remove the `addEnvironmentVariable` calls for `BUILDKITE_BUILD_CHECKOUT_PATH`, `BUILDKITE_HOOKS_PATH` and `BUILDKITE_PLUGINS_PATH` pointing to `/usr/local/var/…`, as those are legacy paths; besides, those env keys are part of either `disallowedKeys` or `overriddenKeys`, so they were removed or overridden later in the code, before `scriptBuilder.build()` is called… so those particular `addEnvironmentVariable` calls were not impacting the generated script code after all.
- Remove the `Paths.tempFilePath` constant, which had the exact same value as `Paths.tempDirectory`, and replace its only call site
- Change `var` to `let` in `Paths` constants and rearrange their order and grouping a bit
- […] `cat`, for example 😉 )

Testing
As it wasn't easy to test this without releasing and deploying a new `hostmgr` version to our Mac hosts, instead I:
- […] `MV-MKE-ARM64-014`)
- Manually modify the `/opt/ci/hooks/command` script to unset `BUILDKITE_AGENT_JOB_API_SOCKET` and `BUILDKITE_AGENT_JOB_API_TOKEN` and set `BUILDKITE_SOCKETS_PATH` to a hardcoded value, and thus simulate the same change made in this `hostmgr` code
- […] `MV-MKE-ARM64-014` host
- […] `/opt/ci/hooks/command` to restore `MV-MKE-ARM64-014` to its previous state.

What's Next
Once this lands, I'll generate a new release of `hostmgr` (probably a non-beta `0.50.0`) and work on deploying it (but probably not today, as it's a Friday and thus submission + code freeze day for many apps, so not the best day to interrupt CI, or risk breaking it during a failed deployment 😅).