Automattic / hostmgr

A tool for managing macOS VM hosts
Mozilla Public License 2.0

Fix issue with buildkite-agent Job API when forwarding the job to the VM #85

Closed AliSoftware closed 3 months ago

AliSoftware commented 4 months ago

What / TL;DR

Fixes an issue with the latest version of buildkite-agent, used in the xcode-15.3 VM image (and later ones), that prevents jobs from being transferred from the host to the VM.

Why / Issue details

In the latest versions of buildkite-agent, the Job API experiment has been de-experimented and enabled by default.

As a result, buildkite-agent bootstrap now tries to create a Unix socket at the BUILDKITE_SOCKETS_PATH path, then exposes the created socket path and token as BUILDKITE_AGENT_JOB_API_SOCKET and BUILDKITE_AGENT_JOB_API_TOKEN env vars.

The issue is that the default value for this path (the --sockets-path option of buildkite-agent bootstrap) is $HOME/.buildkite-agent/sockets. When our hostmgr generate buildkite-job command generates the script that handles the job in the VM, it exports all BUILDKITE_* env vars in that script, including BUILDKITE_SOCKETS_PATH, which had already been resolved to /Users/administrator/.buildkite-agent/sockets on the host. This resulted in buildkite-agent bootstrap failing with the error "creating socket directory: mkdir /Users/administrator: permission denied".
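To make the failure mode concrete, here is a minimal sketch of one way to keep a host-resolved path out of a generated job script. It is written in Python for illustration only and is not hostmgr's actual code or necessarily the approach this PR takes. The names HOST_ONLY_VARS, forwardable_env and export_lines are hypothetical, and the exclusion list contains only the one variable named above.

```python
import shlex

# Assumption: variables whose values are only meaningful on the host.
# Only BUILDKITE_SOCKETS_PATH is named in the PR description; anything
# else added here would be a guess.
HOST_ONLY_VARS = {"BUILDKITE_SOCKETS_PATH"}


def forwardable_env(env):
    """Keep BUILDKITE_* variables, minus the host-only ones."""
    return {
        name: value
        for name, value in env.items()
        if name.startswith("BUILDKITE_") and name not in HOST_ONLY_VARS
    }


def export_lines(env):
    """Render shell `export` lines for the variables that are safe to forward."""
    return [
        f"export {name}={shlex.quote(value)}"
        for name, value in sorted(forwardable_env(env).items())
    ]


if __name__ == "__main__":
    # Hypothetical host environment, for illustration only.
    host_env = {
        "HOME": "/Users/administrator",
        "BUILDKITE_JOB_ID": "abc",
        "BUILDKITE_SOCKETS_PATH": "/Users/administrator/.buildkite-agent/sockets",
    }
    print("\n".join(export_lines(host_env)))
```

With the variable left out of the script, buildkite-agent bootstrap in the VM would fall back to its own default, $HOME/.buildkite-agent/sockets, resolved against the VM's home directory rather than the host's.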

How

I also took the opportunity of this PR to:

Testing

As it wasn't easy to test this without releasing and deploying a new hostmgr version to our Mac hosts, I instead:

What's Next

Once this lands, I'll generate a new release of hostmgr (probably a non-beta 0.50.0) and work on deploying it. That probably won't happen today, though: it's a Friday, and thus submission + code freeze day for many apps, so not the best day to interrupt CI (or risk breaking it during a failed deployment 😅).

AliSoftware commented 4 months ago

Note for bookkeeping: that PR initially had issues with code-signing hostmgr that made the CI fail on Validate Release.

Turns out the profiles expired, but match kept giving me issues when I tried to renew them as usual:

In the end, the error was due to a recent ASC API change between Wednesday and Friday 😞 . I submitted a fix in fastlane core, after which I was finally (!) able to renew the profiles and make CI go green.