git-for-windows / setup-git-for-windows-sdk

A GitHub Action to initialize various flavors of the Git for Windows SDK
MIT License
20 stars 17 forks source link

Error `git: command not found` on GitHub-hosted arm64 runner that got GfW installed at runtime #951

Closed dennisameling closed 1 month ago

dennisameling commented 1 month ago

While working on https://github.com/dennisameling-org/git-for-windows-automation/pull/1, I ran into a weird issue where the setup-git-for-windows-sdk action didn't work, even though I manually installed it in an earlier step (the arm64 image doesn't have GfW installed yet).

I've created a consistent reproducer here, which shows some interesting behavior:

What I did so far to debug this issue

What I did to work around the issue

Here, I just explicitly added PATH="/c/Program Files/Git/bin:$PATH" to please.sh, so that it could find something to work with.

Noe that this issue will probably disappear as soon as GfW gets preinstalled on GitHub-hosted ARM64 runners, but I think we could make the setup-git-for-windows-sdk action more resilient to this type of issue moving forward.

I'm happy to implement a fix if you could point me in the right direction. Thanks!

rimrul commented 1 month ago

Bash itself, however, reports this, where it seems like C:/Program Files/Git/usr/bin got replaced with simply /usr/bin (I assume this is expected behavior).

It is expected.

This folder doesn't seem to contain the git.exe executable.

That is expected, too. /usr/bin/git.exe would be MSYS2 Git. We're looking for /cmd/git.exe or /mingw64/git.exe.

GitHub-hosted x64 runner (with GfW preinstalled): /cmd/git (so it somehow found Git in the /cmd folder, even though it doesn't show in the path??)

/cmd is on the PATH there.

[…]Program Files/nodejs:/cmd:/mingw64/bin:/usr/bin:/c/Program Files/GitHub CLI[…]

I have a strong feeling that this issue hasn't popped up before because both GitHub-hosted runners and our self-hosted runners have Git for Windows pre-installed, before (!!!) the GitHub runner agent starts and loads the environment variables from the OS.

I think this feeling is correct.

In the second attempt, it actually uses the Git executable to clone git-sdk-arm64 and build-extra. However, Creating build-installers artifact still fails with .tmp/build-extra/please.sh: line 106: git: command not found.

The cloning steps call the git.exe they want with an absolute path, but the please.sh step just calls git.

dscho commented 1 month ago

Maybe this code is the culprit for the behavior where /cmd/git.exe is found but Git can then not find its commands?

But that does not really match the error message, which looks more as if /cmd/git.exe does not find /clangarm64/bin/git.exe (and then let that do the actual work), even though it should find it.

rimrul commented 1 month ago

Maybe this code is the culprit for the behavior where /cmd/git.exe is found

I think it's this

but Git can then not find its commands?

But that does not really match the error message, which looks more as if /cmd/git.exe does not find /clangarm64/bin/git.exe

I think it's just bash doesn't have /cmd or /clangarm64/bin or /mingw64/bin on the path when calling please.sh

dscho commented 1 month ago

looks more as if /cmd/git.exe does not find /clangarm64/bin/git.exe

I think it's just bash doesn't have /cmd or /clangarm64/bin or /mingw64/bin on the path when calling please.sh

Good point. It happens to work on hosted runners because git.exe is already in the PATH there.

So we'd need to change this code around from someting like:

export const gitForWindowsUsrBinPath = `${gitRoot}/usr/bin`
[...]
              PATH: `${gitForWindowsUsrBinPath}${delimiter}${process.env.PATH}`

to something like:

export const gitForWindowsBinPaths =
  ['clangarm64', 'mingw64', 'mingw32', 'usr'].map(p => `${gitRoot}/${p}/bin`)
[...]
              PATH: `${gitForWindowsBinPaths.join(delimiter)}${delimiter}${process.env.PATH}`

I pushed something along those lines into my fork as fix-on-arm64 branch.

@dennisameling could you please try running this (switching out uses: dscho/setup-git-for-windows-sdk@fix-on-arm64 for uses: git-for-windows/setup-git-for-windows-sdk@v1 should do the trick)? This should even run without installing Git for Windows, as it should simply use the runner's own MinGit in that case.

dennisameling commented 1 month ago

@dennisameling could you please try running this (switching out uses: dscho/setup-git-for-windows-sdk@fix-on-arm64 for uses: git-for-windows/setup-git-for-windows-sdk@v1 should do the trick)?

That worked, thank you!!

This should even run without installing Git for Windows, as it should simply use the runner's own MinGit in that case.

It doesn't, because the cloning git-sdk-arm64 step still expects C:/Program Files/Git/cmd/git.exe to be present. But I'd say that's outside the scope of this issue.

I was working on a similar fix, and ended up extracting the spawn logic into a wrapper and adding tests. Curious to hear if you think that'd still add value to this repo.

dscho commented 1 month ago

It doesn't, because the cloning git-sdk-arm64 step still expects C:/Program Files/Git/cmd/git.exe to be present.

Hmm. But that clone() call implicitly spawns git.exe at gitExePath, which has previously been set to ${gitRoot}/cmd/git.exe, and that gitRoot should find the agent runner's MinGit. Does that not exist?

dennisameling commented 1 month ago

Does that not exist?

It doesn't, indeed. By default, the GitHub Actions Runner doesn't seem to have any flavor of Git pre-installed. It only seems to have NodeJS installed. Having Git pre-installed only seems to be the case on the default GitHub-hosted runners, like windows-2019 and windows-2022.

Even the AGENT_HOMEDIRECTORY environment variable doesn't exist at runtime (docs). Isn't that a leftover from the Azure Pipelines days?

externals-actions-runner
dscho commented 1 month ago

Does that not exist?

It doesn't, indeed. By default, the GitHub Actions Runner doesn't seem to have any flavor of Git pre-installed.

Huh. I thought it needed Git to run actions/checkout and the likes?

Even the AGENT_HOMEDIRECTORY environment variable doesn't exist at runtime (docs). Isn't that a leftover from the Azure Pipelines days?

Hmm. Possibly. Maybe RUNNER_TOOL_CACHE has it?

dennisameling commented 1 month ago

Huh. I thought it needed Git to run actions/checkout and the likes?

Apparently not, as can be seen here:

The repository will be downloaded using the GitHub REST API
To create a local Git repository instead, add Git 2.18 or higher to the PATH

It falls back to the GitHub REST API if Git isn't installed. However, in my earlier testing, that did cause issues later down the line:

fatal: not a git repository (or any of the parent directories): .git

Hmm. Possibly. Maybe RUNNER_TOOL_CACHE has it?

Negative. Looks like that folder doesn't even exist on a fresh installation of the Runner. So it looks like GfW really doesn't come bundled with the runner itself, only NodeJS.

GitHub Actions runner folder layout
dscho commented 1 month ago

So it looks like GfW really doesn't come bundled with the runner itself, only NodeJS.

Okay. Together with other insights I recently had (like: why do we have to clone Git for Windows' minimal SDK over and over again, when we are taking pains to look for the latest revision for which ci-artifacts passed successfully), I am coming to the conclusion that we need to change the architecture.

Instead of doing a sparse, partial & shallow clone of the Git for Windows SDK every single time the setup-git-for-windows-sdk Action runs (and does not find a cached version), why not upload the artifact right after the ci-artifacts run succeeded, to a public location?

I'll tell you why: I once looked into doing that, considering GitHub Packages. But they have rather strict retention rules and we definitely do not need to retain anything but the latest working minimal-sdk artifact. (And I don't want to waste resources unnecessarily, even if someone else pays for them.) So I looked into uploading that artifact to Azure Blobs, but had to pull the plug in one big hurry because it threatened to blow my Azure budget. So I gave up on that plan and instead went with the "partial, shallow, sparse clone & then cache" plan.

Turns out that other people had the same problem and became much more creative than myself. For example, the MSYS2 project needs to maintain a "srcinfo" cache where it stores information about all the latest package versions that were built and uploaded to their Pacman repository. And they use a GitHub release for that. Where they overwrite the release assets with whatever becomes the current version, over and over and over again.

We can do the same!

So here's my current plan: Modify the ci-artifacts GitHub workflow to upload the minimal-sdk artifact to a dedicated GitHub release in git-for-windows/git-sdk-64 (after the tests concluded successfully, of course). I want to force-update the tag associated with this GitHub release to point to the revision from which the artifact was built. I also want to provide the artifact both in the form of a .tar.gz as well as a self-extracting .7z file, the former for historical reasons, the latter to avoid problems on self-hosted runners where there is no pre-installed way to extract tar files.

The biggest downside of this strategy is that it won't work with anything else but the git-for-windows/git-sdk-64 repository because it requires that GitHub release, and requires effort to ensure that the release asset(s) are kept up to date. In particular, it won't work with the git-sdk-arm64 repository unless we start running that ci-artifacts GitHub workflow there, too. But we could start doing that soon, anyway.

A slightly smaller downside is that this locks the door for the idea to potentially use the minimal SDK for an arbitrary git-sdk-64 revision. But I never really added the code to enable that, anyway, so that's not such a big deal. In any case, it should be relatively easy to resurrect the code if the need for that feature should arise in the future.

dennisameling commented 1 month ago

And they use a GitHub release for that. Where they overwrite the release assets with whatever becomes the current version, over and over and over again.

We can do the same!

That makes a lot of sense to me!

I also want to provide the artifact both in the form of a .tar.gz as well as a self-extracting .7z file, the former for historical reasons, the latter to avoid problems on self-hosted runners where there is no pre-installed way to extract tar files.

Windows itself has had tar built-in for a while. It's definitely there on self-hosted runners as well. It's located at C:\Windows\System32\tar.exe. Could that be used to work with those .tar.gz files?

Let me know if I can help with anything!

dscho commented 1 month ago

Windows itself has had tar built-in for a while. It's definitely there on self-hosted runners as well. It's located at C:\Windows\System32\tar.exe. Could that be used to work with those .tar.gz files?

In my tests, this indeed works, and it works quite well, and blows /usr/bin/tar.exe out of the water. 4.3 seconds vs 6.5 seconds out of the water.

Here is step 1: https://github.com/git-for-windows/git-sdk-64/pull/87. Step 2 will be do adjust setup-for-windows-sdk to make use of that continuous release (in particular, measuring the speed to confirm that downloading and extracting the .zip file is the fastest).

Step 3 will be to set up the same changes in git-sdk-arm64. But I am a bit worried about that now, having realized yesterday that one of the self-hosted runners kept running since September 4th, blowing through ⅐ of my Azure credits... If that keeps happening, I'll end up with some serious trouble due to blocked Azure services (including GitGitGadget, for example).

dennisameling commented 1 month ago

Step 2 will be do adjust setup-for-windows-sdk to make use of that continuous release (in particular, measuring the speed to confirm that downloading and extracting the .zip file is the fastest).

Let me know if I can be of any help there.

Step 3 will be to set up the same changes in git-sdk-arm64. But I am a bit worried about that now, having realized yesterday that one of the self-hosted runners kept running since September 4th, blowing through ⅐ of my Azure credits... If that keeps happening, I'll end up with some serious trouble due to blocked Azure services (including GitGitGadget, for example).

I added a scheduled cleanup job here but back then, we decided that we just wanted to rely on webhooks for the time being. So we could probably still add that logic, or accelerate the switch over to GitHub-hosted arm64 runners instead (now in paid GA). I think we'll want to do the latter anyway at some point in the near future, before or after they become freely available for OSS projects (see the linked issue for more details).

dscho commented 1 month ago

I added a scheduled cleanup job here

You have a good point. I shouldn't have asked for that commit to be dropped. Here is my attempt to atone: https://github.com/git-for-windows/git-for-windows-automation/pull/93

dennisameling commented 1 month ago

Step 3 will be to set up the same changes in git-sdk-arm64

I have a PR in progress that's almost working: https://github.com/git-for-windows/git-sdk-arm64/pull/22

dennisameling commented 1 month ago

Step 2 will be do adjust setup-for-windows-sdk to make use of that continuous release (in particular, measuring the speed to confirm that downloading and extracting the .zip file is the fastest).

And to wrap things up, this PR implements step 2. Spoiler alert: it's incredibly fast!