git-for-windows / git

A fork of Git containing Windows-specific patches.
http://gitforwindows.org/
Other
8.39k stars 2.55k forks source link

`git update-git-for-windows --yes` does not work in GitHub workflows #5230

Open dscho opened 4 weeks ago

dscho commented 4 weeks ago

It was reported by @rotu in https://github.com/git-for-windows/git/issues/5199#issuecomment-2435933384 that it does not work:

- name: Update git for windows
    if: runner.os == 'Windows'
    run: git update-git-for-windows --yes
    shell: cmd
2024-10-24T17:31:41.3652735Z �[36;1mgit update-git-for-windows --yes�[0m
2024-10-24T17:31:41.3677331Z shell: C:\Windows\system32\cmd.EXE /D /E:ON /V:OFF /S /C "CALL "{0}""
2024-10-24T17:31:41.3678050Z ##[endgroup]
2024-10-24T17:31:43.6864522Z Git for Windows 2.47.0.windows.1 (64-bit)
2024-10-24T17:31:44.0033217Z Update 2.47.0.windows.2 is available
2024-10-24T17:31:44.5000031Z Downloading Git-2.47.0.2-64-bit.exe
...
2024-10-24T17:31:44.9470534Z ######################################################################## 100.0%
2024-10-24T17:31:51.5451678Z ##[error]Process completed with exit code 2.

It could be a modal dialog? If so, it should be possible to capture it with something like this PowerScript:

[Reflection.Assembly]::LoadWithPartialName("System.Drawing")
function screenshot([Drawing.Rectangle]$bounds, $path) {
   $bmp = New-Object Drawing.Bitmap $bounds.width, $bounds.height
   $graphics = [Drawing.Graphics]::FromImage($bmp)
   $graphics.CopyFromScreen($bounds.Location, [Drawing.Point]::Empty, $bounds.size)
   $bmp.Save($path)
   $graphics.Dispose()
   $bmp.Dispose()
}
$bounds = [Drawing.Rectangle]::FromLTRB(0, 0, 1000, 900)
screenshot $bounds "C:\screenshot.png"

Another idea would be to run the git-update-git-for-windows script with tracing turned on:

- run: |
    sh -x /mingw64/bin/git-update-git-for-windows

or to dig into it by debugging with tmate:

- uses: mxschmitt/action-tmate@v3
  with:
    detached: true

However, my money is on this part of the script which tries to ensure that the (interactive) Bash session is quit forcefully lest an updated MSYS2 runtime prevent an orderly exit.

Now, if that last hunch is correct, I could imagine that we need to add a special case for when $GITHUB_ACTIONS is non-empty and simply skip that code block in that scenario.

@rotu, would you like to give this a try?

rotu commented 4 weeks ago

Thanks for the tip on running under sh -x /mingw64/bin/git-update-git-for-windows -y! My bash-fu is absolute garbage, but here's what I get:

+ start '' /tmp/gfw-install-Uk97SUyn.exe //SILENT //NORESTART
+ ps
+ grep ' /usr/bin/bash$'
+ awk '{print "kill -9 " $1 ";" }'
+ sh
sh: line 1: kill: (532) - No such process
rotu commented 4 weeks ago

In another run it seems like there was a different issue and maybe that the gfw-install-td22JHkt.exe failed:

+ start '' /tmp/gfw-install-td22JHkt.exe //SILENT //NORESTART
+ ps
+ grep ' /usr/bin/bash$'
+ awk '{print "kill -9 " $1 ";" }'
+ sh

##[error]Process completed with exit code 1.
rotu commented 4 weeks ago

Progress! Your screenshot suggestion helped greatly! It seems like the installer is succeeding after some time judging from the screenshots and the output of git --version.

I still don't understand why git update-git-for-windows -y is exiting with code 1 nor why that step is ending before the installer has run to completion.

https://github.com/rotu/workflow-scratch/actions/runs/11524785608/job/32085698468

screenshot1 screenshot2

dscho commented 3 weeks ago

I still don't understand why git update-git-for-windows -y is exiting with code 1 nor why that step is ending before the installer has run to completion.

The big problem with git update-git-for-windows is that it is implemented as a Unix shell script, yet its purpose is to upgrade, potentially including the MSYS2 runtime (that is in use while running the shell script and therefore cannot be overwritten). As a work-around, the script starts the installer and immediately exits the script, potentially forcefully stopping other Bash instances (after asking the user about it).

Obviously, this is not what we would want in a GitHub workflow: We would want the workflow step to run the installer and wait for its exit status, and avoid stopping any Bash instances.

Sadly, I don't think that there is any way to keep Bash running the script while overwriting the MSYS2 runtime and then having a prayer of the script exiting in an orderly fashion. The most common symptom I saw in the past was an infinitely-repeated error message.

After thinking about this a bit more, I think that the only solution to this puzzle would be to rewrite the command in pure C, linking to libcurl to download the installer manually (and using the presence of /etc/install-options.txt as a tell-tale whether updating via the installer is applicable in the first place, excluding e.g. PortableGit-based installations).

However, this would be a rather big task.

A much more narrowly-scoped alternative (and therefore much more practical) would be to add some code guarded by if test -n "$GITHUB_ACTIONS" that verifies that it is not run from a Bash step, and then instead of starting the installer, execs it.

rotu commented 3 weeks ago

Okay, I kinda understand that. It does look like a shell script, with a #!/bin/sh shebang. BUT then there's the start line:

https://github.com/git-for-windows/build-extra/blob/532c38436d82457956144ecd55b3f5ed745b0a41/git-extra/git-update-git-for-windows#L349

  curl -# -L -o $installer $download || return
  start "" "$installer" //SILENT //NORESTART

  # Kill all Bash processes (which will let MinTTY quit, too)"
  #
  # `ps` without `-W` will automatically only catch MSYS2 processes
  # that link to *this* MSYS2 runtime, i.e. no processes from other Git
  # installations (e.g. Git for Windows' SDK) will be killed.
  ps | grep ' /usr/bin/bash$' | awk '{print "kill -9 " $1 ";" }' | sh
  return 2

That looks like the cmd.exe builtin START, and I'm guessing it's made to emulate it.

Anyhow, in my case, running which start, I see that it's a batch script at /usr/bin/start which looks like below (so I don't think it will work if no MSYS2 instance isn't in the $PATH). That said, if the idea is to spin off the executable, why wrap it in start and not just use nohup?[^1]

#!/usr/bin/env bash
# Copyright (C) 2014, Alexey Pavlov
#   mailto:alexpux@gmail.com
# This file is part of Minimal SYStem version 2.
#   https://sourceforge.net/p/msys2/wiki/MSYS2%20installation/
# File: start
"$COMSPEC" //c start "${@//&/^&}"

[^1]: EDIT: it seems like the lifetime of a command started with nohup in MSYS2 is scoped to the parent window. That is, if I run the below command in a PowerShell instance, the nohup'd command outlives git's bash.exe, but trying to exit PowerShell will hang, and killing PowerShell will kill the inner command.

    & 'C:\Program Files\Git\bin\bash.exe' -c 'nohup /c/WINDOWS/system32/bash -c "while true; do echo -n .; sleep 1; done" &'

Obviously, this is not what we would want in a GitHub workflow: We would want the workflow step to run the installer and wait for its exit status, and avoid stopping any Bash instances.

I'm not sure if you realize that bash in GitHub actions on Windows is the one shipped with Git for Windows so it's likely to have the same problem. When running on my Windows 11 desktop, however, my bash points to the one in wsl, which I think is pretty typical. I'm not sure how the shebangs and start script are expected to resolve if you run the update script from a shell other than the one distributed with Git for Windows.

that verifies that it is not run from a Bash step, and then instead of starting the installer, execs it.

What does exec do in the bash shipped with Git for Windows? Does it still hold on to the MSYS2 runtime?

dscho commented 3 weeks ago

It does look like a shell script, with a #!/bin/sh shebang.

It is a shell script.

there's the start line

Indeed. This starts the installer, but immediately returns to the shell script. The installer continues running, and the design of the script is that it exits before the installer comes even close to (potentially) overwriting the MSYS2 runtime.

That looks like the cmd.exe builtin START, and I'm guessing it's made to emulate it.

Right, it's a look-alike, living in /usr/bin/start.

I'm not sure if you realize that bash in GitHub actions on Windows is the one shipped with Git for Windows

Oh, I am well aware of it, and I tried to fight it, and gave up. I did not want the burden of having to maintain Git for Windows and in addition the Bash that runs all of the shell: bash steps in GitHub Actions jobs on Windows runners.

I'm not sure how the shebangs and start script are expected to resolve if you run the update script from a shell other than the one distributed with Git for Windows.

The shell that runs git update-git-for-windows is the /usr/bin/sh one because git.exe prepends /usr/bin to the PATH before resolving and running the git-update-git-for-windows script.

What does exec do in the bash shipped with Git for Windows?

exec does the same as the exec() syscall on Linux: It replaces the current process with a new one, in this case the installer, effectively exiting the current shell script but still having callers wait for the exit code, in this instance the exit code of the execed process, though.

Exactly what we need here.

rotu commented 3 weeks ago

It is a shell script.

I think I'm still confused by why the script kills /usr/bin/bash but not just all MSYS processes besides the upgrade script and its parents. /usr/bin/sh is even binary-identical on my machine!

Oh, I am well aware of it, and I tried to fight it, and gave up. I did not want the burden of having to maintain Git for Windows and in addition the Bash that runs all of the shell: bash steps in GitHub Actions jobs on Windows runners.

I feel you, man...

It replaces the current process with a new one, in this case the installer.

I don't get is whether any OS resources, file handles, or MSYS2 stuff depend on the process exiting in order to clean up. Or if it's going to be left in a state that conflicts with the installer.

It seems like, if we used exec in the shell script, the installer would pull the rug out from under the git.exe executable. If I'm reading the call to waitpid in try_shell_exec correctly, the git process would wait for the process to exit (regardless of whether that process is running the shell script or the installer has taken over).

After composing the previous message, I realized that Windows implicitly scopes some things to the calling window and not the process. I don't understand how that interplays with exec.

rotu commented 2 weeks ago

Per @vorporeal at https://github.com/actions/runner-images/issues/10843#issuecomment-2460055938

For folks running into this issue, we've found the following step to be an easy and reliable way to get git-for-windows updated to the latest version:

  - name: Update to latest git-for-windows
    run: |
      Start-Process git -Wait -ArgumentList 'update-git-for-windows -y'

Interesting. @dscho any insights? Does git needs to be started in a new window for update-git-for-windows to behave as expected?

dscho commented 1 week ago

Start-Process git -Wait -ArgumentList 'update-git-for-windows -y'

Does git needs to be started in a new window for update-git-for-windows to behave as expected?

No, that's not what works around the issue here. What does work around the issue here is that Start-Process returns a Process object, even if it was called with -Wait, and does not propagate the exit code of the started process as return value. Instead, the exit code is made available as a property of the Process object. If the caller would inspect that property and error out if it is non-zero, you would get the exact same behavior as if you had called git update-git-for-windows -y directly.

The only downside with this is that you will not catch any problems when the installer could not be downloaded, or when it could not successfully run to completion.

dscho commented 1 week ago

It is a shell script.

I think I'm still confused by why the script kills /usr/bin/bash but not just all MSYS processes besides the upgrade script and its parents. /usr/bin/sh is even binary-identical on my machine!

@rotu The script needs to kill its parent process. Otherwise its parent process won't be able to exit correctly if msys-2.0.dll is overwritten with a new version by the installer. The script tries to ensure that there are no other MSYS processes running, but it cannot require that its parent process isn't running ;-)

It replaces the current process with a new one, in this case the installer.

I don't get is whether any OS resources, file handles, or MSYS2 stuff depend on the process exiting in order to clean up. Or if it's going to be left in a state that conflicts with the installer.

Unlike Linux, on Windows it is not possible to overwrite files that are in use. To emulate that behavior, Cygwin's runtime (and therefore also the MSYS2 runtime, which is a friendly fork of the Cygwin one) plays nasty tricks. This is the reason why the installer can overwrite msys-2.0.dll even though a Bash is running the update-git-for-windows script.

However, this emulation is far from perfect. For example, Cygwin cannot prevent Windows' now-broken assumption of an unchanged msys-2.0.dll (which is memory-mapped), which is the reason why the parent process would crash in nasty ways when trying to exit cleanly and tear down its resources if the msys-2.0.dll file was switched out under its feet. The only way I found to make that work (kind of, at least) was to use the kill command to terminate the parent process. Which will result in a non-zero exit code, but in interactive use this does not matter because the Git Bash window (or "Run command" window) is then gone anyway, so the user won't see it.