git-for-windows / git

A fork of Git containing Windows-specific patches.
http://gitforwindows.org/
Other
8.31k stars 2.52k forks source link

Regression: MinGit 2.38.0 fails to execute in Windows Nano Server #4052

Closed mthalman closed 1 year ago

mthalman commented 1 year ago

Setup

MinGit 2.38.0: https://github.com/git-for-windows/git/releases/download/v2.38.0.windows.1/MinGit-2.38.0-64-bit.zip

$ git --version --build-options

This fails to run.

Windows Nano Server container (mcr.microsoft.com/windows/nanoserver:ltsc2022-amd64)

$ cmd.exe /c ver

Microsoft Windows [Version 10.0.20348.1006]
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
> type "$env:USERPROFILE\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

No options, used defaults.

This is a regression. It previously worked in version 2.37.3. There was likely a new dependency that was added to MinGit that's now causing this to fail in Nano Server because that dependency is not installed. Nano Server has a subset of the files that exist in other Windows SKUs. This issue does not occur in Windows Server Core, likely because it has the necessary dependency installed.

Here's a Dockerfile which can repro the issue:

# escape=`
FROM mcr.microsoft.com/windows/nanoserver:ltsc2022-amd64

ARG MINGIT_VERSION=2.38.0

# This version works
# ARG MINGIT_VERSION=2.37.3

RUN curl -SL --output mingit.zip https://github.com/git-for-windows/git/releases/download/v%MINGIT_VERSION%.windows.1/MinGit-%MINGIT_VERSION%-64-bit.zip `
    && mkdir "C:\Program Files\MinGit" `
    && tar -oxzf mingit.zip -C "C:\Program Files\MinGit"

USER ContainerAdministrator
RUN setx /M PATH "%PATH%;C:\Program Files\MinGit\cmd"
USER ContainerUser

RUN git version

Details

CMD

git version

Outputs the version

Execution failed with an exit code of -1073741511

n/a

dscho commented 1 year ago

Here's a Dockerfile which can repro the issue:

I cannot run Docker on my computer. Is there an easy way to reproduce this? Like, something that can be integrated into a GitHub workflow?

dscho commented 1 year ago

It might also shed some light into the issue if you could pinpoint which snapshot introduced the problem.

mthalman commented 1 year ago

I cannot run Docker on my computer. Is there an easy way to reproduce this? Like, something that can be integrated into a GitHub workflow?

No, because Nano Server is only available for containers.

It might also shed some light into the issue if you could pinpoint which snapshot introduced the problem.

I'm happy to help narrow it down to a snapshot. Can you give me a starting point?

mthalman commented 1 year ago

I narrowed it down to the snapshot for commit https://github.com/git-for-windows/git/commit/ec78604ecd06ee018364f6a7cb15e11922833382. Starting with that snapshot, this behavior begins. It works fine in the previous snapshot (commit https://github.com/git-for-windows/git/commit/c4992d4fecabd7d111726ecb37e33a3ccb51d6f1).

While diagnosing the root cause, if you need help in understanding what DLLs actually exist (or don't exist) in Nano Server, I can provide you with that info. Just give me the path that you want to know about. Unfortunately, Windows doesn't have any documentation that indicates this information.

mthalman commented 1 year ago

@dscho - Any more info I can provide here to help out?

rimrul commented 1 year ago

I narrowed it down to the snapshot for commit https://github.com/git-for-windows/git/commit/ec78604ecd06ee018364f6a7cb15e11922833382. Starting with that snapshot, this behavior begins.

That sounds like the culprit might be

e56f44a153602cf83da6cea9da0da1fd7e225c25 or 59f74e9595d02d89a6532806d53a5872d7d202bf

and psapi.dll or bcrypt.dll.

mthalman commented 1 year ago

Well, I do know that Nano Server does have bcrypt.dll installed:

> docker run --rm mcr.microsoft.com/windows/nanoserver:ltsc2022 cmd /c dir C:\Windows\System32\bcrypt.dll
 Volume in drive C has no label.
 Volume Serial Number is 881B-79E7

 Directory of C:\Windows\System32

05/08/2021  01:42 AM           169,792 bcrypt.dll
               1 File(s)        169,792 bytes
               0 Dir(s)  21,252,771,840 bytes free
mthalman commented 1 year ago

Ah, missed your update about psapi.dll. Yes, that's the one that's missing.

> docker run --rm mcr.microsoft.com/windows/nanoserver:ltsc2022 cmd /c dir C:\Windows\System32\psapi.dll
 Volume in drive C has no label.
File Not Found
 Volume Serial Number is 881B-79E7

 Directory of C:\Windows\System32
rimrul commented 1 year ago

According to https://learn.microsoft.com/en-us/windows/win32/api/psapi/nf-psapi-getprocessmemoryinfo#remarks

I think we could add a fallback to K32GetProcessMemoryInfo() which was available in nanoserver in 2017 according to https://learn.microsoft.com/en-us/previous-versions/windows/desktop/legacy/mt588480(v=vs.85)

dscho commented 1 year ago

I had a quick look at https://github.com/microsoft/mimalloc/issues?q=is%3Aissue+is%3Aopen+nano but did not find any mention of nano server...

dscho commented 1 year ago

That sounds like the culprit might be

e56f44a

The thing that I do not understand about this is: we are not requiring this DLL, we go out of our way to dynamically load this function, and if it cannot be found, we simply report bogus statistics. And we only do that if mi_process_info() is called (which is not called in general, as far as I can tell).

KevinCathcart commented 1 year ago

on Nano server, Git.exe exits with status code -1073741511 (0xC0000139) which is STATUS_ENTRYPOINT_NOT_FOUND. (Which means some function imported from a dll does not exist).

Running it under a debugger tells me that it is GetNumaProcessorNode that is missing. This being an eager loaded import makes the exe fail at startup, before any user code has run.

The irony is that this call exists only for pre windows 7 compatibility, with the code trying to use the newer GetNumaProcessorNodeEx in preference, which it does via loadlibrary and GetProcAddress.

However Nano server has dropped the older GetNumaProcessorNode and contains only the newer GetNumaProcessorNodeEx.

If Git for Windows does not support OSes that old, I'd suggest just changing the code to call GetNumaProcessorNodeEx directly. If it does support OSes that old, then calling that call dynamically instead of statically could work (if neither call could be loaded, you can just return 1, but that can't actually happen, as any windows old enough to have neither GetNumaProcessorNode nor GetNumaProcessorNodeEx would crash on some other Numa functions called statically (that are not a problem on Nano Server), and in any case would be far too old to be supported by this project..

This might not be the only call that is broken, but I don't have any good way of checking that, since the only way I could find to even determine the missing call was to copy windbg for x64 into the container (the whole C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\windbg.exe folder) run gflags -i git.exe +SLS in the container as ContainerAdmin, and then call cdb.exe -c "g" -c "q" C:\mingit\mingw64\bin\git.exe.


If if github's CI runners have windbg installed (it is part of the Windows SDK, but it does not get installed as part of visual studio, while other parts of Windows SDKs are installed by it. The listing of installed software does not make it clear, and I can't check it at the moment), then it would be possible to automatically test for missing static imports (individual functions or entire DLLs) on nano-server if so desired. I could write up an explanation of how this could work later.

Hope this helps.

dscho commented 1 year ago

If Git for Windows does not support OSes that old, I'd suggest just changing the code to call GetNumaProcessorNodeEx directly. If it does support OSes that old, then calling that call dynamically instead of statically could work (if neither call could be loaded, you can just return 1, but that can't actually happen, as any windows old enough to have neither GetNumaProcessorNode nor GetNumaProcessorNodeEx would crash on some other Numa functions called statically (that are not a problem on Nano Server), and in any case would be far too old to be supported by this project..

I fear that this is a bit more complicated than your comment makes it sound.

Git for Windows vendored in mimalloc, and that's the code you're talking about. I do not want to deviate from mimalloc's code unnecessarily (ideally, not at all), so we'll have to work with the mimalloc project to find a solution for this.

The irony is that mimalloc already tries to load GetNumaProcessorNodeEx dynamically, and only uses that GetNumaProcessorNode that it imports statically as a fallback...

As a stop-gap solution, I think we should probably just turn the latter also into a dynamic import.

dscho commented 1 year ago

@KevinCathcart @mthalman once the build at https://github.com/git-for-windows/git/actions/runs/3272033990 is done, it should have an artifact with the Git executables. Could you please verify that they work in Nano Server?

dscho commented 1 year ago

it would be possible to automatically test for missing static imports (individual functions or entire DLLs) on nano-server if so desired

@KevinCathcart I would be interested in knowing details about this. How could I run Nano Server containers in a GitHub workflow? Surely I'd have to self-host a runner, right?

mthalman commented 1 year ago

@KevinCathcart @mthalman once the build at https://github.com/git-for-windows/git/actions/runs/3272033990 is done, it should have an artifact with the Git executables. Could you please verify that they work in Nano Server?

I tried running it in a mcr.microsoft.com/windows/nanoserver:ltsc2022 container and it still fails. Getting a different error code this time: -1073741515 (DLL not found)

mthalman commented 1 year ago

it would be possible to automatically test for missing static imports (individual functions or entire DLLs) on nano-server if so desired

@KevinCathcart I would be interested in knowing details about this. How could I run Nano Server containers in a GitHub workflow? Surely I'd have to self-host a runner, right?

You shouldn't need a self-hosted runner. The Windows runner images already have Docker installed. So you could just use the windows-2022 image and run the necessary Docker commands.

For my local testing, I just made a simple Dockerfile like this:

FROM mcr.microsoft.com/windows/nanoserver:ltsc2022

COPY . \git

ENTRYPOINT [ "C:\\git\\git.exe", "version" ]

To build the image you'd just run:

docker build -t git-test -f <path-to-Dockerfile> <path-to-git-binaries>

And then run the container with this command:

docker run --rm git-test

If the run command succeeds, it should spit out the version info.

KevinCathcart commented 1 year ago

@KevinCathcart @mthalman once the build at https://github.com/git-for-windows/git/actions/runs/3272033990 is done, it should have an artifact with the Git executables. Could you please verify that they work in Nano Server?

I tried running it in a mcr.microsoft.com/windows/nanoserver:ltsc2022 container and it still fails. Getting a different error code this time: -1073741515 (DLL not found)

I saw the same thing, but then saw the missing DLL was zlib1.dll, and noticed the artifacts folder was missing basically all the dlls. If I copy in the dlls from the mingw64\bin\ folder from MinGit-2.38.0-64-bit.zip then the new build linked works for me. (Although I am testing with a slightly older nanoserver, since that is what works on my computer).

@mthalman I would suggest testing like that, and if it works then that probably means the proposed change would fix it if a full MinGit zip file were constructed from it.

As for testing with CI, I'm still writing up my proposed approach. What mthalman said is correct, but that minimalistic dockerfile will only detect breakage, but not explain the cause.

KevinCathcart commented 1 year ago

it would be possible to automatically test for missing static imports (individual functions or entire DLLs) on nano-server if so desired

@KevinCathcart I would be interested in knowing details about this. How could I run Nano Server containers in a GitHub workflow? Surely I'd have to self-host a runner, right?

There is no need to self host a runner. The Windows Server GitHub hosted runners are perfectly capable of running Windows containers.

What isn't supported in running GitHub workflow steps inside a Windows container. If you had heard Windows containers are not supported, that is what they were talking about.

This means the easy option for doing testing is to simply write a Dockerfile that does the testing as RUN steps, making sure to return an error code if the testing fails.

The other important thing is that you specify the from containers tag with a specific version of nanoserver (like 2022), and specify using a hosted runner of matching version (windows-2022). One could use matrix functionality to test multiple versions if really needed, but testing against just one version would be better than no automated testing of this.

The basic idea would be create some folder in the repository to hold a Dockerfile and a script file to be copied into it, and run (since long complex RUN lines in docker files are hard to maintain, and we are not worried about optimizing layers for a throw-away image). I typically see such folders thrown inside the .github folder to keep them out of sight. I'll assume the folder name is .github/nanoserver-test

The GitHub action yaml would looks something like this (sorry, not completely tested as I wrote one step on this while on a bus.

jobs:
  test-nano-server:
    runs-on: windows-2022

    steps:

    # Check out the repository so your workflow can access it
    - uses: actions/checkout@v1

    # we copy this local so it can get included in the docker file
    - name: copy windbg local
      run: |
        xcopy "C:/Program Files (x86)/Windows Kits/10/Debuggers/x64" ./dbg64 /s /e /h

    - name: RUN nanoserver test.
      run: |
        docker build .github/nanoserver-test

Now for the testing we still want a test script, and I'm proposing using powershell core because it is easier to do this type of stuff than with a .BAT, but

Now the below Dockerfile (and powershell script) are tested, but you will probably want to use the "nanoserver-ltsc2022" tag instead, and would probably need to update the part where git files are copied in. For the powershell script the part to update should be pretty self explanatory.

FROM mcr.microsoft.com/powershell:nanoserver-1809
USER ContainerAdministrator 
# Folder previously copied from C:/Program Files (x86)/Windows Kits/10/Debuggers/x64
COPY ["dbg64", "C:/dbg"]
COPY test.ps1 C:/
## multiple alternatives to the  copy in a zip and use tar to unzip, or curl and unzip, etc.
COPY ["git.exe","*.dll", "scalar.exe", "C:/test/"]
WORKDIR C:/
RUN pwsh.exe test.ps1

Finally the powershell script:

# For each ecectuable to test pick some no-operation set of flags/subcommands
# or something that should quickly result in an error with known exit code
# that is not a negative 32 bit number, and set the expected return code appropriately.

# Only test exes that could be expected to run in a UI less environment.

# ( Excutable path, arguments, expected return code )
# also note space is required before close paren (a powershell quirk when defining nested arrrays like this)
$executables_to_test = @(
    ('C:\test\git.exe', '', 1 ),
    ('C:\test\scalar.exe', 'version', 0 )
)

foreach ($executable in $executables_to_test)
{
    Write-Output "Now testing $($executable[0])"
    &$executable[0] $executable[1]
    if ($LASTEXITCODE -ne $executable[2]) {
        # if we failed, run the debugger to find out what function or DLL could not be found
        # and then exit the script with failure The missing DLL or EXE will be referenced near
        # the end of the output

        # Set a flag to havbe debugger show loader stub dianostics.
        C:\dbg\gflags -i $executable[0] +SLS

        C:\dbg\cdb.exe -c "g" -c "q" $executable[0] $executable[1]

        exit 1
    }
}

exit 0

Please feel free to ask if you have questions. (I may be a bit slow to reply, as a close familly member has passed away). Obviously this test script won't test dlls and functions that are loaded dynamically, but hopefully most of those have fallback or are for optional features. In any case testing just the statically used dlls and functions is a great improvement over testing none at all. The weird ampersand at a start of the line is how you execute a command whose path is provided as a string in powershell. The cdb.exe options say to run the continue command at the first breakpoint, and then quit the next time it prompts for a command.

There might be an easier way to determine the missing function or dll than using a debugger, but I could not find a simpler approach, and it is not that hard.

Hope this helps

mthalman commented 1 year ago

@mthalman I would suggest testing like that, and if it works then that probably means the proposed change would fix it if a full MinGit zip file were constructed from it.

Yep, confirmed copying in the DLLs allows the scenario to work for me.

mthalman commented 1 year ago

Will a fix for this be released soon? Nano Server systems need to be pinned to 2.37.3 which is the latest working version and is missing out on security updates.

dscho commented 1 year ago
        # if we failed, run the debugger to find out what function or DLL could not be found
        # and then exit the script with failure The missing DLL or EXE will be referenced near
        # the end of the output

        # Set a flag to havbe debugger show loader stub dianostics.
        C:\dbg\gflags -i $executable[0] +SLS

        C:\dbg\cdb.exe -c "g" -c "q" $executable[0] $executable[1]

        exit 1

@KevinCathcart thank you so much! However, I fail to find the missing DLL or EXE near the end of the output: https://github.com/dscho/git-sdk-64/actions/runs/3380747820/jobs/5613880638#step:3:56

Ideally, we would have a WinDBG script that would identify the missing symbol. I am woefully unexperienced with WinDBG; Would you happen to know any way to coax cdb.exe into giving us this information?

dscho commented 1 year ago

I fail to find the missing DLL or EXE near the end of the output: https://github.com/dscho/git-sdk-64/actions/runs/3380747820/jobs/5613880638#step:3:56

Ooooh, oooh, if I pass --user "NT Authority\System", it works!

dscho commented 1 year ago

Will a fix for this be released soon?

@mthalman I had not planned on fast-tracking a new Git for Windows version "just" for Nano Server's benefit. As mentioned by the Git maintainer, a Git v2.38.2 "may or may not" happen before v2.39.0, which is scheduled for December 12th.

As soon as #4074 is merged, however, there will be a new Git for Windows snapshot that has the fix.

KevinCathcart commented 1 year ago

I fail to find the missing DLL or EXE near the end of the output: https://github.com/dscho/git-sdk-64/actions/runs/3380747820/jobs/5613880638#step:3:56

Ooooh, oooh, if I pass --user "NT Authority\System", it works!

Yeah, the default user in the container has no admin rights, and annoyingly the gflags executable will silently fail to set flags if you don't have admin rights. Technically that is the only step that requires them, but Windows' tools for changing users mid script are lackluster compared to Linux.

"NT Authority\System" is probably overkill, as the user called "ContainerAdministrator" is supposed to exist and have admin rights (but I don't actually have the ability to verify this for the recent LTSC container images).

viceice commented 1 year ago

ContainerAdministrator is available on all ms windows container images

dscho commented 1 year ago

ContainerAdministrator is available on all ms windows container images

And how do you specify that in a docker run invocation?

mthalman commented 1 year ago

ContainerAdministrator is available on all ms windows container images

And how do you specify that in a docker run invocation?

docker run --user ContainerAdministrator ...

dscho commented 1 year ago

ContainerAdministrator is available on all ms windows container images

And how do you specify that in a docker run invocation?

docker run --user ContainerAdministrator ...

So the remaining question for me is: what does it buy me to use ContainerAdministrator over using NT Authority\System?

dscho commented 1 year ago

@KevinCathcart @mthalman @viceice please test Git for Windows v2.39.0-rc0. Testing these pre-releases are the best way to ensure that your workflows aren't broken by any Git for Windows updates, and as a bonus testing these also helps other users.

viceice commented 1 year ago

for me it seems v2.38.1.windows.1 already fixed the issue, as it sucessfully automergd by renovate 🎉

Cube707 commented 1 year ago

just spend two days finding this 😢 can confirm it's fixed in 2.39.0-rc0

mthalman commented 1 year ago

@dscho - Confirmed that 2.39.0-rc0 fixes the issue. 👍 Thanks a bunch!

dscho commented 1 year ago

For the record, I opened https://github.com/microsoft/mimalloc/pull/654 with this fix (and other fixes I had to implement to integrate mimalloc into Git for Windows). Once that is merged, and a new version is tagged, I can integrate a new version into Git for Windows and then all these references that clutter up this here ticket will hopefully stahp.

mthalman commented 1 year ago

Great to see this fixed in the 2.39.0 release. Thanks!