git-for-windows / git

A fork of Git containing Windows-specific patches.
http://gitforwindows.org/

SSH packaged with git can't find home directory for known_hosts #5091

Open lambda opened 3 months ago

lambda commented 3 months ago

Setup

PS C:\> git --version --build-options
git version 2.46.0.windows.1
cpu: x86_64
built from commit: 2e6a859ffc0471f60f79c1256f766042b0d5d17d
sizeof-long: 4
sizeof-size_t: 8
shell-path: D:/git-sdk-64-build-installers/usr/bin/sh
feature: fsmonitor--daemon
libcurl: 8.9.0
OpenSSL: OpenSSL 3.2.2 4 Jun 2024
zlib: 1.3.1
PS C:\> cmd.exe /c ver

Microsoft Windows [Version 10.0.17763.5936]

Cannot find install-options.txt. Used scoop install git with no special options.

I'm running in a Windows Docker container; this is for a CI job. The docker container is based on mcr.microsoft.com/windows/servercore:ltsc2019 with Python and a few other useful tools installed, including Git. We use scoop as a package manager since it makes it simpler to install several things in a consistent way, so we're installing Git via scoop.

Details

Powershell.

There's some more in there to set up SSH keys and so on, but this is enough to reproduce the issue, because you don't even get to the key exchange if you can't verify the host key. $SSH_KNOWN_HOSTS is a variable containing a known hosts file with our git server, which we populate in advance so we can clone in CI without having to ignore the host key. In this example I just populate it from github.com for the sake of a minimal complete verifiable example.

$SSH_KNOWN_HOSTS=$(ssh-keyscan github.com)
mkdir ~/.ssh
$SSH_KNOWN_HOSTS | Out-File -Encoding ASCII ~/.ssh/known_hosts
git clone git@github.com:git/git.git

Expected: the repo clones successfully (or fails due to no key being set up, if you only do exactly the above in a clean environment; as mentioned, we also set up our private keys and config in the real CI job).

Cloning into 'git'...
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

Note that if you do this interactively, you can accept the host key and move on. But we're doing this in CI, so we need it to find the correct known hosts, keys, and other config that have been installed for it.

This happens with any repo; you just need to install your known hosts file as above (in our actual CI the hosts file contents are predefined, but the example above uses ssh-keyscan to pull them).

This worked in Git 2.45.2.

If I dig in further, I find that even when ssh is run directly from bash, it fails to use the user's home directory correctly.

In the environment with Git 2.46.0:

$ git --version
git version 2.46.0.windows.1
$ bash -c "echo ~"
/c/Users/ContainerAdministrator
$ bash -c 'echo "$HOME"'
/c/Users/ContainerAdministrator
$ bash -c "ls ~/.ssh"
config
id_rsa
id_rsa_gitlab
known_hosts
$ bash -c "ssh -vvv -T git@gitlab.<redacted>"
OpenSSH_9.8p1, OpenSSL 3.2.2 4 Jun 2024
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/.ssh/known_hosts2'

In the environment with Git 2.45.2:

$ git --version
git version 2.45.2.windows.1
$ bash -c "echo ~"
/c/Users/ContainerAdministrator
$ bash -c 'echo "$HOME"'
/c/Users/ContainerAdministrator
$ bash -c "ls ~/.ssh"
config
id_rsa
id_rsa_gitlab
known_hosts
$ bash -c "ssh -vvv -T git@gitlab.<redacted>"
OpenSSH_9.7p1, OpenSSL 3.2.1 30 Jan 2024
debug1: Reading configuration data /c/Users/ContainerAdministrator/.ssh/config
debug1: /c/Users/ContainerAdministrator/.ssh/config line 3: Applying options for gitlab.<redacted>
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/c/Users/ContainerAdministrator/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/c/Users/ContainerAdministrator/.ssh/known_hosts2'

As we can see, on the newer version of SSH shipped with the newer version of Git, it's not looking in the home directory at all. But bash itself is able to expand ~ correctly, and translate the $HOME environment variable.
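
The difference between the two logs can also be checked mechanically. Below is a minimal sketch, assuming a POSIX-style shell such as bash and the `debug3: expanded UserKnownHostsFile ... -> '...'` line format shown above. `check_expansion` is a hypothetical helper name, not part of Git or OpenSSH. It reads ssh debug output on stdin and flags any expanded path that does not sit under the home directory passed as its argument.

```shell
# Hypothetical helper: reads `ssh -vvv` debug output on stdin and reports
# whether each "expanded UserKnownHostsFile" target lies under the given home.
# Returns 1 if at least one target is outside that directory.
check_expansion() {
  local home="$1" rc=0 line target
  while IFS= read -r line; do
    case "$line" in
      *"expanded UserKnownHostsFile"*"-> '"*)
        # Keep only the text between the final pair of single quotes.
        target="${line##*-> \'}"
        target="${target%\'*}"
        case "$target" in
          "$home"/*) echo "ok: $target" ;;
          *) echo "BAD: $target"; rc=1 ;;
        esac
        ;;
    esac
  done
  return "$rc"
}

# Example (ssh writes its debug output to stderr, hence 2>&1):
# ssh -vvv -T git@github.com 2>&1 | check_expansion "$HOME"
```

With the 2.46.0 log above this would be expected to print `BAD: /.ssh/known_hosts`, and with the 2.45.2 log an `ok:` line, which may be a convenient early failure signal in a CI job.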

plain5 commented 3 months ago
Matthias-Roth-SWE commented 3 months ago

I have a similar issue. I get the following error:

tilde_expand: No such user \\.ssh\\my_private_ssh_key

ssh-login on the same server works with the key-file in C:\Users\MyUser\.ssh\my_private_ssh_key

dscho commented 2 months ago

shell-path: D:/git-sdk-64-build-installers/usr/bin/sh

That is odd. This is not where Git for Windows' installer installs things.

Cannot find install-options.txt. Used scoop install git with no special options.

Aha! Does it work if you do not use scoop? I am not sure what scoop does, and if it does something "funny" then that's something for them to fix, there's nothing I can do over here.

For the record, I cannot reproduce the issue at all. When I invoke bash from PowerShell as described in the report, it invokes WSL, actually, and everything works.

When I invoke bash with the full path to C:\Program Files\Git\usr\bin\bash.exe, it works. When I invoke ssh with the full path to C:\Program Files\Git\usr\bin\ssh.exe (not recommended, as the path usr\bin is considered internal to Git, i.e. an implementation detail that can change without any prior notice), it works.

vermiculus commented 2 months ago

We can reproduce this issue as well, but we are not using scoop – we're just using the installer linked from this project's releases. We are not using WSL for this – we're just using Git Bash for Windows (though I suspect the problem is at a deeper layer than Git Bash itself – point is that WSL isn't even installed on these machines where we can reproduce).

We found this in our automated testing setup within a Windows Docker container (running on an on-prem GitLab instance); if it'll be helpful for you, we can provide a Dockerfile within which you should be able to reproduce this issue.


Edit -- it's worth noting that in our case, OpenSSH isn't able to find ~/.ssh/config, but this would almost certainly be the same root cause.

bradpols commented 2 months ago

... if it'll be helpful for you, we can provide a Dockerfile within which you should be able to reproduce this issue.

Here's the Dockerfile: Dockerfile.txt. Rename it to Dockerfile; I had to add a file extension to attach it here. You'll need to place a valid GitHub SSH private key in a file named "my-ssh-private-key" next to the Dockerfile before building.

Instructions for using (for folks who may not be familiar with docker):

  1. Create an empty directory (e.g., C:\temp)
  2. Place the Dockerfile and a GitHub SSH key file (named "my-ssh-private-key") in the new directory.
  3. Open Powershell as an Administrator and cd to the new directory.
  4. docker build -t ssh_no_work:v1 .
  5. docker run ssh_no_work:v1
  6. You should see output like this:
    Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.
    git@github.com: Permission denied (publickey).
  7. For testing interactively inside the container, you can do something like this: docker run -it --entrypoint powershell.exe ssh_no_work:v1

If you add a -v to the ssh call, you can see that it does not check the user's home directory for keys, and if you had a config file in ~/.ssh it doesn't check for that either.

OpenSSH_9.8p1, OpenSSL 3.2.2 4 Jun 2024
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to github.com [140.82.113.4] port 22.
debug1: Connection established.
debug1: identity file /.ssh/id_rsa type -1
debug1: identity file /.ssh/id_rsa-cert type -1
debug1: identity file /.ssh/id_ecdsa type -1
debug1: identity file /.ssh/id_ecdsa-cert type -1
debug1: identity file /.ssh/id_ecdsa_sk type -1
debug1: identity file /.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /.ssh/id_ed25519 type -1
debug1: identity file /.ssh/id_ed25519-cert type -1
debug1: identity file /.ssh/id_ed25519_sk type -1
debug1: identity file /.ssh/id_ed25519_sk-cert type -1
debug1: identity file /.ssh/id_xmss type -1
debug1: identity file /.ssh/id_xmss-cert type -1
...
git@github.com: Permission denied (publickey).

Change the Dockerfile to use Git 2.45.2 instead of 2.46.0, and ssh will work fine.

dscho commented 2 months ago

Hrm. For various reasons, my Docker setup (which is not a light-weight requirement for that minimal reproducer, by the way) is configured to run Linux images, and has to remain so, therefore I cannot test:

 - InvalidBaseImagePlatform: Base image mcr.microsoft.com/dotnet/sdk:8.0.302-1-windowsservercore-ltsc2019 was pulled with platform "windows/amd64", expected "linux/amd64" for current build (line 3)

Maybe you can run strace -o <log-file> ssh [...] instead of ssh [...] and then study what the log file has to say about HOME (which should be set to /c/Users/ContainerAdministrator, but apparently is not)?

bradpols commented 2 months ago

When comparing the strace log file when ssh doesn't work (Git 2.46.0) versus when it does (Git 2.45.2), HOME is defined as /c/Users/ContainerAdministrator in both. But on line 466 of strace_nowork.log, it's (incorrectly) trying to open //.ssh/config, while on the corresponding line 468 in strace_works.log it's (correctly) trying to open /c/Users/ContainerAdministrator/.ssh/config.
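
A comparison like this can be reduced to the `.ssh` paths each run touched. This is a rough sketch, assuming only that the paths appear somewhere in the strace text (the exact line format of the MSYS2 strace log is not assumed). `ssh_paths` is a hypothetical helper name; the log file names are the ones mentioned above.

```shell
# Hypothetical helper: print the unique path-like tokens containing ".ssh/"
# found in a log file, one per line, sorted.
ssh_paths() {
  grep -o '[^[:space:]"(),]*\.ssh/[^[:space:]"(),]*' "$1" | sort -u
}

# Example comparison in bash (uncomment when both logs exist):
# diff <(ssh_paths strace_works.log) <(ssh_paths strace_nowork.log)
```

Paths that lack the home directory prefix, such as //.ssh/config, should then stand out in the diff without reading the whole log.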

dscho commented 1 month ago

Could you please test with Git for Windows v2.47.0-rc0? It comes with a new MSYS2 runtime version that I hope fixes this.

bradpols commented 1 month ago

Git for Windows v2.47.0-rc0 doesn't seem to help - it carries forward the same problem (never looks for config/keys in ~/.ssh).

$ ssh -v -o StrictHostKeyChecking=accept-new -T git@github.com
OpenSSH_9.9p1, OpenSSL 3.2.3 3 Sep 2024
debug1: Reading configuration data /etc/ssh/ssh_config
...
debug1: No more authentication methods to try.
git@github.com: Permission denied (publickey).

dscho commented 1 month ago

Hrm. So now comes the tedious part: debugging the issue. For that, we will have to come up with an easier way to reproduce. I have a couple of ideas how to do that:

Once we have that, the really tedious part begins: patching the MSYS2 runtime and/or OpenSSH to figure out where things go wrong, maybe even interactive debugging with gdb (using action-tmate after rebuilding ssh with debug information).

I had really hoped that we could avoid that.

bradpols commented 3 weeks ago

I was able to reproduce by invoking ssh-keygen and observing the default key location at the initial prompt. I did not find a way to use ssh -Q to demonstrate the problem.
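
The ssh-keygen observation can be turned into an automated check. Here is a minimal sketch, assuming the prompt text `Enter file in which to save the key (<path>):` that OpenSSH's ssh-keygen prints. `default_key_path` is a hypothetical helper name that extracts the path from that prompt.

```shell
# Hypothetical helper: read ssh-keygen output on stdin and print the default
# key path shown in parentheses at the "Enter file ..." prompt.
default_key_path() {
  sed -n 's/^Enter file in which to save the key (\(.*\)):.*$/\1/p'
}

# Example (assumes ssh-keygen exits without writing a key when stdin is empty):
# path=$(ssh-keygen -t ed25519 </dev/null 2>&1 | default_key_path)
# case "$path" in
#   "$HOME"/*) echo "home directory used: $path" ;;
#   *)         echo "home directory NOT used: $path" ;;
# esac
```

A path that does not start with $HOME would point to the same home directory lookup problem, without needing a network connection or a configured key.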

Created a workflow that demonstrates the problem in this run: https://github.com/bradpols/git-for-windows/actions/runs/11265488718/job/31327391490