fluxcd / flux

Successor: https://github.com/fluxcd/flux2
https://fluxcd.io
Apache License 2.0

Git cloning over SSH broken starting from version 1.25.0 #3611

Closed. mskcode closed this issue 2 years ago.

mskcode commented 2 years ago

Describe the bug

Upgrading FluxCD v1 from 1.24.3 to 1.25.0 (or 1.25.1) breaks Git repository cloning over SSH (key-based authentication).

{"caller":"sync.go:54", "component":"daemon", "err":"reading the repository checkout: cloning repo: git repo not ready: git clone --mirror: fatal: Could not read from remote repository., full output:
 Cloning into bare repository '/tmp/flux-gitclone985523774'...
<redacted>@<redacted>@source.developers.google.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
", "ts":"2022-05-18T05:29:03.920326238Z", "warning":"failed to load last-synced resources. sync event may be inaccurate"}

We're using Google Cloud Source Repositories to host our code. Integration with it has worked without a hitch so far, but upgrading the FluxCD version breaks it. The SSH key has been configured in the flux-git-deploy secret resource, as instructed by the documentation: https://fluxcd.io/legacy/flux/guides/provide-own-ssh-key/

FluxCD arguments

          - --log-format=json
          - --manifest-generation=true
          - --memcached-hostname=memcached.flux
          - --ssh-keygen-dir=/var/fluxd/keygen
          - --sync-state=secret
          - --sync-garbage-collection=true
          - --sync-interval=5m
          - --sync-timeout=3m
          - --automation-interval=60m
          - --registry-disable-scanning=true
          - --registry-rps=20
          - --k8s-secret-name=flux-git-deploy
          - --git-url=ssh://<redacted>@<redacted>@source.developers.google.com:2022/<redacted>/gcp-deployment
          - --git-branch=master
          - --git-path=fluxcd/dev/raw,fluxcd/dev/kustomize
          - --git-poll-interval=2m
          - --git-readonly=true
          - --git-timeout=120s

Steps to reproduce

  1. Deploy version 1.25.0 or 1.25.1 of FluxCD
  2. Check the logs.

Expected behavior

Git cloning should still work.

Kubernetes version / Distro / Cloud provider

Kubernetes 1.21 / Google Cloud

Flux version

Flux v1.25.0 and v1.25.1 (affected); v1.24.3 works

Git provider

Google Cloud Source Repositories

Container Registry provider

No response

Additional context

No response

mskcode commented 2 years ago

Workaround for this issue (at least for us) is to keep using version 1.24.3.

kingdonb commented 2 years ago

Thanks for the report, and sorry for the inconvenience. We haven't had any other reports like this so far, and SSH-based cloning is covered by our e2e tests. I have also just tested Flux v1.25.0 against GitHub and saw no issue on my end. So this is likely something specific to Google Cloud Source Repositories. I have never used it, and it will take me some time to attempt and confirm a repro there.

I'm not sure which details will be important for reproducing this issue. Can you provide some information about your key format/algorithm? (Is there anything else unique or non-standard about your configuration?)

mskcode commented 2 years ago

The SSH key has been generated with command ssh-keygen -C "<redacted>" -b 4096 -N "" -f <file_name>
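That command passes no -t flag, so ssh-keygen used its default key type. In the OpenSSH releases current at the time that default was RSA (OpenSSH 9.5 and later default to Ed25519). A quick way to confirm the type of an existing deploy key is to read the first field of its public key file. A minimal sketch follows. It assumes the public half sits next to the private key as <file_name>.pub, and the helper name key_type is hypothetical.

```shell
#!/bin/sh
# Print the key type recorded in an OpenSSH public key file.
# The first whitespace-separated field of a .pub line is the key type,
# e.g. "ssh-rsa", "ssh-ed25519" or "ecdsa-sha2-nistp256".
# key_type is a hypothetical helper name, not part of OpenSSH.
key_type() {
  awk 'NR == 1 { print $1 }' "$1"
}

# Example usage (the path is a placeholder for the deploy key's public half):
#   key_type ./deploy_key.pub
# An "ssh-rsa" key is the kind affected by the OpenSSH 8.8 change whenever
# the server only accepts the SHA-1 based "ssh-rsa" signature algorithm.
```

Running ssh-keygen -l -f <file_name>.pub gives the same information. It prints the key size and fingerprint, followed by the type in parentheses, e.g. (RSA).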

dimbleby commented 2 years ago

We are seeing the same thing when pulling from Azure DevOps.

I believe the root cause is the OpenSSH upgrade in the image:

# ssh -V
OpenSSH_8.8p1, OpenSSL 1.1.1n  15 Mar 2022
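OpenSSH 8.8 is the first release that disables the SHA-1 based "ssh-rsa" signature algorithm by default. The version string printed by ssh -V is therefore enough to tell whether a given image is affected. A minimal POSIX shell sketch follows. It assumes the version string has the OpenSSH_<major>.<minor>p<patch> shape shown above, and the helper name is hypothetical.

```shell
#!/bin/sh
# Decide whether an OpenSSH version string (as printed by `ssh -V`)
# is 8.8 or newer, i.e. a release that disables "ssh-rsa" (RSA/SHA-1)
# signatures by default.
# sha1_rsa_disabled is a hypothetical helper, not part of OpenSSH.
# Assumes the "OpenSSH_<major>.<minor>p<patch>" format; other vendor
# formats are not handled.
sha1_rsa_disabled() {
  # Strip the "OpenSSH_" prefix and everything from the first "p"
  # onward: "OpenSSH_8.8p1, OpenSSL ..." -> "8.8"
  ver=${1#OpenSSH_}
  ver=${ver%%p*}
  major=${ver%%.*}
  minor=${ver#*.}
  # Succeeds (exit status 0) when major.minor >= 8.8
  [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 8 ]; }
}

# `ssh -V` writes its version banner to stderr, hence 2>&1:
#   if sha1_rsa_disabled "$(ssh -V 2>&1)"; then echo "affected"; fi
```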

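The OpenSSH 8.8 release notes (https://www.openssh.com/txt/release-8.8) describe the change and the workaround in the "Potentially-incompatible changes" section. In short, 8.8 disables RSA signatures using the SHA-1 hash algorithm (the "ssh-rsa" signature algorithm) by default. That breaks connections to servers that only accept ssh-rsa.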

Specifically, if I manually add these lines to /etc/ssh/ssh_config, the pull starts working again:

HostKeyAlgorithms +ssh-rsa
PubkeyAcceptedKeyTypes +ssh-rsa

I will submit a PR doing that. I'd appreciate a release containing the fix (if you are happy with it).
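Adding HostKeyAlgorithms and PubkeyAcceptedKeyTypes lines to /etc/ssh/ssh_config re-enables ssh-rsa for every host the container connects to. The OpenSSH 8.8 release notes suggest a narrower form: a Host stanza that re-enables RSA/SHA-1 for a single destination. That form uses the current option name, PubkeyAcceptedAlgorithms. PubkeyAcceptedKeyTypes is the older name, which is still accepted as an alias from OpenSSH 8.5 on. The sketch below appends such a stanza to a given config file. The helper name, the target path and the host ssh.dev.azure.com are illustrative assumptions. This is not the change that was merged in #3614.

```shell
#!/bin/sh
# Append a per-host stanza that re-enables the SHA-1 based "ssh-rsa"
# algorithm for one destination only, instead of globally.
# append_legacy_rsa_stanza is a hypothetical helper name.
#   $1: ssh config file to modify (e.g. "$HOME/.ssh/config")
#   $2: host to relax (e.g. ssh.dev.azure.com)
# PubkeyAcceptedAlgorithms requires OpenSSH 8.5 or newer.
append_legacy_rsa_stanza() {
  cfg=$1
  host=$2
  mkdir -p "$(dirname "$cfg")"
  cat >>"$cfg" <<EOF
Host $host
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
EOF
  # ssh rejects user config files that are writable by other users
  chmod 600 "$cfg"
}

# Usage:
#   append_legacy_rsa_stanza "$HOME/.ssh/config" ssh.dev.azure.com
# Then inspect the effective settings for that host:
#   ssh -G ssh.dev.azure.com | grep -i -e hostkeyalgorithms -e pubkeyacceptedalgorithms
```

Scoping the stanza to one host keeps SHA-1 signatures disabled for every other server. This matters given that the setting is only meant as a compatibility escape hatch.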

dimbleby commented 2 years ago

For example, see https://developercommunity.visualstudio.com/t/Git-SSH-access-offers-weak-algorithms-r/1547526 for a bug report against Azure DevOps about this.

kingdonb commented 2 years ago

I'm going to merge #3614.

This will produce an image in the fluxcd repository, which I can post here for testing right away:

Image: docker.io/fluxcd/flux-prerelease:master-4785fbbe

If we can get confirmation that the change resolves the issue, then I'll gladly prioritize a patch release with this change in it to ensure the issue can be resolved! Thanks so much for bringing this to our attention. 🏆

kingdonb commented 2 years ago

(This will close with the release of v1.25.2)

Reopening for visibility until the release is ready for users to install. More users are likely to hit this issue until then, and keeping it open should prevent duplicate reports from coming in.

dimbleby commented 2 years ago

Confirmed that docker.io/fluxcd/flux-prerelease:master-4785fbbe succeeds in pulling from Azure DevOps, thanks!

kingdonb commented 2 years ago

Awesome, we now have a matching image out, tagged fluxcd/flux:1.25.2. The Helm chart is queued up for release within the next few days.

kingdonb commented 2 years ago

To follow up on this issue: we do not recommend using this setting. We have enabled it for backwards compatibility only.

Here is an article about the issue at Microsoft's forums:

https://developercommunity.visualstudio.com/t/Git-SSH-access-offers-weak-algorithms-r/1547526