fluxcd / source-controller

The GitOps Toolkit source management component
https://fluxcd.io
Apache License 2.0

[GitHub] Handshake failed: knownhosts: key mismatch #490

Open pkit opened 2 years ago

pkit commented 2 years ago

Started getting these errors out of the blue on all clusters.

{"level":"error","ts":"2021-11-16T18:21:07.474Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/user/repository', error: ssh: handshake failed: knownhosts: key mismatch"}

Doing find -name known_hosts in the pod produces nothing. Restarting the pod = same error immediately. What's going on, where's the known_hosts file?

stefanprodan commented 2 years ago

What's going on, where's the known_hosts file?

The known_hosts file is in the same secret as the SSH key, please see the docs here https://fluxcd.io/docs/components/source/gitrepositories/#ssh-authentication

stefanprodan commented 2 years ago

I'm getting the same error on my cluster:

✗ GitRepository reconciliation failed: 'unable to clone 'ssh://git@github.com/stefanprodan/my-demo-fleet': ssh: handshake failed: knownhosts: key mismatch'

Looks like an issue with GitHub host keys.

kmannuz commented 2 years ago

I am also seeing this error in the last 30 minutes on 3 clusters that had been previously working fine

kingdonb commented 2 years ago

According to: https://github.blog/2021-09-01-improving-git-protocol-security-github/

Today is the day that host keys get rotated at GitHub. There are two new host keys in the blog post, one for ECDSA and another for Ed25519.

stefanprodan commented 2 years ago

Ok so rotating the SSH key fixes it.

Before:

$ k -n flux-system get secret flux-system -o json | jq '.data | map_values(@base64d)'
{
  "identity": "-----BEGIN PRIVATE KEY-----\n",
  "identity.pub": "ecdsa-sha2-nistp384 \n",
  "known_hosts": "github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ=="
}

After:

{
  "identity": "-----BEGIN PRIVATE KEY-----\n",
  "identity.pub": "ecdsa-sha2-nistp384 \n",
  "known_hosts": "github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg="
}
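
To see at a glance which key types a known_hosts value pins (only ssh-rsa before, ecdsa-sha2-nistp256 after), a small helper along these lines may help. This is only a sketch: the function name `known_hosts_key_types` is made up for illustration, and it assumes plain (unhashed) entries in the usual `host keytype base64key` layout.

```shell
#!/usr/bin/env bash
set -e -u -o pipefail

# Hypothetical helper (not part of Flux): print "host keytype" for every
# known_hosts entry read from stdin, skipping blank lines and comments.
# Assumes unhashed entries of the form: host keytype base64key
known_hosts_key_types() {
  awk 'NF >= 3 && $1 !~ /^#/ { print $1, $2 }'
}

# Example usage against the secret from this thread (needs cluster access
# and GNU base64):
#   kubectl -n flux-system get secret flux-system \
#     -o jsonpath='{.data.known_hosts}' | base64 -d | known_hosts_key_types
```

If the output lists only `github.com ssh-rsa`, the secret still carries the old pin.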

pkit commented 2 years ago

The known_hosts file is in the same secret as the SSH key, please see the docs here https://fluxcd.io/docs/components/source/gitrepositories/#ssh-authentication

Cool, thanks, but I do see the "old" keys when doing keyscan on the nodes. Somehow only the pods see the "new" ones. It makes sense though.

stefanprodan commented 2 years ago

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

To fix the key mismatch error, you have two options:

Update the known_hosts in the flux-system secret with the ecdsa-sha2-nistp256 value:

github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

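Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key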
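
A rough sketch of the bootstrap route (delete the deploy key secret, then rerun flux bootstrap github with the same arguments as before). The `run` and `rotate_flux_deploy_key` helpers are hypothetical names introduced here, and the example arguments are placeholders; with `DRY_RUN=1` the commands are only printed, which is a convenient way to review them first.

```shell
#!/usr/bin/env bash
set -e -u -o pipefail

# Illustrative helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: delete the deploy key secret, then rerun bootstrap.
# Pass exactly the arguments used for the original `flux bootstrap github`.
rotate_flux_deploy_key() {
  run kubectl -n flux-system delete secret flux-system
  run flux bootstrap github "$@"
}

# Example (placeholder values, replace with your own):
#   DRY_RUN=1 rotate_flux_deploy_key --owner=USER --repository=REPO --path=clusters/CLUSTER
```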

pkit commented 2 years ago

Updated known_hosts in flux-system secret manually everywhere. Seems to work now.

seh commented 2 years ago

If you'd like a short program to do it:

#!/usr/bin/env bash

set -e -u -o pipefail

# NB: The Ed25519-format key does not work with Flux.
for secret_name in flux-system repo-2 repo-3; do
  kubectl --namespace=flux-system \
          patch secret "${secret_name}" \
          --patch='
stringData:
  known_hosts: >
    github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg='
done

kubectl --namespace=flux-system rollout restart deployment source-controller
kubectl --namespace=flux-system rollout status deployment/source-controller --watch

brianpham commented 2 years ago

Confirmed. Working for us now as well after deleting the secret and bootstrapping again.

stefanprodan commented 2 years ago

@seh the secret is not mounted inside source-controller; instead, the controller reads the secret from the Kubernetes API before each Git operation. I don't think you need rollout restart.

seh commented 2 years ago

I was finding that it sits idle, apparently due to a backed-off timer, such that it won't try again for a while after several consecutive failures. Restarting it caused it to try again immediately.

ellieayla commented 2 years ago

Variant on the above script: https://gist.github.com/ellieayla/76352313c4f5939db6d2268fb70b0d48

Then either wait or request each GitRepository to reconcile.
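
For the "request each GitRepository to reconcile" part, something like the following could be used. It is a sketch only: `run` and `reconcile_git_sources` are made-up helper names, the flux-system namespace is assumed, and the source names in the example are the ones from the script earlier in this thread.

```shell
#!/usr/bin/env bash
set -e -u -o pipefail

# Illustrative helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: ask Flux to reconcile each named GitRepository now
# rather than waiting for its next scheduled attempt.
reconcile_git_sources() {
  local name
  for name in "$@"; do
    run flux reconcile source git "${name}" --namespace=flux-system
  done
}

# Example:
#   reconcile_git_sources flux-system repo-2 repo-3
```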

poteat commented 2 years ago

Confirming that we are suddenly getting this on our cluster as well.

ellieayla commented 2 years ago

Note that with libgit2, the reported error is "unable to clone: Certificate", à la fluxcd/source-controller#397 and fluxcd/source-controller#433.

ghost commented 2 years ago

@stefanprodan maybe add to the comment that if you edit the secrets manually, you should restart the source-controller after updating the secret, otherwise source-controller might overwrite the secret with the old values.

We've stopped the source-controller before updating the secrets and then started it again just to be safe:

kubectl scale deploy/source-controller --replicas=0

update the secrets

kubectl scale deploy/source-controller --replicas=1

Edit: the old ssh-rsa value gets added back somehow. Maybe kustomize-controller also needs to be restarted.

stefanprodan commented 2 years ago

otherwise source-controller might overwrite the secret with the old values.

source-controller doesn't alter secrets. It can't even do that, our RBAC allows the controller read-only access to secrets.

stefanprodan commented 2 years ago

Edit: the old ssh-rsa value gets added back somehow. Maybe kustomize-controller also needs to be restarted.

You clearly don't use bootstrap or you've stored the SSH keys in Git. If so, then update the secret in Git as well.

rtjfarrimond commented 2 years ago

Unfortunately, this was a predictable incident. It felt wrong to me, as a Flux user, to be providing a known hosts entry as part of the terraform bootstrap process (from this example) for precisely this reason.

To prevent another incident of similar scale in the future, why not give the source-controller the responsibility of maintaining the known hosts file? Presumably, given the URLs of the sources it has to reconcile, it should be fairly straightforward to use something like ssh-keyscan to keep the file up to date?

stefanprodan commented 2 years ago

It felt wrong to me, as a Flux user, to be providing a known hosts entry as part of the bootstrap process for precisely this reason.

Bootstrap does no such thing, Flux itself generates the known_hosts entries. As a Flux user, you are never asked to provide host keys.

sebastian-dyroff commented 2 years ago

Are multiple known_hosts with different algorithms supported by the go-git implementation?

rtjfarrimond commented 2 years ago

Bootstrap does no such thing, Flux itself generates the known_hosts entries. As a Flux user, you are never asked to provide host keys.

@stefanprodan this example from the flux terraform provider examples certainly does.

stefanprodan commented 2 years ago

@rtjfarrimond I was referring to flux bootstrap not Terraform.

rtjfarrimond commented 2 years ago

I understand, but to be clear, in my original comment I was referring to the terraform bootstrap process. Updated the original comment to reflect this.

hiddeco commented 2 years ago

To prevent another incident of similar scale in the future, why not give the source-controller the responsibility of maintaining the known hosts file?

How can a known_hosts file that is used as a trust store be automatically maintained by a service? That would render the known_hosts useless and allow any MITM attack to happen.

ghost commented 2 years ago

We have two git sources, flux-system and flux-manifests. We've updated the known_hosts for both but for flux-manifests the known_hosts keeps getting replaced with the ssh-rsa key:

{
  "level": "debug",
  "ts": "2021-11-17T10:28:10.304Z",
  "logger": "events",
  "msg": "Normal",
  "object": {
    "kind": "Kustomization",
    "namespace": "flux-system",
    "name": "flux-system",
    "uid": "138b16f7-ca30-458e-a0b1-811b2900fa2c",
    "apiVersion": "kustomize.toolkit.fluxcd.io/v1beta2",
    "resourceVersion": "189896097"
  },
  "reason": "info",
  "message": "Secret/flux-system/flux-manifests configured"
}

Is known_hosts getting updated by the libgit2 callback ?

ghost commented 2 years ago

Sorry, my bad. It looks like we have the secrets for flux-manifests in Git and flux is just reconciling the secrets.

hiddeco commented 2 years ago

The Secret files are not managed or written to by any of the controllers, but only used for read operations. If something is overwriting your Secret, it must come from something within your configuration.

rtjfarrimond commented 2 years ago

How can a known_hosts file, that is used as a trust storage, be automatically maintained by a service? That would render the known_hosts useless and allow any MITM-attack to happen.

If the process that updates the known_hosts runs on the same box, as the same user that uses the known_hosts file, where would the vector for a MITM be?

hiddeco commented 2 years ago

By it automatically accepting the offered keys.

If your network is compromised and hostname.com suddenly starts serving traffic from compromised.com with a different host key, which is then automatically accepted by the controller, checking the host key no longer has any value.
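
The same point as a minimal sketch in plain text matching: a pinned entry only protects you if the offered key must match it exactly. The name `host_key_is_pinned` is made up for illustration, and this is not how source-controller or go-git perform the check; it only models the idea.

```shell
#!/usr/bin/env bash
set -e -u -o pipefail

# Hypothetical check: succeed only if the offered "host keytype key" line
# appears verbatim among the pinned known_hosts entries.
#   $1 = pinned known_hosts content
#   $2 = host key line offered by the server
host_key_is_pinned() {
  local pinned="$1" offered="$2"
  grep -F -x -q -- "${offered}" <<<"${pinned}"
}

# If a service appended whatever key the server offers to the pinned set
# before calling this check, it would always succeed, including for a key
# served by an attacker.
```

With a fixed pin, a changed key fails loudly, which is exactly the key mismatch error in this issue.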

rtjfarrimond commented 2 years ago

If your network is compromised and hostname.com suddenly starts serving traffic from compromised.com with a different host key, which is then automatically accepted by the controller, checking the host key no longer has any value.

Yep, that makes sense, I withdraw my bad idea! Thanks :)

rtjfarrimond commented 2 years ago

@stefanprodan Here is a PR to update the known_hosts in the terraform example I linked earlier.

seh commented 2 years ago

Two things lengthened my fixing of this problem across ~20 clusters:

I had to patch the top-level Kustomization to set "spec.wait" to false, then force Flux to reconcile it. It took many tries before the health checking timeouts expired and Flux finally both updated and then started using the new Secret "data.known_hosts" field value.
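
A sketch of that workaround, assuming the top-level Kustomization is named flux-system in the flux-system namespace (adjust to your setup). `run` and `relax_and_reconcile` are hypothetical helper names. Note that the patch is applied in-cluster only, so if the Kustomization manifest is itself managed from Git, Flux may revert it on a later reconciliation.

```shell
#!/usr/bin/env bash
set -e -u -o pipefail

# Illustrative helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: turn off health-check waiting on a Kustomization,
# then force it to reconcile together with its source.
#   $1 = Kustomization name (defaults to flux-system)
relax_and_reconcile() {
  local name="${1:-flux-system}"
  run kubectl --namespace=flux-system patch kustomization "${name}" \
      --type=merge --patch='{"spec":{"wait":false}}'
  run flux reconcile kustomization "${name}" --namespace=flux-system --with-source
}

# Example:
#   DRY_RUN=1 relax_and_reconcile flux-system
```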

devozs commented 2 years ago

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

To fix the key mismatch error, you have two options:

Update the known_hosts in the flux-system secret with the ecdsa-sha2-nistp256 value:

github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

Thanks for the suggestion. In my case I also had to:

gautamr commented 2 years ago

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

To fix the key mismatch error, you have two options:

Update the known_hosts in the flux-system secret with the ecdsa-sha2-nistp256 value:

github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

worked for us

oscaromeu commented 2 years ago

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

Worked for me as well, thanks! :dancers:

cbyad commented 2 years ago

GitHub has changed its SSH host keys from DSA to ECDSA! https://github.blog/2021-09-01-improving-git-protocol-security-github/

To fix the key mismatch error, you have two options:

Update the known_hosts in the flux-system secret with the ecdsa-sha2-nistp256 value:

github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

Or rotate the SSH keys with flux bootstrap like so:

  • delete the deploy key secret from your cluster kubectl -n flux-system delete secret flux-system
  • rerun flux bootstrap github with the same arguments as before
  • Flux will generate the secret with ecdsa-sha2 SSH key and Host key

Worked for us too, thanks!

kaaboaye commented 2 years ago

In my case bootstrap fails to create new secret

flux bootstrap github --owner=USER --repository=REPO --branch=flux2 --personal --path=clusters/CLUSTER --components-extra=image-reflector-controller,image-automation-controller

► connecting to github.com
► cloning branch "flux2" from Git repository "https://github.com/USER/REPO.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ component manifests are up to date
✔ reconciled components
► determining if source secret "flux-system/flux-system" exists
► generating source secret
✔ public key: ecdsa-sha2-nistp384 key 
✗ multiple errors occurred: 
- POST https://api.github.com/repos/USER/REPO/keys: 404 Not Found []
- the requested resource was not found

Switching from ssh to https helped

stefanprodan commented 2 years ago

@kaaboaye your user token doesn’t have permission to create deploy keys, you need to be a repo admin.

ninja9k1 commented 2 years ago

I am having a very similar, if not the same, error while setting up gitops on my local kind cluster following this tutorial: https://docs.gitops.weave.works/docs/getting-started/

{"level":"error","ts":"2021-11-16T18:21:07.474Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/user/repository', error: ssh: handshake failed: knownhosts: key mismatch"}

This is a brand new instantiation which I have just fired up a few minutes ago as of this writing. kubectl -n flux-system delete secret flux-system does not work as this is not done through flux bootstrap. Any ideas?

sbernheim commented 2 years ago

I am having a very similar, if not the same, error while setting up gitops on my local kind cluster following this tutorial: https://docs.gitops.weave.works/docs/getting-started/

{"level":"error","ts":"2021-11-16T18:21:07.474Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/user/repository', error: ssh: handshake failed: knownhosts: key mismatch"}

This is a brand new instantiation which I have just fired up a few minutes ago as of this writing. kubectl -n flux-system delete secret flux-system does not work as this is not done through flux bootstrap. Any ideas?

@ninja9k1 - I assume by now that you've resolved this issue for your local gitops installation, but I'll add a response to this Issue in case anyone else finds it and needs the same solution.

The gitops CLI uses your local user's ~/.ssh/known_hosts file as the source for this key, and this error generally means that you need to remove the old RSA host key and add the new ECDSA host key in that file.

This command should remove the existing key:

ssh-keygen -R github.com

You can then either use this command to insert the new key without actually trying to SSH to GitHub:

ssh-keyscan -t ecdsa github.com >> ~/.ssh/known_hosts

Or start an SSH connection to github.com and let GitHub disconnect you after the connection succeeds:

ssh git@github.com

olivercp3 commented 1 year ago

An error is still reported (Handshake failed: knownhosts: key mismatch) when a new ECDSA host key is generated. My bootstrap command is:

flux bootstrap git --url=ssh://git@XXX.com/DP/k8s-deploy-2.git --private-key-file=/root/.ssh/id_ecdsa --branch dev

The secret is generated by bootstrap, so why does the known_hosts key still mismatch?

braadaaay commented 9 months ago

I managed to get SFTP working, see here on #2948