argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
16.33k stars 4.93k forks source link

ArgoCD Repo Server stops pulling git repositories due to Azure Devops Repos current sunset SSH-RSA strategy #17634

Closed deB4SH closed 2 weeks ago

deB4SH commented 1 month ago

Hi all, Microsoft published a blog post on Feb 15th 2024 announcing the sunset of ssh-rsa support in favor of rsa-sha2-256/512. Blog post: https://devblogs.microsoft.com/devops/ssh-rsa-deprecation/

Based on their schedule, everyone still using their service should be in Phase 2, where a throttling delay is in place and an error with the following message is shown.

“ssh-rsa is about to be deprecated and your request has been throttled. Please use rsa-sha2-256 or rsa-sha2-512 instead. Your session will continue automatically. For more details see https://devblogs.microsoft.com/devops/ssh-rsa-deprecation.”

This error is also shown within Argo. image

After searching a bit, it seems that golang/crypto already supports rsa-sha2-256/512, but sadly only starting from v0.21.0 (https://github.com/golang/crypto/commit/6fad3dfc).

Argo seems to use v0.19.0 https://github.com/argoproj/argo-cd/blob/1bddee2e5dfff35613847eef9a2c0e6818976dc3/go.mod#L85

Also found a relevant issue in this regard: https://github.com/argoproj/argo-cd/issues/7600

Checklist:

Describe the bug Argo is currently unable to pull Git repositories hosted on Azure DevOps Repos and stops after it receives the delay error.

To Reproduce Pull a repository from their service with an SSH key.

Expected behavior Argo pulls changes or state from git repository.

Version

argocd@argocd-core-server-588df95858-5jcc7:~$ argocd version
argocd: v2.9.3+6eba5be
  BuildDate: 2023-12-01T23:05:50Z
  GitCommit: 6eba5be864b7e031871ed7698f5233336dfe75c7
  GitTreeState: clean
  GoVersion: go1.21.3
  Compiler: gc
  Platform: linux/amd64

Logs

Logs from repo-server:

Stream closed EOF for argocd/argocd-core-repo-server-6b48b5bc4b-t5xfj (copyutil)
repo-server time="2024-03-27T06:40:28Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=Check grpc.service=grpc.health.v1.Health grpc.start_time="2024-03-27T06:40:28Z" grpc.time_ms=0.019 span.kind=server system=grpc
repo-server time="2024-03-27T06:40:34Z" level=error msg="finished unary call with code Unknown" error="unknown error: remote: Command git-upload-pack: ssh-rsa is about to be deprecated and your request has been throttled. Please use rsa-sha2-256 or rsa-sha2-512 instead. Your session will continue automatically. For more details see https://aka.ms/ado-ssh-rsa-deprecation." grpc.code=Unknown grpc.method=GenerateManifest grpc.service=repository.RepoServerService grpc.start_time="2024-03-27T06:40:29Z" grpc.time_ms=5616.032 span.kind=server system=grpc

sergeibelov113 commented 3 weeks ago

Hey folks, I think this change needs to be done on the user side.

According to Microsoft's blog post, section "Phase I: User Opt-in", people will have to generate new private/public key pairs and update both, Azure configuration and Argo CD repository config with this new key.

Have you tried generating a new key pair, with a supported algorithm?

I have generated new keypair for argocd:

sergei@Macbook ~ $ ssh-keygen -l -f /tmp/1
3072 SHA256:ZjVTR8zK9UEmQaQmc8tuPqjszaee1yHIq8KAzJDQUJ8 sergei@Macbook.local (RSA)

I updated the Argo CD repo-creds secret and restarted the repo-server and argocd-server pods, but I still get this error from time to time:

time="2024-04-26T13:09:14Z" level=warning msg="finished unary call with code FailedPrecondition" error="rpc error: code = FailedPrecondition desc = error resolving repo revision: rpc error: code = Unknown desc = unknown error: remote: Command git-upload-pack: You’re using ssh-rsa that is about to be deprecated and your request has been blocked intentionally. Any SSH session using SSH-RSA is subject to brown out (failure during random time periods). Please use rsa-sha2-256 or rsa-sha2-512 instead. For more details see https://aka.ms/ado-ssh-rsa-deprecation." grpc.code=FailedPrecondition grpc.method=Sync grpc.service=application.ApplicationService grpc.start_time="2024-04-26T13:09:14Z" grpc.time_ms=234.662 span.kind=server system=grpc
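To see how often a repo-server is hitting each phase, the two error messages quoted in this thread can be counted from the logs. This is only a minimal sketch: the matched phrases are copied from the messages above, and the deployment name in the usage line is an assumption that will differ per installation.

```sh
#!/bin/sh
# Classify repo-server log lines read from stdin by the Azure DevOps
# ssh-rsa deprecation phase they indicate. The matched phrases are copied
# from the error messages quoted in this thread; other lines are ignored.
classify_ssh_rsa_errors() {
    while IFS= read -r line; do
        case $line in
            *"ssh-rsa"*"has been throttled"*) echo "throttled" ;;
            *"ssh-rsa"*"has been blocked intentionally"*) echo "blocked" ;;
        esac
    done
}

# Usage (deployment name is an assumption):
#   kubectl logs -n argocd deploy/argocd-repo-server | classify_ssh_rsa_errors | sort | uniq -c
```

A "blocked" count above zero means a brownout was hit, while "throttled" only indicates the earlier delay phase.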

aurel4oxand commented 3 weeks ago

@rouke-broersma If we wanted to disable TLS verify (for the short term, acceptable for our current usage), is there a way to do that declaratively? I only see references to the argo CLI... We create our repo credentials declaratively using the repo-creds secret type, not via the CLI. I'm not finding anything that suggests I can do it that way.

@danijam It seems you can do this by setting the insecure flag to true in a Secret labeled argocd.argoproj.io/secret-type: repository

I guess the URL defined in this Secret should match the one defined in your Secret labeled argocd.argoproj.io/secret-type: repo-creds (which is more of a repository credential template that you can use everywhere)

See https://argo-cd.readthedocs.io/en/stable/operator-manual/argocd-repositories-yaml/#argocd-repositoriesyaml-example

praveenjindal62 commented 3 weeks ago

@danijam You can create a secret, one for every repository as below in addition to repo-creds.

Here is the document for same https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#repositories

kubectl create secret generic -n ${ARGOCD_NAMESPACE} ${SECRET_NAME} \
    --from-literal=type=git \
    --from-literal=url=${REPOSITORY_URL} \
    --from-literal=insecure=true

kubectl label secret ${SECRET_NAME} -n ${ARGOCD_NAMESPACE} "argocd.argoproj.io/secret-type=repository"
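For setups that manage everything declaratively, the two commands above can be expressed as a single Secret manifest. The sketch below just prints that manifest. The field names (type, url, insecure) and the label come from the commands above, while the namespace, secret name and URL in the usage line are placeholders.

```sh
#!/bin/sh
# Print a Secret manifest equivalent to the `kubectl create secret` and
# `kubectl label` commands above.
# Arguments: namespace, secret name, repository URL.
repo_secret_manifest() {
    cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $2
  namespace: $1
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: $3
  insecure: "true"
EOF
}

# Usage (placeholder values):
#   repo_secret_manifest argocd my-repo git@ssh.dev.azure.com:v3/ORG/PROJ/REPO | kubectl apply -f -
```

As with the kubectl version, the SSH key itself still comes from the existing repo-creds secret.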

evanmcneal commented 3 weeks ago

Hey folks, I think this change needs to be done on the user side.

According to Microsoft's blog post, section "Phase I: User Opt-in", people will have to generate new private/public key pairs and update both, Azure configuration and Argo CD repository config with this new key.

Have you tried generating a new key pair, with a supported algorithm?

Here is my best attempt to explain why that wouldn't matter.

Azure DevOps supports only rsa-sha2-256 and rsa-sha2-512, and none of the other algorithms such as ECDSA. The thread above suggests that these RSA variants are not working, or not currently supported, at some level in Argo or the Go library, which would require a change to the underlying Go libraries and/or Argo itself. Updating the key won't actually help during brownouts or blackouts, because I believe it is not being presented to Azure DevOps with the right signatures, and as far as SSH keys go, Azure DevOps seems to support only the exact ones that Argo CD does not. We updated our keys to 512 back on April 8th, and because the brownouts were sporadic we never got an inkling that it wasn't working until the brownouts happened during working hours, in 1-hour intervals. We found a single notification with the brownout error, which should have been occurring since April 8th. So the new schedule of 1-hour brownouts 12 times a day made us very aware that the 512 (and even 256) RSA keys do not actually work.

For others wondering whether it matters that Argo still pulls repos using the old SSH host vs-ssh.visualstudio.com:v3 rather than the new ssh.dev.azure.com:v3: don't dive too far into that rabbit hole. We thought the old URL might also be getting sunset behind the scenes, so we modified some of our Argo components, as well as a k8s service, to use the new endpoint along with a new repo-cred, and it worked until the brownout. So if you stumble across the SSH URL, it doesn't seem to matter either; both URLs work and are subject to the same issue with the keys.

sergeibelov113 commented 3 weeks ago

I have "argocd.argoproj.io/secret-type: repo-creds" label in argocd v2.10.6+d504d2b set and it works as well

evanmcneal commented 3 weeks ago

Are you doing this for every single individual repo or for each Team Project base in Azure DevOps

Example: dev.azure.com/{azuredevopsinstance}/{teamproject} (highest level) or dev.azure.com/{azuredevopsinstance}/{teamproject}/{repo} (individual repo level for each secret)

sergeibelov113 commented 3 weeks ago

you can set for both, but when you set url as git@ssh.dev.azure.com:v3/(ORG)/(PROJ)/ then argocd will have access to all repos inside the project

sergeibelov113 commented 3 weeks ago

On Argo CD 2.8.4, rsa-sha2-256 works, but the important thing to check is whether the k8s secret holding sshPrivateKey for Azure DevOps has its newlines in the right places. When we stored this private key in Azure Key Vault as a secret, the "-----END OPENSSH PRIVATE KEY-----" part was not on a new line but was treated as part of the key. After fixing this, Argo CD could sync properly with Azure DevOps repos.

Hi, I'm using ArgoCD v2.7.11

Thanks to @bartoszpyrek I managed to get this working by:

  • generating a new RSA key with the rsa-sha2-512 algorithm: ssh-keygen -t rsa-sha2-512
  • adding the generated public key to my Azure DevOps profile. The public key is of the form:

ssh-rsa AAAAB3NzaC1y..... user@host

  • doing a simple test in the Argo CD UI

image

Note the new line in yellow after -----END OPENSSH PRIVATE KEY-----

This worked fine 👌 image

Also, it seems there's no need to update the known hosts configuration or to skip server verification

Finally, I updated my secret which contains my SSH private key with the new one, also with the newline at the end, and restarted argocd-repo-server pods
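The trailing-newline requirement described above can be checked before a key goes into a secret. A minimal sketch, assuming the key has been written to a local file in OpenSSH format; the function name is made up for illustration.

```sh
#!/bin/sh
# Return 0 if the OpenSSH private key file given as $1 ends with the END
# marker on a line of its own, followed by a newline; return 1 otherwise.
check_key_trailer() {
    marker='-----END OPENSSH PRIVATE KEY-----'
    # The marker must be a line by itself, not glued to the key material.
    [ "$(tail -n 1 "$1")" = "$marker" ] || return 1
    # The last byte of the file must be a newline (hex 0a).
    [ "$(tail -c 1 "$1" | od -An -tx1 | tr -d ' \n')" = "0a" ] || return 1
}

# Usage:
#   check_key_trailer /path/to/private_key || echo "fix the END marker / trailing newline"
```

This catches both cases mentioned in the thread: the END marker fused onto the last line of key data, and a missing final newline.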

I manually deleted the repositories that used the old SSH key and let Argo CD re-create them (Argo CD manages its own configuration, using the App of Apps pattern)

Now everything is fine, let's see if this works during next days...

Good luck 🤞

I did the same but I'm still getting the same error. The only difference is that I bind the git repo via a secret and a template:

configs:
  credentialTemplates:
    azure-devops-creds:
      url: git@ssh.dev.azure.com:v3/XXXXXXXXX/XXXXXXXXXXX/
      sshPrivateKey: |
       ${privateSSHKey}%

What do you mean by this: I manually deleted repositories which uses the old SSH key and let ArgoCD re-create them (ArgoCD manages itself its configuration, using App of App pattern)

Have you changed anything on azure devops side except the new ssh key upload?

praveenjindal62 commented 3 weeks ago

I tried, but this is not working for me. ArgoCD show "Connection Status" for the repository added with git@ssh.dev.azure.com:v3/(ORG)/(PROJ)/ format, as failed and I keep receiving errors in repo-server. If I add individual repository, it is working.

sergeibelov113 commented 3 weeks ago

Works fine for me. Are you sure that the SSH key you added has access to all repos in your org/project? Check the security permissions in the project properties for all repos that are accessed by the user whose SSH key you've added.

aurel4oxand commented 3 weeks ago

What you mean by that: I manually deleted repositories which uses the old SSH key and let ArgoCD re-create them (ArgoCD manages itself its configuration, using App of App pattern)

@sergeibelov113 I deleted repositories from the UI, in Settings -> Repositories list

My Argo CD instance synchronizes itself, which means it applies its own configuration declaratively (argocd-cm ConfigMap)

But at that time I was trying to solve the problem outside of a brownout period, so none of my actions were relevant!

Anyway, I chose to enable insecure mode for each repository I own (this is acceptable for me; I'll revert it when a proper fix is released) and it works now

kind: ConfigMap
metadata:
  name: argocd-cm
apiVersion: v1
data:
  repositories: |
...
    - type: git
      name: xxxxxxxxxxx
      url: git@ssh.dev.azure.com:v3/<org>/<project>/<repo1>
      insecure: true
      sshPrivateKeySecret:
        name: argocd-ssh-keys
        key: ssh-privatekey
    - type: git
      name: xxxxxxxxxxx
      url: git@ssh.dev.azure.com:v3/<org>/<project>/<repo2>
      insecure: true
      sshPrivateKeySecret:
        name: argocd-ssh-keys
        key: ssh-privatekey
...

BTW, don't use the syntax above; it has been deprecated since v2.1. I have to refactor this to use repo-creds and repository secrets, as documented

karol-pawlowski commented 3 weeks ago

To me it seems that Microsoft should also have removed the unsupported RSA algorithm from the handshake during brownout sessions. From what I see, ssh-rsa tends to take priority over the SHA-2 variants (see "ssh -Q sign"), and the top-priority algorithms are ed25519/ecdsa, which Azure DevOps does not support. I think it's Microsoft that should take action.

rouke-broersma commented 3 weeks ago

Insecure is not applicable to repository credential template, only for repository credentials:

image

You will have to create a repository with the exact url to be able to set the Insecure flag for that repo.

danijam commented 3 weeks ago

Yeah that's a pain for us as we don't create repository argo resources we just have a big repo cred template to cover the auth for all our repos. I've opened a support ticket with Microsoft to see if there is anything they can do...

praveenjindal62 commented 3 weeks ago

@danijam, we searched for existing ArgoCD applications in the clusters to find potential Azure DevOps SSH URLs, and inserted a secret for all corresponding repo URLs. This process can be automated with a basic bash script. However, I acknowledge that if the target repositories are subject to frequent changes, this solution may not be feasible.
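A rough sketch of what such a script could look like. It only prints the kubectl commands (a dry run) so they can be reviewed first. The secret naming scheme, the default namespace and the jsonpath query in the usage line are assumptions for illustration, and applications using multiple sources are not covered.

```sh
#!/bin/sh
# Read repository URLs from stdin (one per line), keep only Azure DevOps SSH
# URLs, drop duplicates, and print the kubectl commands that would create an
# insecure "repository" secret for each one. Nothing is executed.
# ARGOCD_NAMESPACE defaults to "argocd" (an assumption).
azure_repo_secret_commands() {
    ns=${ARGOCD_NAMESPACE:-argocd}
    grep -E '^[^@]+@(ssh\.dev\.azure\.com|vs-ssh\.visualstudio\.com):v3/' | sort -u |
    while IFS= read -r url; do
        # Hypothetical naming scheme: a stable name from a checksum of the URL.
        name="azure-repo-$(printf '%s' "$url" | cksum | cut -d' ' -f1)"
        echo "kubectl create secret generic -n $ns $name --from-literal=type=git --from-literal=url=$url --from-literal=insecure=true"
        echo "kubectl label secret $name -n $ns argocd.argoproj.io/secret-type=repository"
    done
}

# Usage (single-source applications only; the query is an assumption):
#   kubectl get applications -A -o jsonpath='{range .items[*]}{.spec.source.repoURL}{"\n"}{end}' \
#     | azure_repo_secret_commands
```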

Furthermore, we remain optimistic for a more robust resolution from ArgoCD regarding this issue.

pimjansen commented 3 weeks ago

To me it seems that Microsoft should have disabled the unsupported rsa algorithm from the handshake also in brownout sessions. What I see, sha-rsa tends take priority than sha2 "ssh -Q sign" and the top priority algorithms are Ed/ecdsa" that are unsupported at Azure DevOps. It's Microsoft that should take an action I think.

Can we get this confirmed? I see a lot of noise in this thread so.

If this is the case there is just no issue (except the brownout period)

rouke-broersma commented 3 weeks ago

To me it seems that Microsoft should have disabled the unsupported rsa algorithm from the handshake also in brownout sessions. What I see, sha-rsa tends take priority than sha2 "ssh -Q sign" and the top priority algorithms are Ed/ecdsa" that are unsupported at Azure DevOps. It's Microsoft that should take an action I think.

Can we get this confirmed? I see a lot of noise in this thread so.

If this is the case there is just no issue (except the brownout period)

The brownouts are going to start lasting 8 hours, so it's an issue regardless. And there's no confirmation whatsoever that Microsoft won't just keep replying to ssh-rsa with an error after the change.

pimjansen commented 3 weeks ago

@rouke-broersma agree but it would be strange if they would since they don't support it anymore.

Gottox commented 3 weeks ago

I would really like to have some fix for this in Argo CD sooner rather than later. It breaks deployments worldwide without a feasible workaround in place.

zamedic commented 3 weeks ago

During a brownout, here is what the server offered: [diffie-hellman-group1-sha1 diffie-hellman-group14-sha1 diffie-hellman-group-exchange-sha256]

So, yeah, the server is offering the sha1 algorithms

zamedic commented 3 weeks ago

I built a local argo cd removing this 1 line https://github.com/argoproj/argo-cd/blob/575575a78a87c7fc97ce540124509f90c5733e05/util/git/ssh.go#L21 containing the algorithm "diffie-hellman-group14-sha1" - my repo server now appears happy with Azure Devops again.

pimjansen commented 3 weeks ago

I built a local argo cd removing this 1 line

https://github.com/argoproj/argo-cd/blob/575575a78a87c7fc97ce540124509f90c5733e05/util/git/ssh.go#L21

containing the algorithm "diffie-hellman-group14-sha1" - my repo server now appears happy with Azure Devops again.

Instead of building, raise a PR if this is a solid solution? The reason why MS pulls out this algorithm is that it is deprecated (no longer safe).

So if this is the case it seems legit to just remove support completely or am i missing something?

zamedic commented 3 weeks ago

I built a local argo cd removing this 1 line https://github.com/argoproj/argo-cd/blob/575575a78a87c7fc97ce540124509f90c5733e05/util/git/ssh.go#L21

containing the algorithm "diffie-hellman-group14-sha1" - my repo server now appears happy with Azure Devops again.

Instead of building, raise a PR if this is a solid solution? The reason why MS pulls out this algorithm is that it is deprecated (no longer safe).

So if this is the case it seems legit to just remove support completely or am i missing something?

By removing it, Argo CD would no longer support the sha1 algorithm. This might be a hard breaking change for some.

The main problem is that during the handshake, Microsoft Azure DevOps sends back a list of accepted algorithms. That list says SHA1 is accepted, so Argo sends a SHA1-formatted message. Azure DevOps then turns around and says "I don't accept sha1"...

The proper fix is for Microsoft to send the correct set of accepted algorithms. In the interim, I can foresee 2 fixes:

  1. have a switch to disable sha1 for git
  2. investigate how the git client chooses which algorithm to use when it is given multiple and should it prefer some algorithms over others?
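On the second point, a simplified model of SSH algorithm negotiation is that the first entry in the client's preference list that the server also offers wins. The sketch below illustrates only that rule. It is not Argo CD or go-git code, and real negotiation has extra conditions for key exchange.

```sh
#!/bin/sh
# Simplified model of SSH algorithm negotiation: print the first algorithm
# in the client's comma-separated preference list ($1) that also appears in
# the server's comma-separated list ($2). Return 1 if there is no overlap.
negotiate() {
    client_list=$1
    server_list=$2
    old_ifs=$IFS
    IFS=,
    for alg in $client_list; do
        case ",$server_list," in
            *",$alg,"*)
                IFS=$old_ifs
                printf '%s\n' "$alg"
                return 0
                ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

# A client that lists ssh-rsa first picks it even if the server also offers SHA-2:
#   negotiate "ssh-rsa,rsa-sha2-512" "ssh-rsa,rsa-sha2-256,rsa-sha2-512"
# A client that prefers the SHA-2 variants does not:
#   negotiate "rsa-sha2-512,rsa-sha2-256,ssh-rsa" "ssh-rsa,rsa-sha2-256,rsa-sha2-512"
```

Under this model, as long as the server keeps advertising a deprecated algorithm, the outcome depends entirely on the order and content of the client's list.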

karol-pawlowski commented 3 weeks ago

I built a local argo cd removing this 1 line

https://github.com/argoproj/argo-cd/blob/575575a78a87c7fc97ce540124509f90c5733e05/util/git/ssh.go#L21

containing the algorithm "diffie-hellman-group14-sha1" - my repo server now appears happy with Azure Devops again.

Did you manage to test it during a brownout? From what I see, the default list is overridden when a parameter with algorithms is passed, and that's what I see in the Procfile: "sshd: mkdir -p /var/run/sshd && mkdir -p ~/.ssh && cat ./test/fixture/testrepos/id_rsa.pub > ~/.ssh/authorized_keys && /usr/sbin/sshd -p 2222 -D -e -o KexAlgorithms=diffie-hellman-group-exchange-sha256"

Gottox commented 3 weeks ago

I hope the universe is happy that I wasted my saturday! :laughing: :laughing: :laughing: :sob:

rouke-broersma commented 3 weeks ago

The issue is the ssh-rsa host key algorithm, not (or maybe also?) the diffie-hellman sha1 key exchange. This is already proven by Flux, where the fix was disabling the ssh-rsa host key algorithm. Microsoft specifically mentions ssh-rsa, so I don't know why people are suddenly focusing on the diffie-hellman key exchange.

Alexey-Goru1ev commented 3 weeks ago

Just found a workaround that works for me: https://developercommunity.visualstudio.com/t/Support-non-RSA-keys-for-SSH-authenticat/365980#T-N10445094

Here are the steps:

  1. generate a new key: ssh-keygen -t ecdsa -C "argocd.example.com"
  2. replace "ecdsa-sha2-nistp256" with "ssh-rsa" in the .pub file before adding it to Azure
  3. add the key to Azure
  4. add to knownHosts the same lines as before, but one for each algorithm (honestly not sure whether any except ecdsa-sha2-nistp256 is required): ssh-rsa, rsa-sha2-256, rsa-sha2-512, ecdsa-sha2-nistp256
  5. update the key for the repos in Argo CD

UPD: it seems this doesn't work anymore, or it only partially worked before.

zamedic commented 2 weeks ago

I do fear I may have been too hasty with my solution. While, for some reason it removed the error previously, it may have been coincidental. I am doing further research into the issue.

aurel4oxand commented 2 weeks ago

Did you manage to test it during the brownout

I know that feel bro 🤣

rouke-broersma commented 2 weeks ago

@crenshaw-dev

My understanding is that this isn't an issue with golang's crypto library. Rather it has to do with the hash algorithms ssh accepts in Ubuntu 22.04 (the version Argo CD images are built on).

It looks like Ubuntu 22.04 does support the listed algorithms (based on this guide), but maybe neither algorithm is enabled by default in the handshake. You might have to explicitly enable that host key algorithm.

As far as we can tell the issue is not with the native client but rather with the go-git bits and specifically in the plumbing that argocd does. Can we get any guidance on implementing a fix? We have some proposals here:

https://github.com/argoproj/argo-cd/issues/17634#issuecomment-2077647842

Gottox commented 2 weeks ago

@rouke-broersma I wrote a PR that implements the workaround: https://github.com/argoproj/argo-cd/pull/18007

rouke-broersma commented 2 weeks ago

@rouke-broersma I wrote a PR that implements the workaround: #18007

You wrote a PR implementing key exchange options based on comments by @zamedic who is now saying that this may actually not be the fix because they still have issues. They were most likely testing their workaround after a brownout had already ended. If you follow the linked comment you can see that we believe the issue is with the host key algorithms not key exchange algorithms. They are two separate configurations. See: https://github.com/argoproj/argo-cd/issues/17634#issuecomment-2077647842

Your change is still a needed change in the grand scheme of things, but unless you can confirm that your fix actually works during a brownout, I don't believe this helps in this case.

zamedic commented 2 weeks ago

In our stack, we use Argo CD to manage Argo CD. When I implemented the fix, Argo CD was failing with the main error, so I could not bump the version. I had to log on to k8s and manually bump the repo-server version, after which things appeared to work. I do see today, however, that this appears to have been purely coincidental. I am continuing to try to resolve this, but today the brownouts appeared to be even more inconsistent, with some fetches working and others not.

Also, I am a complete noob when it comes to how the Git/SSH transport works, so I'm learning a lot.

cveld commented 2 weeks ago

How do I reproduce the command that ArgoCD fires at Azure DevOps? Is git-upload-pack easily executed from the commandline?

I tried to reproduce a git operation that uses the ssh-rsa signature but I failed. It keeps defaulting to rsa-sha2-512.

$env:GIT_SSH_COMMAND="ssh -vvv -i keyfile -o IdentitiesOnly=yes"
git pull

results in a verbose ssh log:

...
debug3: order_hostkeyalgs: prefer hostkeyalgs: rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-rsa
...
debug2: host key algorithms: ssh-rsa,rsa-sha2-256,rsa-sha2-512
...
debug1: kex: algorithm: diffie-hellman-group-exchange-sha256
debug1: kex: host key algorithm: rsa-sha2-512
debug1: kex: server->client cipher: aes128-ctr MAC: hmac-sha2-256 compression: none
debug1: kex: client->server cipher: aes128-ctr MAC: hmac-sha2-256 compression: none
...
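To pull the negotiated host key algorithm out of a verbose log like the one above without reading it by hand, something like this sketch may help. It assumes the `debug1: kex: host key algorithm:` line format shown in the log, and the function name is made up for illustration.

```sh
#!/bin/sh
# Read `ssh -vvv` output from stdin and print the negotiated host key
# algorithm. Exit status 1 means it was ssh-rsa (the algorithm Azure DevOps
# is retiring); 2 means no negotiation line was found in the input.
negotiated_host_key_alg() {
    alg=$(sed -n 's/^debug1: kex: host key algorithm: //p' | head -n 1)
    [ -n "$alg" ] || return 2
    printf '%s\n' "$alg"
    [ "$alg" != "ssh-rsa" ]
}

# Usage (ssh writes debug output to stderr, hence 2>&1):
#   ssh -vvv -T git@ssh.dev.azure.com 2>&1 | negotiated_host_key_alg
```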

Additionally I can confirm that any RSA key works irrespective of the -t parameter. The key generated with ssh-keygen -t ssh-rsa -f ssh-rsa-keyfile also just works. At least assuming the brown out is active 😅

rouke-broersma commented 2 weeks ago

@cveld enabling ssh-rsa in the ssh user config file should be more or less similar to what argocd does. Ie the inverse of this doc: https://learn.microsoft.com/en-us/azure/devops/repos/git/use-ssh-keys-to-authenticate?view=azure-devops#q-ssh-cannot-establish-a-connection-what-should-i-do

You can't reproduce it by default with an ssh or git client because they disabled ssh-rsa in OpenSSH 1-2 years ago.

It's not an exact reproduction because argocd does this in go code instead of using the native client but the approximation is close enough I think.

cveld commented 2 weeks ago

@rouke-broersma thanks! I was able to reproduce the brown out with the following in ~/.ssh/config:

Host ssh.dev.azure.com vs-ssh.visualstudio.com
  HostKeyAlgorithms ssh-rsa

zamedic commented 2 weeks ago

I have multiple Argo CD clusters. On one cluster I am running some small changes, and that one appears not to be experiencing brownouts while the other clusters are.

The difference is the go-git version. I ran go get -u github.com/go-git/go-git/v5, which bumped the module up to v5.12.0.

I will need to see what's in that change and deploy my fix to all environments to confirm...

blakepettersson commented 2 weeks ago

So this is a no-go. So the only fix, I think, would be to modify

https://github.com/argoproj/argo-cd/blob/0f11dfb5961361807962aafc68b11426b8a47490/util/git/ssh.go#L54 to either have a hardcoded list of supported algorithms or to read the supported algorithms from ~/.ssh/config.

@rouke-broersma IMO I think initializing SupportedSSHKeyExchangeAlgorithms from an environment variable would be the way to go. The default would be the same as today. Perhaps it could also be worthwhile having a new configmap to set the environment variable from.

zamedic commented 2 weeks ago

Ok, I believe I have traced down the root cause... By bumping the go-git, it also bumped https://github.com/skeema/knownhosts to version 1.2.2 this version contains the changes for the new algorithms: https://github.com/skeema/knownhosts/commit/bd8e67ecaa664984a8af209daa256b8aab3454a5

pimjansen commented 2 weeks ago

Ok, I believe I have traced down the root cause...

By bumping the go-git, it also bumped https://github.com/skeema/knownhosts to version 1.2.2

this version contains the changes for the new algorithms: https://github.com/skeema/knownhosts/commit/bd8e67ecaa664984a8af209daa256b8aab3454a5

Seems solid. It would break old implementations, though, but I assume that is OK since ssh-rsa is deprecated and being dropped anyway.

But only the Argo CD maintainers can decide whether this counts as a BC break and whether they allow it.

rouke-broersma commented 2 weeks ago

So this is a no-go. So the only fix, I think, would be to modify https://github.com/argoproj/argo-cd/blob/0f11dfb5961361807962aafc68b11426b8a47490/util/git/ssh.go#L54

to either have a hardcoded list of supported algorithms or to read the supported algorithms from ~/.ssh/config.

@rouke-broersma IMO I think initializing SupportedSSHKeyExchangeAlgorithms from an environment variable would be the way to go. The default would be the same as today. Perhaps it could also be worthwhile having a new configmap to set the environment variable from.

The issue is with HostKeyAlgorithms, not KeyExchangeAlgorithms.

The issue with an environment variable is that the nativeclient which is used for 90% of argocd git interaction would read from ~/.ssh/config so you would need to add the config to both the ssh client config and the environment variable. Ideally both clients would read from the same source, so this difference between the two clients is removed.

See confusion here for example: https://github.com/argoproj/argo-cd/issues/17634#issuecomment-2043498425

I think short term the update of the knownhosts library will fix this issue, but long term the differences between go-git and nativeclient need to be resolved because it seems like argocd maintainers also forget about these differences (understandable, but dangerous since it involves crypto and authentication).

jannfis commented 2 weeks ago

Thanks @zamedic - I've merged your PR.

Before cherry-picking this change into supported release branches and putting out a fix release, can somebody confirm that this actually fixes the issue? For example, by running an image that's been built off the latest code (once the image has been built)? I do not have access to an Azure DevOps Git repository, unfortunately.

jannfis commented 2 weeks ago

Anyone inclined to test a build off the master branch with the fix included, the image is ghcr.io/argoproj/argo-cd/argocd:2.11.0-a63068d0

rouke-broersma commented 2 weeks ago

Anyone inclined to test a build off the master branch with the fix included, the image is ghcr.io/argoproj/argo-cd/argocd:2.11.0-a63068d0

We're preparing our dev environment for testing this change.

Unfortunately there is no brownout at the moment, so we will have to wait until Microsoft randomly turns on the brownout somewhere within the next 20 minutes to 2 hours.

razvangoga commented 2 weeks ago

is this image also available as a helm chart? i could test it but my clusters have argo via helm chart installed

rouke-broersma commented 2 weeks ago

is this image also available as a helm chart? i could test it but my clusters have argo via helm chart installed

You can overwrite the image in the helm chart: https://github.com/argoproj/argo-helm/blob/main/charts/argo-cd/values.yaml#L56
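As a sketch of that override: the function below prints a small values file for the test image. It assumes the chart version in use exposes global.image.repository and global.image.tag, and the release name and chart reference in the usage line are placeholders.

```sh
#!/bin/sh
# Print a Helm values override that points the argo-cd chart at a specific
# image. Arguments: image repository, image tag.
# Assumes the chart exposes global.image.repository and global.image.tag.
image_override_values() {
    cat <<EOF
global:
  image:
    repository: $1
    tag: "$2"
EOF
}

# Usage (release name "argocd" and chart reference "argo/argo-cd" are placeholders):
#   image_override_values ghcr.io/argoproj/argo-cd/argocd 2.11.0-a63068d0 > override.yaml
#   helm upgrade argocd argo/argo-cd -n argocd --reuse-values -f override.yaml
```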

jgwest commented 2 weeks ago

@jannfis I can confirm the fix addresses the issue for my reproduction case.

I built the image from the PR branch, and was able to test it just before the most recent brown out ended. I had 2 separate namespace-scoped Argo CD instances targeting the same private Azure repo...

So presuming the brown out period is shared between them (no reason to think otherwise), I can no longer reproduce.

maartengo commented 2 weeks ago

Anyone inclined to test a build off the master branch with the fix included, the image is ghcr.io/argoproj/argo-cd/argocd:2.11.0-a63068d0

We're preparing our dev environment for testing this change.

Unfortunately there is no brownout at the moment, so we will have to wait until Microsoft randomly turns on the brownout somewhere within the next 20 minutes to 2 hours.

The brownout is active... and we are no longer impacted 🎉

We're looking forward to the next Argo release!

jannfis commented 2 weeks ago

Thanks for testing, folks. I'll cut releases today.

jannfis commented 2 weeks ago

2.10.9 is out, 2.9.14 and 2.8.18 are being built and on their way.

Please let us know if those new releases do not fix the issue for you.
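A rough way to check whether a given version string is at or above those releases. The thresholds for 2.8, 2.9 and 2.10 come from the comment above. Treating 2.11 and later as fixed is an assumption based on the master test image, and the function name is made up for illustration.

```sh
#!/bin/sh
# Return 0 if the given Argo CD 2.x version string (e.g. "v2.9.3+6eba5be")
# is at or above the fix releases named in this thread, 1 if it is older,
# and 2 if the version is not a 2.x release.
has_ssh_rsa_fix() {
    v=${1#v}      # drop a leading "v"
    v=${v%%+*}    # drop build metadata such as "+6eba5be"
    v=${v%%-*}    # drop suffixes such as "-a63068d0"
    major=${v%%.*}
    rest=${v#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    [ "$major" = "2" ] || return 2
    case $minor in
        8)  [ "$patch" -ge 18 ] ;;
        9)  [ "$patch" -ge 14 ] ;;
        10) [ "$patch" -ge 9 ] ;;
        *)  [ "$minor" -ge 11 ] ;;   # assumption: 2.11 and later ship the fix
    esac
}

# Usage:
#   has_ssh_rsa_fix v2.9.3+6eba5be || echo "upgrade needed"
```

Note that it is the repo-server version that matters here, which may differ from the CLI version printed by argocd version.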