argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Cli + --grpc-web + large headers issue #9475

Open mcyrrer opened 2 years ago

mcyrrer commented 2 years ago

Discussed in https://github.com/argoproj/argo-cd/discussions/9288

Describe the bug

We are using SSO through Azure AD, and when we retrieve the full list of groups a user belongs to, we run into problems with the header size. Everything works when we claim only the ApplicationGroups (just a few groups) but not when we ask for SecurityGroup (all groups that a user belongs to). After a long debugging session, it looks like either the argocd CLI or the server does not accept a large header (in this case, a header carrying all the Azure AD group claims).

How to reproduce

This is how to reproduce the issue without needing a login claim with many groups:

argocd proj list --header='h:<here be 8200+ chars>'

Expected behavior

It should be possible to use the argocd CLI with a user that belongs to a large number of Azure AD groups.

Information

For the browser-based experience we solved this by adding the annotation below to our Ingress:

nginx.ingress.kubernetes.io/proxy-buffer-size: "20k"

But with the argocd CLI we get errors after the SSO login flow:

   argocd login argocd.example.com --sso --grpc-web
   Authentication successful
   'user.name@example.com' logged in successfully
   Context 'argocd.example.com' updated

   argocd proj list 
   FATA[0000] rpc error: code = Unknown desc = POST https://argocd.example.com:443/project.ProjectService/List 
   failed with status code 400

I have done some debugging, and it looks like the CLI fails whenever a header is ~8200 chars or longer, including the key. With fewer chars it works fine with the --grpc-web parameter.

argocd proj list --header='h:<here be 8200 chars>'
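To try the reproduction without pasting thousands of characters by hand, the filler can be generated in the shell. This is only a sketch: the key `h` comes from the command above, while the variable names and the filler character `a` are arbitrary choices, and the 8200 total is the approximate threshold reported here, not an exact limit.

```shell
# Build a header of a given total length (key + ":" + filler).
key="h"
total=8200                               # approximate failing size reported in this issue
fill=$(( total - ${#key} - 1 ))          # characters left for the value after "h:"
value=$(head -c "$fill" /dev/zero | tr '\0' 'a')
header="${key}:${value}"

# Print the resulting length as a sanity check.
echo "header length: ${#header}"

# Then pass it to the CLI (left commented out so the snippet has no side effects):
# argocd proj list --grpc-web --header="$header"
```

Lowering `total` and re-running should show the size at which the 400 response stops appearing in a given setup.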

The only log output I get is in the nginx log:

172.40.120.135 - - [03/May/2022:20:05:38 +0000] "POST /project.ProjectService/List HTTP/1.1" 400 226 "-" "argocd-client/v2.3.3+07ac038.dirty grpc-go/1.15.0" 211 0.066 [] [] - - - - ced748b637111d86a691e111b983ebb

I have nothing in the argocd-server log.

I think something in the CLI call does not handle the large header correctly.

Some information on the setup:

Ingress:

  ---
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: argocd-server-ingress
    namespace: argocd
    annotations:
      # If you encounter a redirect loop or are getting a 307 response code
      # then you need to force the nginx ingress to connect to the backend using HTTPS.
      nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
      nginx.ingress.kubernetes.io/ssl-passthrough: "true"
      # To fix the "upstream sent too big header while reading response header from upstream" issue since
      # AAD returns a very large header with all AAD groups
      # https://stackoverflow.com/questions/58943111/nginx-ingress-returns-502-after-post-with-redirect
      nginx.ingress.kubernetes.io/proxy-buffer-size: "20k"
  spec:
    ingressClassName: nginx
    tls:
    - hosts:
      - argocd.example.com
    rules:
    - host: argocd.example.com
      http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: argocd-server
              port:
                number: 443

Version:

argocd: v2.3.3+07ac038.dirty
  BuildDate: 2022-03-30T05:14:36Z
  GitCommit: 07ac038a8f97a93b401e824550f0505400a8c84e
  GitTreeState: dirty
  GoVersion: go1.18
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.3.3+07ac038
  BuildDate: 2022-03-30T00:06:18Z
  GitCommit: 07ac038a8f97a93b401e824550f0505400a8c84e
  GitTreeState: clean
  GoVersion: go1.17.6
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v4.4.1 2021-11-11T23:36:27Z
  Helm Version: v3.8.0+gd141386
  Kubectl Version: v0.23.1
  Jsonnet Version: v0.18.0

StepanKuksenko commented 1 year ago

Faced the same issue. A user had 30+ groups assigned in Azure AD so the default nginx buffer_size was not enough to handle it.

mkilchhofer commented 10 months ago

I think this could be resolved by using the userinfo endpoint via this proposal from my colleague:

jalavoy commented 8 months ago

Quoting StepanKuksenko: "Faced the same issue. A user had 30+ groups assigned in Azure AD so the default nginx buffer_size was not enough to handle it."

Which nginx did you bump this on? I've been doing it on our ingress controller and not having any luck.

nhavens commented 3 weeks ago

The following annotation on my Argo CD Ingress resolved the 400 status code responses to the argocd CLI for me:

    nginx.ingress.kubernetes.io/server-snippet: |
      large_client_header_buffers 4 100k;

Another approach would be to increase client_header_buffer_size from its 1k default, but large_client_header_buffers only allocates the larger buffers when a request actually needs them.
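For context, nginx's documented default for large_client_header_buffers is 4 8k, and a single request header field that does not fit in one buffer is rejected with a 400, which is consistent with the ~8200-char threshold reported at the top of this issue. As a rough sketch of sizing the buffer for a known header size, the helper below prints the annotation. The function name `snippet_for_header` and the 1k of headroom are assumptions made for illustration, not anything from Argo CD or ingress-nginx.

```shell
# Hypothetical helper: given the largest expected header line in bytes,
# print a server-snippet annotation whose buffers can hold it.
snippet_for_header() {
  bytes=$1
  # Round up to whole kilobytes, then add 1k of headroom (arbitrary margin).
  kb=$(( (bytes + 1023) / 1024 + 1 ))
  printf 'nginx.ingress.kubernetes.io/server-snippet: |\n'
  printf '  large_client_header_buffers 4 %sk;\n' "$kb"
}

# Example: a header line of about 20000 bytes.
snippet_for_header 20000
```

The printed lines go under `metadata.annotations` of the Ingress, indented to match. Whether server-snippet annotations are allowed depends on how the ingress controller is configured.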