argoproj-labs / argocd-operator

A Kubernetes operator for managing Argo CD clusters.
https://argocd-operator.readthedocs.io
Apache License 2.0
657 stars 762 forks source link

.spec.sso.dex.openShiftOAuth: true no longer seems to work #1012

Open bergner opened 1 year ago

bergner commented 1 year ago

Describe the bug

Setting .spec.sso.dex.openShiftOAuth: true in the ArgoCD CR causes the "Log in via Openshift" button to appear in ArgoCD's UI but clicking that button causes a 400 error on GET https://argocd-server-argocd.apps-crc.testing/auth/login?return_url=https://argocd-server-argocd.apps-crc.testing/applications with the message:

Invalid redirect URL: the protocol and host (including port) must match and the path must be within allowed URLs if provided

To Reproduce

I have been running ArgoCD on Openshift CRC successfully for a couple of months now but I recently redid my Openshift CRC setup from scratch to increase its dedicated disk size. After installing ArgoCD operator through OperatorHub interface in Openshift I created an "argocd" namespace with oc create ns argocd and then provisioned an ArgoCD CR with oc apply -f on the yaml below.

I have v0.7.0 of the operator installed (image: quay.io/argoprojlabs/argocd-operator@sha256:5541a1c2323016b767f53bcadf696ebd199903d2048f7bd2aee377a332ea5f2c)

---
apiVersion: user.openshift.io/v1
kind: Group
metadata:
  name: cluster-admins
users:
- kubeadmin
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-argocd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: cluster-admins
---
apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: argocd
  namespace: argocd
spec:
  server:
    autoscale:
      enabled: false
    grpc:
      ingress:
        enabled: false
    ingress:
      enabled: false
    resources:
      limits:
        cpu: 50m
        memory: 256Mi
      requests:
        cpu: 20m
        memory: 128Mi
    route:
      enabled: true
      tls:
        insecureEdgeTerminationPolicy: Redirect
        termination: reencrypt
    service:
      type: ''
  grafana:
    enabled: false
    ingress:
      enabled: false
    route:
      enabled: false
  resourceTrackingMethod: annotation
  notifications:
    enabled: false
  prometheus:
    enabled: false
    ingress:
      enabled: false
    route:
      enabled: false
  initialSSHKnownHosts: {}
  sso:
    dex:
      openShiftOAuth: true
      resources:
        limits:
          cpu: 25m
          memory: 256Mi
        requests:
          cpu: 10m
          memory: 128Mi
    provider: dex
  rbac:
    defaultPolicy: 'role:readonly'
    policy: |
      g, cluster-admins, role:admin
    scopes: '[groups]'
  repo:
    resources:
      limits:
        cpu: 25m
        memory: 512Mi
      requests:
        cpu: 10m
        memory: 256Mi
  ha:
    enabled: false
    resources:
      limits:
        cpu: 25m
        memory: 256Mi
      requests:
        cpu: 10m
        memory: 128Mi
  tls:
    ca: {}
  redis:
    resources:
      limits:
        cpu: 25m
        memory: 256Mi
      requests:
        cpu: 10m
        memory: 128Mi
  controller:
    processors: {}
    resources:
      limits:
        cpu: 50m
        memory: 2Gi
      requests:
        cpu: 20m
        memory: 1Gi
    sharding: {}

I also tried adding .spec.server.url: https://argocd-server-argocd.apps-crc.testing in the ArgoCD CR but that did not help.

Additional information

The argocd-cm ConfigMap contains this (plus some other properties that I don't think are related to this):

data:
  dex.config: |
    connectors:
    - config:
        clientID: system:serviceaccount:argocd:argocd-argocd-dex-server
        clientSecret: $oidc.dex.clientSecret
        groups: []
        insecureCA: true
        issuer: https://kubernetes.default.svc
        redirectURI: https://argocd-server/api/dex/callback
      id: openshift
      name: OpenShift
      type: openshift
  url: https://argocd-server

The argocd-argocd-dex-server ServiceAccount has the following annotation:

  annotations:
    serviceaccounts.openshift.io/oauth-redirecturi.argocd: https://argocd-server/api/dex/callback

Unfortunately I don't have the exact contents of argocd-cm and argocd-argocd-dex-server from before the reinstall of Openshift CRC but at least one theory is that it used to use the Route FQDN and not use "https://argocd-server".

I tried setting --loglevel debug on both the argocd-server and argocd-dex-server Deployments. I can't see what is happening when I click "Log in via Openshift" but on startup of argocd-dex-server this entry in the log file is interesting (I replaced "\n" with line breaks here for readability). Why is config.redirectURI and staticClients[0].redirectURIs completely different for example?

time="2023-10-01T13:05:13Z" level=debug msg="
connectors:
- config:
    clientID: system:serviceaccount:argocd:argocd-argocd-dex-server
    clientSecret: '********'
    groups: []
    insecureCA: true
    issuer: https://kubernetes.default.svc
    redirectURI: https://argocd-server/api/dex/callback
  id: openshift
  name: OpenShift
  type: openshift
grpc:
  addr: 0.0.0.0:5557
issuer: https://argocd-server/api/dex
oauth2:
  skipApprovalScreen: true
staticClients:
- id: argo-cd
  name: Argo CD
  redirectURIs:
  - https://argocd-server/auth/callback
  secret: '********'
- id: argo-cd-cli
  name: Argo CD CLI
  public: true
  redirectURIs:
  - http://localhost
  - http://localhost:8085/auth/callback
storage:
  type: memory
telemetry:
  http: 0.0.0.0:5558
web:
  https: 0.0.0.0:5556
  tlsCert: /tmp/tls.crt
  tlsKey: /tmp/tls.key

Update 1: Setting .spec.server.host

Setting .spec.server.host: argocd-server-argocd.apps-crc.testing explicitly gets me further. I do get the Openshift login screen now but after entering credentials there I end up back on the ArgoCD login screen and it says "Login failed". This is the HTTP communication that takes place during the login process:

>>> GET https://argocd-server-argocd.apps-crc.testing/auth/login?return_url=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fapplications
<<< 303 location: https://argocd-server-argocd.apps-crc.testing/api/dex/auth?client_id=argo-cd&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fauth%2Fcallback&response_type=code&scope=openid+profile+email+groups&state=iXOenGNGOIVlqEUhKMECLJhR

>>> GET https://argocd-server-argocd.apps-crc.testing/api/dex/auth?client_id=argo-cd&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fauth%2Fcallback&response_type=code&scope=openid+profile+email+groups&state=iXOenGNGOIVlqEUhKMECLJhR
<<< 302 location: /api/dex/auth/openshift?client_id=argo-cd&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fauth%2Fcallback&response_type=code&scope=openid+profile+email+groups&state=iXOenGNGOIVlqEUhKMECLJhR

>>> GET https://argocd-server-argocd.apps-crc.testing/api/dex/auth/openshift?client_id=argo-cd&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fauth%2Fcallback&response_type=code&scope=openid+profile+email+groups&state=iXOenGNGOIVlqEUhKMECLJhR
<<< 302 location: https://oauth-openshift.apps-crc.testing/oauth/authorize?client_id=system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fapi%2Fdex%2Fcallback&response_type=code&scope=user%3Ainfo&state=y6svfjhybow3jmivvdillts72

>>> GET https://oauth-openshift.apps-crc.testing/oauth/authorize?client_id=system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fapi%2Fdex%2Fcallback&response_type=code&scope=user%3Ainfo&state=y6svfjhybow3jmivvdillts72
<<< 302 location: /login?then=%2Foauth%2Fauthorize%3Fclient_id%3Dsystem%253Aserviceaccount%253Aargocd%253Aargocd-argocd-dex-server%26redirect_uri%3Dhttps%253A%252F%252Fargocd-server-argocd.apps-crc.testing%252Fapi%252Fdex%252Fcallback%26response_type%3Dcode%26scope%3Duser%253Ainfo%26state%3Dy6svfjhybow3jmivvdillts72

>>> GET https://oauth-openshift.apps-crc.testing/login?then=%2Foauth%2Fauthorize%3Fclient_id%3Dsystem%253Aserviceaccount%253Aargocd%253Aargocd-argocd-dex-server%26redirect_uri%3Dhttps%253A%252F%252Fargocd-server-argocd.apps-crc.testing%252Fapi%252Fdex%252Fcallback%26response_type%3Dcode%26scope%3Duser%253Ainfo%26state%3Dy6svfjhybow3jmivvdillts72
<<< 200 OK (here we get the actual Openshift login UI, enter credentials and submit)

>>> POST https://oauth-openshift.apps-crc.testing/login with body (password replaced by stars here):
then=%2Foauth%2Fauthorize%3Fclient_id%3Dsystem%253Aserviceaccount%253Aargocd%253Aargocd-argocd-dex-server%26redirect_uri%3Dhttps%253A%252F%252Fargocd-server-argocd.apps-crc.testing%252Fapi%252Fdex%252Fcallback%26response_type%3Dcode%26scope%3Duser%253Ainfo%26state%3Dy6svfjhybow3jmivvdillts72&csrf=ChQ3VPupdznVCVwu9Nlr4gyXBtGjq7lCpAh_SF8rqtE&username=kubeadmin&password=**********
<<< 302 location: /oauth/authorize?client_id=system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fapi%2Fdex%2Fcallback&response_type=code&scope=user%3Ainfo&state=y6svfjhybow3jmivvdillts72

>>> GET https://oauth-openshift.apps-crc.testing/oauth/authorize?client_id=system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fapi%2Fdex%2Fcallback&response_type=code&scope=user%3Ainfo&state=y6svfjhybow3jmivvdillts72
<<< 302 location: https://argocd-server-argocd.apps-crc.testing/api/dex/callback?code=sha256~Y2OB_pukTXVV9qlgFW3flNCmmnedlOMARE8hM_AiLdI&state=y6svfjhybow3jmivvdillts72

>>> GET https://argocd-server-argocd.apps-crc.testing/api/dex/callback?code=sha256~Y2OB_pukTXVV9qlgFW3flNCmmnedlOMARE8hM_AiLdI&state=y6svfjhybow3jmivvdillts72
<<< 303 location: /login?has_sso_error=true

>>> GET https://argocd-server-argocd.apps-crc.testing/login?has_sso_error=true
<<< 200 OK (we are back on the ArgoCD login screen with "Login failed" message)

Looking at the argocd-dex-server log file I have this:

time="2023-10-01T19:09:50Z" level=error msg="Failed to authenticate: oidc: failed to get token: oauth2: cannot fetch token: 400 Bad Request\nResponse: {\"error\":\"invalid_request\",\"error_description\":\"The request is missing a required parameter, includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed.\"}\n"

In the logs for the oauth-openshift pod in the openshift-authentication namespace I see the following:

E1001 19:09:50.762229       1 access.go:177] osin: error=server_error, internal_error=&errors.errorString{s:"invalid resource name \"system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server\": [may not contain '%']"} get_client=error finding client
E1001 19:09:50.762244       1 osinserver.go:111] internal error: invalid resource name "system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server": [may not contain '%']
E1001 19:09:50.762738       1 access.go:154] osin: error=invalid_request, internal_error=&errors.errorString{s:"Client authentication not sent"} get_client_auth=client authentication not sent
E1001 19:09:50.762779       1 osinserver.go:111] internal error: Client authentication not sent

Could there be some double encoding going on perhaps where ":" gets encoded as %3A and then %253A, which after decoding once becomes %3A rather than ":"?

Expected behavior Log in via Openshift works.

bergner commented 1 year ago

995 seems like mostly a duplicate of the bug I reported here but I think this bug can still have some merit in that it seems behavior of .spec.sso.dex.openShiftOAuth: truechanged w.r.t that the redirect URL was previously by default set based on the Route if the Openshift Route was enabled but in v0.7.0 that no longer seems to be the case.

Since that can be worked around by explicitly setting .spec.server.url the more serious problem is the seemingly invalid request sent from argocd-dex-server to openshift-oauth (which is what #995 also is about).

I looked a bit at the code for openshift-oauth and it is using the basic Golang request.FormValue("client_id") (and previously used request.Form.Get("client_id")) both of which would do one level URI unescaping. So that suggests that what gets sent over the wire is double encoded.

bergner commented 1 year ago

I figured out how to increase the log level on oauth-openshift so I can see the HTTP requests in its log file and there it looks like the ArgoCD Dex Server request is correct with one level of escaping. I created an issue in the openshift/oauth-server project to maybe get some feedback from there.

This was the interesting log snippet from the oauth-openshift pod, as you can see the client_id has the same level of escaping as the redirect_uri.

I1013 10:29:35.684794       1 httplog.go:131] "HTTP" verb="GET" URI="/oauth/authorize?client_id=system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server&redirect_uri=https%3A%2F%2Fargocd-server-argocd.apps-crc.testing%2Fapi%2Fdex%2Fcallback&response_type=code&scope=user%3Ainfo&state=qtdg2lscsyrq7qlkgutypygxi" latency="90.415571ms" userAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.5845.103 Safari/537.36" audit-ID="cac191fd-8164-452d-b1ca-0040fc0a621b" srcIP="10.217.0.1:54130" resp=302
I1013 10:29:35.701586       1 request.go:833] Error in request: invalid resource name "system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server": [may not contain '%']
E1013 10:29:35.701646       1 access.go:177] osin: error=server_error, internal_error=&errors.errorString{s:"invalid resource name \"system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server\": [may not contain '%']"} get_client=error finding client
E1013 10:29:35.701664       1 osinserver.go:111] internal error: invalid resource name "system%3Aserviceaccount%3Aargocd%3Aargocd-argocd-dex-server": [may not contain '%']
bergner commented 1 year ago

oauth-openshift or the https://github.com/openshift/osin library looks like the main culprit here. I was able to verify that I got login to work in https://github.com/openshift/oauth-server/issues/136 by reverting one of the commits done there which made some fixes and updated the openshift/osin library to a newer version.

bergner commented 1 year ago

I'm not sure why but this seems to have magically solved itself last week, presumably due to some auto-update of the Openshift authentication operator or oauth-openshift. Since then I have not been able to reproduce this problem with the code exchange failing.

The .spec.server.url still needs to be set, i.e. it is not implicitly based on the exposed Route like it was in an earlier version.

iam-veeramalla commented 1 year ago

Thanks for reporting @bergner , sure we will take a look at it.