c0c0n3 / kitt4sme.live

On a mission to bring AI to the shop floor: https://kitt4sme.eu/
MIT License

ArgoCD TLS certificate validation fails when login with Keycloak #210

Closed: karikolehmainen closed this issue 1 year ago

karikolehmainen commented 1 year ago

Apparently Keycloak cannot establish a valid authority chain with server certificates and fails with error: Failed to query provider "https://kitt4sme.collab-cloud.eu/auth/realms/master": Get "https://kitt4sme.collab-cloud.eu/auth/realms/master/.well-known/openid-configuration": x509: certificate signed by unknown authority

We need to figure out which root CA Keycloak should use and provide valid certificates signed under that specific root CA.

c0c0n3 commented 1 year ago

So it looks like ArgoCD doesn't trust the authority that signed the certificate #207 introduced. If you try logging in through Keycloak, ArgoCD initialises its OIDC provider with the URL you specify in the argocd-cm config map. If you set that URL to

https://kitt4sme.collab-cloud.eu/auth/realms/master

ArgoCD will try retrieving the well-known OIDC config from

https://kitt4sme.collab-cloud.eu/auth/realms/master/.well-known/openid-configuration

but the HTTP GET will fail with this error message

Failed to query provider "https://kitt4sme.collab-cloud.eu/auth/realms/master": Get "https://kitt4sme.collab-cloud.eu/auth/realms/master/.well-known/openid-configuration": x509: certificate signed by unknown authority
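For illustration, here is a minimal sketch of how the discovery URL in that error relates to the issuer URL. It follows the OpenID Connect Discovery convention (issuer, minus any trailing slash, plus a fixed well-known path). `discoveryURL` is a hypothetical helper written for this note, not a function from ArgoCD or its OIDC library.

```go
package main

import (
	"fmt"
	"strings"
)

// discoveryURL is a hypothetical helper: it returns the issuer with any
// trailing slash removed, followed by the well-known path that OpenID
// Connect Discovery defines for the provider configuration document.
func discoveryURL(issuer string) string {
	return strings.TrimSuffix(issuer, "/") + "/.well-known/openid-configuration"
}

func main() {
	// Issuer URL taken from the error message in this thread.
	fmt.Println(discoveryURL("https://kitt4sme.collab-cloud.eu/auth/realms/master"))
}
```

Running it prints the same URL that shows up in the `Get "..."` part of the error, which is why a TLS problem on that one request is enough to break the whole SSO login.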

The most likely reason for that is the root authority in our cert is not listed among those of the OIDC lib ArgoCD uses or among those of the Go HTTP client the OIDC lib uses to call Keycloak.
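To see this failure mode without touching the real server, here is a small offline sketch. It starts a local TLS test server (a stand-in for Keycloak, purely an assumption for the demo) and calls it with a Go HTTP client whose trust store is empty, so the server's certificate can't be chained to any trusted root. `probe`, `demo` and `isUnknownAuthority` are hypothetical helper names.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// isUnknownAuthority reports whether err was caused by a certificate
// chain that doesn't end in a root the client trusts.
func isUnknownAuthority(err error) bool {
	var ua x509.UnknownAuthorityError
	return errors.As(err, &ua)
}

// probe GETs url with a client that trusts only the roots in pool.
func probe(url string, pool *x509.CertPool) error {
	tr := &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}}
	defer tr.CloseIdleConnections()
	res, err := (&http.Client{Transport: tr}).Get(url)
	if err != nil {
		return err
	}
	return res.Body.Close()
}

// demo calls a local TLS test server with an empty trust store and
// returns the resulting error.
func demo() error {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()
	return probe(srv.URL, x509.NewCertPool())
}

func main() {
	err := demo()
	fmt.Println(err)
	fmt.Println("unknown authority:", isUnknownAuthority(err))
}
```

The exact wording of the error varies a bit between Go versions, so the sketch checks the error type with `errors.As` rather than matching on the message text.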

c0c0n3 commented 1 year ago

@karikolehmainen @RyanKelvinFord

I finally got to the bottom of this. The reason for that error is that the ArgoCD version we use, v2.2.5, is compiled with Go 1.16 and that version of Go doesn't have a recent list of signing authorities. Our certificate chain contains an authority that's not in Go 1.16's list, so HTTPS calls to our server will fail on certificate validation.
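If you want to double-check which Go toolchain built a given binary, the standard library's `debug/buildinfo` package (available from Go 1.18) can read the version recorded in it. This is only a sketch: `goVersionOf` is a hypothetical helper, and the path to the binary you want to inspect (e.g. a copy of the `argocd` executable) is something you have to supply yourself. With no argument, the program inspects itself.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
)

// goVersionOf returns the Go toolchain version recorded in the Go
// binary at path, or an error if the file can't be read or isn't a
// Go binary.
func goVersionOf(path string) (string, error) {
	info, err := buildinfo.ReadFile(path)
	if err != nil {
		return "", err
	}
	return info.GoVersion, nil
}

func main() {
	// Default to inspecting this program; pass a path to inspect another binary.
	path, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	v, err := goVersionOf(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(v)
}
```

The `go version <file>` command reports the same information from the command line.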

For the record, here's how I found out. The ArgoCD error comes from this line of code

I put together a small Go program to import ArgoCD v2.2.5 as a lib and to make a call to hit that line of code. When I compile and run the program with Go 1.16, it bombs out with the exact error mentioned in the comment above. On the other hand, if I compile and run with Go 1.18, there's no error.

To reproduce yourself, make a new directory and put this go.mod file in it

// Module dependencies from
// - https://github.com/argoproj/argo-cd/blob/v2.2.5/go.mod
//
// Downloaded ArgoCD's go.mod, added my own module name and a require
// for `github.com/argoproj/argo-cd/v2 v2.2.5`, then ran `go mod tidy`.
// See also:
// - https://argo-cd.readthedocs.io/en/stable/user-guide/import/
//
module github.com/c0c0n3/kitt4sme.live

go 1.16

require github.com/argoproj/argo-cd/v2 v2.2.5

replace (
    github.com/golang/protobuf => github.com/golang/protobuf v1.4.2
    github.com/gorilla/websocket => github.com/gorilla/websocket v1.4.2
    github.com/grpc-ecosystem/grpc-gateway => github.com/grpc-ecosystem/grpc-gateway v1.16.0
    github.com/improbable-eng/grpc-web => github.com/improbable-eng/grpc-web v0.0.0-20181111100011-16092bd1d58a

    google.golang.org/grpc => google.golang.org/grpc v1.15.0

    k8s.io/api => k8s.io/api v0.22.2
    k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.22.2
    k8s.io/apimachinery => k8s.io/apimachinery v0.22.2
    k8s.io/apiserver => k8s.io/apiserver v0.22.2
    k8s.io/cli-runtime => k8s.io/cli-runtime v0.22.2
    k8s.io/client-go => k8s.io/client-go v0.22.2
    k8s.io/cloud-provider => k8s.io/cloud-provider v0.22.2
    k8s.io/cluster-bootstrap => k8s.io/cluster-bootstrap v0.22.2
    k8s.io/code-generator => k8s.io/code-generator v0.22.2
    k8s.io/component-base => k8s.io/component-base v0.22.2
    k8s.io/component-helpers => k8s.io/component-helpers v0.22.2
    k8s.io/controller-manager => k8s.io/controller-manager v0.22.2
    k8s.io/cri-api => k8s.io/cri-api v0.22.2
    k8s.io/csi-translation-lib => k8s.io/csi-translation-lib v0.22.2
    k8s.io/kube-aggregator => k8s.io/kube-aggregator v0.22.2
    k8s.io/kube-controller-manager => k8s.io/kube-controller-manager v0.22.2
    k8s.io/kube-proxy => k8s.io/kube-proxy v0.22.2
    k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.22.2
    k8s.io/kubectl => k8s.io/kubectl v0.22.2
    k8s.io/kubelet => k8s.io/kubelet v0.22.2
    k8s.io/legacy-cloud-providers => k8s.io/legacy-cloud-providers v0.22.2
    k8s.io/metrics => k8s.io/metrics v0.22.2
    k8s.io/mount-utils => k8s.io/mount-utils v0.22.2
    k8s.io/pod-security-admission => k8s.io/pod-security-admission v0.22.2
    k8s.io/sample-apiserver => k8s.io/sample-apiserver v0.22.2
)

Then add this main.go program

package main

import (
    "crypto/tls"
    "fmt"
    "io"
    "net/http"

    "github.com/argoproj/argo-cd/v2/util/oidc"
)

const (
    issuerURL = "https://kitt4sme.collab-cloud.eu/auth/realms/master"
    wellKnown = "https://kitt4sme.collab-cloud.eu/auth/realms/master/.well-known/openid-configuration"
)

// Go's stock HTTP client.
func client() *http.Client {
    // make sure we check the server cert is valid.
    tr := &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: false},
    }
    return &http.Client{Transport: tr}
}

// Use Go's stock client to GET the OIDC well-known config from
// `kitt4sme.collab-cloud.eu`.
func getWellKnown() ([]byte, error) {
    res, err := client().Get(wellKnown)
    if err != nil {
        return nil, err
    }
    defer res.Body.Close()
    return io.ReadAll(res.Body)
}

// Use ArgoCD's OIDC provider to figure out where the token endpoint
// is on `kitt4sme.collab-cloud.eu`. This is basically the same call
// ArgoCD makes in real life when we log in through SSO.
func argocdGetWellKnown() (string, error) {
    provider := oidc.NewOIDCProvider(issuerURL, client())
    ep, err := provider.Endpoint()
    return fmt.Sprintf("%+v", ep), err
}

func main() {
    fmt.Printf("GET %s\n", wellKnown)
    if wn, err := getWellKnown(); err != nil {
        fmt.Printf("%v\n", err)
    } else {
        fmt.Printf("%s", wn)
    }

    fmt.Printf("\n\nArgoCD endpoint lookup (same as in SSO flow)\n")
    if endpoint, err := argocdGetWellKnown(); err != nil {
        fmt.Printf("%v\n", err)
    } else {
        fmt.Printf("%s\n", endpoint)
    }
}

Finally run the following commands using Go 1.16.

$ go version
go version go1.16.9 darwin/amd64

$ go mod tidy
$ go run main.go

The program should print this

GET https://kitt4sme.collab-cloud.eu/auth/realms/master/.well-known/openid-configuration
Get "https://kitt4sme.collab-cloud.eu/auth/realms/master/.well-known/openid-configuration": x509: certificate signed by unknown authority

ArgoCD endpoint lookup (same as in SSO flow)
INFO[0000] Initializing OIDC provider (issuer: https://kitt4sme.collab-cloud.eu/auth/realms/master)
Failed to query provider "https://kitt4sme.collab-cloud.eu/auth/realms/master": Get "https://kitt4sme.collab-cloud.eu/auth/realms/master/.well-known/openid-configuration": x509: certificate signed by unknown authority

Now if you tell the Go HTTP lib to skip cert validation and run the program again, everything works as expected. In main.go, change InsecureSkipVerify: false to InsecureSkipVerify: true. Then

$ go run main.go

You should see this output instead

GET https://kitt4sme.collab-cloud.eu/auth/realms/master/.well-known/openid-configuration
{"issuer":"https://kitt4sme.collab-cloud.eu/auth/realms/master","authorization_endpoint":"https://kitt4sme.collab-cloud.eu/auth/realms/master/protocol/openid-connect/auth","token_endpoint":"https://kitt4sme.collab-cloud.eu/auth/realms/master/protocol/openid-connect/token",...

ArgoCD endpoint lookup (same as in SSO flow)
INFO[0000] Initializing OIDC provider (issuer: https://kitt4sme.collab-cloud.eu/auth/realms/master)
INFO[0000] OIDC supported scopes: [openid email profile roles web-origins address groups offline_access phone microprofile-jwt]
&{AuthURL:https://kitt4sme.collab-cloud.eu/auth/realms/master/protocol/openid-connect/auth TokenURL:https://kitt4sme.collab-cloud.eu/auth/realms/master/protocol/openid-connect/token AuthStyle:0}

Notice that if you compile and run the original main.go with Go 1.18, there are no errors. Certificate validation works flawlessly and you get the exact same output as above.
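As an aside, there's a middle ground between the two `InsecureSkipVerify` settings: keep validation on, but hand the client the missing CA certificate explicitly. The sketch below shows the idea with Go's standard library only. `clientTrusting` is a hypothetical helper, and a local TLS test server stands in for the real one so the example runs offline; in practice you'd pass the PEM of whichever authority is missing from the trust store.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// clientTrusting returns an HTTP client that still verifies server
// certificates but also accepts chains ending in the given PEM CAs.
func clientTrusting(caPEM []byte) (*http.Client, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool() // no usable system store: start empty
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificate found in PEM data")
	}
	tr := &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}}
	return &http.Client{Transport: tr}, nil
}

// demo starts a local TLS test server, adds its certificate to the
// client's trust store and returns the response body of a GET.
func demo() (string, error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	caPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: srv.Certificate().Raw})
	c, err := clientTrusting(caPEM)
	if err != nil {
		return "", err
	}
	res, err := c.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	return string(body), err
}

func main() {
	body, err := demo()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("response:", body)
}
```

Unlike `InsecureSkipVerify: true`, this client still rejects certificates from any authority that is in neither the system store nor the PEM data you supplied.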

c0c0n3 commented 1 year ago

@karikolehmainen @RyanKelvinFord so where does that leave us? I'd say the easiest fix at this point is to upgrade ArgoCD.

The earliest version we can upgrade to is v2.4.0, as that's the first one they started compiling with Go 1.18.

Notice we're running ArgoCD v2.2.5 at the moment, so there might be some incompatibilities and we need to test thoroughly in a dedicated test bed, e.g. using Multipass on a machine with enough horsepower to host the whole cluster. We can't roll out this upgrade without that kind of testing b/c it has the potential to break a lot of things in the live cluster and in those of the open call devs...

If you feel adventurous, we could even try a later ArgoCD version.

c0c0n3 commented 1 year ago

Security note

As soon as we complete the Argo CD upgrade, we should also undo the changes in #212.

karikolehmainen commented 1 year ago

Good job finding that out! @c0c0n3 do you want to do a pull request with the 2.4.0 version or should I? I got some weird errors now when trying to log in with Keycloak (Invalid redirect URL: the protocol, etc.). It worked at least once, but seems a bit random.

c0c0n3 commented 1 year ago

hi @karikolehmainen :-)

> do a pull request with 2.4.0 version or should I?

yes, please go ahead

c0c0n3 commented 1 year ago

#230 implemented a more permanent fix that doesn't depend on the Go version Argo CD got compiled with.