argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

v3.5.5 `unknown name groups` during SSO #13061

Open Freddybob4244 opened 6 months ago

Freddybob4244 commented 6 months ago

Pre-requisites

What happened/what you expected to happen?

Summary

All non-admin users assume a read-only role. After v3.5.5 is applied, non-admin users can no longer use argo-workflows and are prompted with a generic error in the UI:


Admin users can navigate normally.

Argo-server pod logs provide a bit more useful insight:

```
time="2024-05-15T17:49:33.746Z" level=error msg="failed to perform RBAC authorization" error="failed to evaluate rule: unknown name groups (1:43)\n | 'ac1ef805-ac6a-4ce9-854a-1cb406aa7121' in groups\n | ..........................................^"
time="2024-05-15T17:49:33.746Z" level=warning msg="finished unary call with code PermissionDenied" error="rpc error: code = PermissionDenied desc = not allowed" grpc.code=PermissionDenied grpc.method=ListWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2024-05-15T17:49:33Z" grpc.time_ms=1.682 span.kind=server system=grpc
time="2024-05-15T17:49:33.746Z" level=info duration=2.267779ms method=GET path=/api/v1/workflows/argo size=34 status=403
```

There aren't any additional logs that are related to this error that I can find.

Testing Performed

To confirm that 3.5.5 introduced the issue, I tested a few different versions with a read-only test user in a sandbox installation.

Configurations

Argo-Server

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argo-server
  namespace: argo
spec:
  selector:
    matchLabels:
      app: argo-server
  template:
    metadata:
      labels:
        app: argo-server
    spec:
      containers:
        - args:
            - server
            - "--auth-mode"
            - sso
            - "--auth-mode"
            - client
            - "--namespaced"
            - "--secure=false"
          env: []
          image: quay.io/argoproj/argocli:3.5.6
          name: argo-server
          ports:
            - containerPort: 2746
              name: web
          readinessProbe:
            httpGet:
              path: /
              port: 2746
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 20
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
          volumeMounts:
            - mountPath: /tmp
              name: tmp
      nodeSelector:
        kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
      serviceAccountName: argo-server
      volumes:
        - emptyDir: {}
          name: tmp
```

workflow-controller-configmap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      bucket: logs-devops
      endpoint: minio:9000
      insecure: true
      accessKeySecret:
        name: my-minio-cred
        key: accesskey
      secretKeySecret:
        name: my-minio-cred
        key: secretkey
  links: |
    - name: Example Workflow Link
      scope: workflow
      url: http://logging-facility?namespace=${metadata.namespace}&workflowName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}
    - name: Example Pod Link
      scope: pod
      url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}
  metricsConfig: |
    disableLegacy: true
    enabled: true
    path: /metrics
    port: 9090
  persistence: |
    connectionPool:
      maxIdleConns: 100
      maxOpenConns: 0
      connMaxLifetime: 0s
    nodeStatusOffLoad: true
    archive: true
    archiveTTL: 30d
    postgresql:
      ssl: true
      sslmode: require
      host: ****.postgres.database.azure.com
      port: 5432
      database: argodevops
      tableName: argo_workflows
      userNameSecret:
        name: argo-postgres-creds
        key: username
      passwordSecret:
        name: argo-postgres-creds
        key: password
  sso: >
    issuer: https://login.microsoftonline.com/****/v2.0
    clientId:
      name: client-id-secret
      key: client-id-key
    clientSecret:
      name: client-secret-secret
      key: client-secret-key
    redirectUrl: https://argo-devops.dev.int.****.cloud/oauth2/callback
    scopes:
      - https://graph.microsoft.com/Group.Read.All
    rbac:
      enabled: true
  workflowDefaults: |
    spec:
      activeDeadlineSeconds: 14400
      ttlStrategy:
        secondsAfterCompletion: 604800
      podGC:
        strategy: OnWorkflowCompletion
      nodeSelector:
        agentpool: "argoeph"
      tolerations:
        - key: workload-type
          operator: Equal
          value: argo
          effect: NoSchedule
```

Server Service Account

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: argo-server
  namespace: argo
```

Server ClusterRoleBinding

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argo-server-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argo-server-cluster-role
subjects:
  - kind: ServiceAccount
    name: argo-server
    namespace: argo
```

Server ClusterRole

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: argo-server-cluster-role
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - ""
    resources:
      - secrets
    verbs:
      - get
      - create
  - apiGroups:
      - ""
    resources:
      - pods
      - pods/exec
      - pods/log
    verbs:
      - get
      - list
      - watch
      - delete
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - watch
      - create
      - patch
  - apiGroups:
      - ""
    resources:
      - serviceaccounts
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - argoproj.io
    resources:
      - eventsources
      - sensors
      - workflows
      - workfloweventbindings
      - workflowtemplates
      - cronworkflows
      - clusterworkflowtemplates
    verbs:
      - create
      - get
      - list
      - watch
      - update
      - patch
      - delete
```

Additional Info

In a Slack convo Anton suggested that https://github.com/argoproj/argo-workflows/pull/12573 may be suspect.

Link to slack message thread

People engaged already:

Happy to provide any more information on request or connect to experiment/work through the issue

Version

v3.5.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

This isn't an issue with workflow execution.

Logs from the workflow controller

This isn't an issue with workflow execution.

Logs from in your workflow's wait container

This isn't an issue with workflow execution.
agilgur5 commented 6 months ago

Thanks for the detailed report!

In a Slack convo Anton suggested that #12573 may be suspect.

Specifically, the only SSO change in v3.5.5 was #12318, which is unrelated in this case (different provider, different claim). So my suspicion is that a dependency change somehow affected this: #12573 upgraded expr, which is used for rbac-rule evaluation here and is the part raising the error:

[...] failed to evaluate rule: unknown name groups (1:43)\n | 'ac1ef805-ac6a-4ce9-854a-1cb406aa7121' in groups\n | ..........................................^"

@Freddybob4244 could you provide a few more details that I had asked for in my last message:

Admin users can navigate normally.

Also, could you provide the admin SA with its rbac-rule? I'm curious how it differs, since it applies correctly.

Happy to provide any more information on request or connect to experiment/work through the issue

From the same message, could you try some of these options and check what the logs say:

I'm not sure if we have a robust test environment for SSO issues; we do have an optional Dex but I'm not sure if it's configured and used for SSO tests (I don't think it is IIRC). That would be great to have for reproducibility and to add test cases for the pieces of SSO that are not provider specific.

agilgur5 commented 6 months ago

From DMs:

So I am back from my long weekend - turns out that v3.4.17 has the same problem. I checked the release and it seems to have the expr change. I pushed v3.4.16 (which doesn't have that change) and it works.

That seems to confirm that #12573 is the source of the issue here, now we need to figure out why it's causing this exactly and how to fix that (and if it requires another upstream fix in expr).

Also that expr upgrade seems to have been erroneously backported to Argo v3.4.17 per https://github.com/argoproj/argo-workflows/pull/13043#discussion_r1610560201

agilgur5 commented 6 months ago

Confirmed with a different user in another Slack thread that this is a regression in 3.5.5 and that reverting to 3.5.4 fixed it

agilgur5 commented 6 months ago

@isubasinghe any chance you could take a look at this? Related to the expr upgrade and changes you made in #12573. It's a P1 given that it breaks SSO RBAC

antonmedv commented 5 months ago

Let's fix the error. From what I can see the code is:

'ac1ef805-ac6a-4ce9-854a-1cb406aa7121' in groups

Which seems legit.

antonmedv commented 5 months ago

Can you show an example expression with an error?

tooptoop4 commented 5 months ago

@antonmedv error="failed to evaluate rule: unknown name groups (1:43)\n | 'ac1ef805-ac6a-4ce9-854a-1cb406aa7121' in groups\n | ..........................................^"

https://github.com/argoproj/argo-workflows/pull/12573/files#diff-ace0e536d9a8878e3c778026bd37f97d15bd3d47f889ad972704a2a2ba11878a was changed at the same time as the version upgrade. It is called from https://github.com/argoproj/argo-workflows/blob/06da23e8660f7841592070b1a372ace0750c2364/server/auth/gatekeeper.go#L264

antonmedv commented 5 months ago

The error is

unknown name groups

Which means there is no `groups` in the env.

tooptoop4 commented 5 months ago

I tried the code below in the Go playground (https://go.dev/play/) but could not reproduce the error:

Eval variant (expr v1.15.5):

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/antonmedv/expr"
	"gopkg.in/square/go-jose.v2/jwt"
)

// CustomClaims holds the registered JWT claims plus a groups claim.
type CustomClaims struct {
	Issuer    string           `json:"iss,omitempty"`
	Subject   string           `json:"sub,omitempty"`
	Audience  jwt.Audience     `json:"aud,omitempty"`
	Expiry    *jwt.NumericDate `json:"exp,omitempty"`
	NotBefore *jwt.NumericDate `json:"nbf,omitempty"`
	IssuedAt  *jwt.NumericDate `json:"iat,omitempty"`
	ID        string           `json:"jti,omitempty"`
	Groups    []string         `json:"groups,omitempty"`
}

// Claims approximates the shape of Argo's claims type.
type Claims struct {
	CustomClaims
	Email                   string                 `json:"email,omitempty"`
	EmailVerified           bool                   `json:"-"`
	Name                    string                 `json:"name,omitempty"`
	ServiceAccountName      string                 `json:"service_account_name,omitempty"`
	ServiceAccountNamespace string                 `json:"service_account_namespace,omitempty"`
	PreferredUsername       string                 `json:"preferred_username,omitempty"`
	RawClaim                map[string]interface{} `json:"-"`
}

// Jsonify round-trips a value through JSON to build the generic map that is
// used as the expression environment.
func Jsonify(v interface{}) (map[string]interface{}, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	x := make(map[string]interface{})
	return x, json.Unmarshal(data, &x)
}

func main() {
	claims := Claims{
		CustomClaims: CustomClaims{
			Groups: []string{"abc", "ac1ef805-ac6a-4ce9-854a-1cb406aa7121"},
		},
	}

	v, err := Jsonify(claims)
	if err != nil {
		panic(err)
	}
	fmt.Println(v)

	// The rbac-rule under test.
	input := `'ac1ef805-ac6a-4ce9-854a-1cb406aa7121' in groups`
	// Eval does not check names against the env up front; they are
	// resolved while the rule runs.
	result, err := expr.Eval(input, v)
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
-- go.mod --
module play.ground

require github.com/antonmedv/expr v1.15.5
```
Compile + Run variant (expr v1.16.0):

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/expr-lang/expr"
	"gopkg.in/square/go-jose.v2/jwt"
)

// CustomClaims holds the registered JWT claims plus a groups claim.
type CustomClaims struct {
	Issuer    string           `json:"iss,omitempty"`
	Subject   string           `json:"sub,omitempty"`
	Audience  jwt.Audience     `json:"aud,omitempty"`
	Expiry    *jwt.NumericDate `json:"exp,omitempty"`
	NotBefore *jwt.NumericDate `json:"nbf,omitempty"`
	IssuedAt  *jwt.NumericDate `json:"iat,omitempty"`
	ID        string           `json:"jti,omitempty"`
	Groups    []string         `json:"groups,omitempty"`
}

// Claims approximates the shape of Argo's claims type.
type Claims struct {
	CustomClaims
	Email                   string                 `json:"email,omitempty"`
	EmailVerified           bool                   `json:"-"`
	Name                    string                 `json:"name,omitempty"`
	ServiceAccountName      string                 `json:"service_account_name,omitempty"`
	ServiceAccountNamespace string                 `json:"service_account_namespace,omitempty"`
	PreferredUsername       string                 `json:"preferred_username,omitempty"`
	RawClaim                map[string]interface{} `json:"-"`
}

// Jsonify round-trips a value through JSON to build the generic map that is
// used as the expression environment.
func Jsonify(v interface{}) (map[string]interface{}, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	x := make(map[string]interface{})
	return x, json.Unmarshal(data, &x)
}

func main() {
	claims := Claims{
		CustomClaims: CustomClaims{
			Groups: []string{"abc", "ac1ef805-ac6a-4ce9-854a-1cb406aa7121"},
		},
	}

	v, err := Jsonify(claims)
	if err != nil {
		panic(err)
	}

	// The rbac-rule under test.
	input := `'ac1ef805-ac6a-4ce9-854a-1cb406aa7121' in groups`
	// Compiling with expr.Env(v) checks every name in the rule against the
	// env before running it, mirroring the Eval -> Compile + Run change.
	program, err := expr.Compile(input, expr.Env(v))
	if err != nil {
		panic(err)
	}
	result, err := expr.Run(program, v)
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
-- go.mod --
module play.ground

require github.com/expr-lang/expr v1.16.0
```
agilgur5 commented 5 months ago

i tried below in go playground (https://go.dev/play/) but could not reproduce the error:

🤔 There were two separate users who encountered it, though, and both resolved it by downgrading. That would suggest that something in the diff between your code and Argo's is the buggy piece.

Also @tooptoop4 there's a "Share" button on the playground that's very helpful 😅

agilgur5 commented 5 months ago

My intuition is that the Eval -> Compile + Run change is responsible somehow, as I think you suspect too

agilgur5 commented 5 months ago

Is it when groups is empty maybe? The Jsonify (that naming is confusing as it's still very much a struct) is perhaps excluding it from the JSON and resulting map?

That would add up since so far the reports have only been from users who were just starting to add SSO RBAC, and not from existing users of that feature (and would explain why more people haven't upvoted this as well)
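The hypothesis can be checked with the standard library alone. The sketch below uses a hypothetical two-field `claims` type (not Argo's real type) and the same marshal/unmarshal round trip as `Jsonify` in the playground code; it shows that with `omitempty`, an empty or nil `Groups` slice never reaches the resulting map, so a rule that references `groups` has nothing to bind to.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// claims is a hypothetical, trimmed-down stand-in for Argo's claims type.
type claims struct {
	Email  string   `json:"email,omitempty"`
	Groups []string `json:"groups,omitempty"`
}

// toMap performs the same JSON round trip as Jsonify in the playground code.
func toMap(v interface{}) (map[string]interface{}, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	m := make(map[string]interface{})
	return m, json.Unmarshal(data, &m)
}

func main() {
	// A populated slice, an empty (non-nil) slice, and a nil slice.
	for _, groups := range [][]string{{"g1"}, {}, nil} {
		env, err := toMap(claims{Email: "user@example.com", Groups: groups})
		if err != nil {
			panic(err)
		}
		_, present := env["groups"]
		// omitempty drops both the empty and the nil slice, so only the
		// first iteration reports the key as present.
		fmt.Printf("groups=%v -> key present: %v\n", groups, present)
	}
}
```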

agilgur5 commented 5 months ago

Can confirm in the playground, changing it to an empty array in the Compile + Run variant causes the error:

```go
			Groups: []string{},
```

```
panic: unknown name groups (1:43)
 | 'ac1ef805-ac6a-4ce9-854a-1cb406aa7121' in groups
 | ..........................................^

goroutine 1 [running]:
main.main()
	/tmp/sandbox3921589064/prog.go:80 +0x18e
```

There is no error, just prints false in the Eval variant:

```
map[]
false
```

The Eval variant works the same in expr 1.15.5 and 1.16.0, so it's definitely the Compile + Run change that breaks this for empty arrays. I also tried the Compile + Run variant in 1.15.5, and it has the same error there.

Thanks @tooptoop4 for making a small repro in the playground for this to make it easier to investigate!
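The behavioural difference between the two variants can be captured in a toy model. The Go sketch below only mimics what the playground runs showed (names checked against the env up front versus resolved while the rule runs); it is not expr's implementation, and `runtimeIn` and `checkedIn` are hypothetical helpers.

```go
package main

import "fmt"

// Toy model only: these helpers mimic the behaviour observed in the
// playground runs; they are not how expr is implemented.

// runtimeIn models the Eval variant: the name is looked up only while the
// rule runs, so a missing key behaves like nil and membership is false.
func runtimeIn(env map[string]interface{}, name, needle string) bool {
	list, _ := env[name].([]interface{})
	for _, item := range list {
		if item == needle {
			return true
		}
	}
	return false
}

// checkedIn models Compile + Run with expr.Env(env): every name used by the
// rule must exist in the env before the rule is allowed to run.
func checkedIn(env map[string]interface{}, name, needle string) (bool, error) {
	if _, ok := env[name]; !ok {
		return false, fmt.Errorf("unknown name %s", name)
	}
	return runtimeIn(env, name, needle), nil
}

func main() {
	// The env the JSON round trip produces when the user has no groups.
	env := map[string]interface{}{}
	groupID := "ac1ef805-ac6a-4ce9-854a-1cb406aa7121"

	fmt.Println(runtimeIn(env, "groups", groupID)) // Eval-like: false, no error
	if _, err := checkedIn(env, "groups", groupID); err != nil {
		fmt.Println(err) // Compile+Run-like: unknown name groups
	}
}
```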

agilgur5 commented 5 months ago

~Seems to still happen even when I remove the Jsonify and use claims directly. So either the Compile statement or the expr.Env is causing the empty string to be excluded~ EDIT: Actually, I think that fails because of capitalization and/or lack of expr struct tag (which would have the lowercase variant similar to the json struct tag)

agilgur5 commented 5 months ago

Removing the omitempty in the json struct tag works. That may have other side-effects though 🤔 There might be a better way around that 🤔

agilgur5 commented 4 months ago

Yea omitempty should technically be removed for all of the fields in that case, which would indeed have side-effects. This may very well affect other places that use templating as well. The better option would be to use expr struct tags and potentially pass in empty structs.

@antonmedv would it make sense for expr to interpret json struct tags? I.e. in the case a json struct tag is present and no expr struct tag is present, use the json struct tag for naming.
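To make the side-effects of removing `omitempty` concrete, the stdlib-only sketch below (again with a hypothetical trimmed-down claims type, not Argo's) drops `omitempty` from every field. `groups` now survives the JSON round trip, but every other zero-valued claim also appears (for example `email` as an empty string), and a nil slice shows up as JSON `null`, that is a nil value rather than an empty list. Rules or templates that depend on whether a key is present could therefore see different data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// claimsNoOmit is a hypothetical claims type with omitempty removed from
// every field.
type claimsNoOmit struct {
	Email  string   `json:"email"`
	Name   string   `json:"name"`
	Groups []string `json:"groups"`
}

// toMap performs the same JSON round trip as Jsonify in the playground code.
func toMap(v interface{}) (map[string]interface{}, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	m := make(map[string]interface{})
	return m, json.Unmarshal(data, &m)
}

func main() {
	empty, err := toMap(claimsNoOmit{Groups: []string{}})
	if err != nil {
		panic(err)
	}
	unset, err := toMap(claimsNoOmit{})
	if err != nil {
		panic(err)
	}

	// Every field now appears, zero values included: "email" and "name"
	// are empty strings rather than absent, and "groups" is an empty list.
	fmt.Println(empty)
	// A nil slice marshals to null, so "groups" is present but nil.
	fmt.Println(unset["groups"] == nil)
}
```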

antonmedv commented 4 months ago

I think json tag can be used as expr tag via config like this:

expr.FieldTag("json")

What do you think?

ianmuge commented 4 months ago

I tried debugging this, and it seems `groups` isn't part of the claims at all; you get the equivalent of:

```
map[email:name@example.com exp:1.720242915e+09 iss:argo-server name:Facy Name preferred_username:name@example.com  sub:34343434]
```
agilgur5 commented 4 months ago

I already root caused it above, and yes it will happen when you have no groups.

ianmuge commented 4 months ago

In this case, I believed I had passed in groups from the IDP.

ianmuge commented 4 months ago

Sorry, my bad: in my case I was missing the leading '/' in the userInfoPath, resulting in a 404 that was not raised as an error; the function returned nil instead, if I am reading this correctly: https://github.com/argoproj/argo-workflows/blob/main/server/auth/types/claims.go#L102

Thanks for the patience though

agilgur5 commented 4 months ago

userInfoPath resulting in a 404 which was not being raised as an error, returning nil. If I am reading this correctly: https://github.com/argoproj/argo-workflows/blob/main/server/auth/types/claims.go#L102

That's correct for that function, but it's used in HandleCallback, which is supposed to return a 401 in that case 🤔 I wonder if it used your old cookie or something? The Server logs are really helpful when debugging SSO since there are a lot of redirects: login -> provider -> callback -> login -> home (or another page). But it sounds like you got it figured out.

Freddybob4244 commented 3 months ago

I was able to resolve this on my side by explicitly declaring a group and ensuring that group was present in the claim sent back through our sso provider.

I think either the behavior should be updated so that the default role is applied when no groups match, as the documentation suggests would happen, or the documentation should be updated to call out that the claim must contain at least one group for the user, even for the default role to apply.

thunder-spb commented 1 month ago

I was able to resolve this on my side by explicitly declaring a group and ensuring that group was present in the claim sent back through our sso provider.

I think either it should be updated to where when no groups match default role is applied, as the documentation suggests would happen - or the documentation updated to call out that the claim must match at least one group to the user even for the default role to apply.

Totally agree. It would be more convenient to fall back to the default role. This could be optional: either fall back, or forbid login with a corresponding message.