argoproj / argo-helm

ArgoProj Helm Charts
https://argoproj.github.io/argo-helm/
Apache License 2.0

Bug in Rendering Helm Template, Argo-Workflows Not Respecting My Values File #2502

Closed Richard-Barrett closed 4 months ago

Richard-Barrett commented 7 months ago

Describe the bug

So basically, I filed everything in this ticket on a prior repository and was directed here.

The main problem is that Helm doesn't seem to respect the values I have set when I render the template:

---
server:
  serviceType: "NodePort"
  authModes:
    - sso
  secure: true
  sso:
    enabled: true
    insecureSkipVerify: false
    clientId:
      key: "client_id"
      name: "argo-server-sso"
    clientSecret:
      key: "client_secret"
      name: "argo-server-sso"
    issuer: "<some_url>"
    redirectUrl: "<some_url>/oauth2/callback"
    sessionExpiry: "1h" #  1 Hour for Session Expiration
  ingress:
    annotations:
      alb.ingress.kubernetes.io/certificate-arn: <some_certificate_arn>
      alb.ingress.kubernetes.io/healthcheck-path: /
      alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
      alb.ingress.kubernetes.io/wafv2-acl-arn: <some_waf_arn>
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/ssl-redirect: "443"
      alb.ingress.kubernetes.io/subnets: <some_subnets>
      alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
      alb.ingress.kubernetes.io/target-type: ip
      external-dns.alpha.kubernetes.io/hostname: <some_url>
    enabled: "true"
    ingressClassName: "alb"
    hosts:
      - <some_url>
    paths:
      - /*
    pathType: ImplementationSpecific
controller:
  logging:
    format: json
  metricsConfig:
    enabled: "true"
    serviceMonitor:
      enabled: "true"
    telemetryConfig:
      enabled: "true"
artifactRepositoryRef:
  artifact-repositories:
    annotations:
      workflows.argoproj.io/default-artifact-repository: default
    default:
      s3:
        bucket: irt-dl-us-argoworkflow-unrestricted-dev
        endpoint: s3.amazonaws.com
        region: us-east-1
        keyFormat: "workflow\
                    /{{workflow.creationTimestamp.Y}}\
                    /{{workflow.creationTimestamp.m}}\
                    /{{workflow.creationTimestamp.d}}\
                    /{{workflow.name}}\
                    /{{pod.name}}"
        useSDKCreds: true

When I run this, the output isn't what I would expect, especially for the SSO settings:

helm template argo/argo-workflows -f terraform/irt-data-dev/values/argo.values.yaml --version "0.33.1" -n argo-workflows > manifests/dev/argo.yaml

Related helm chart

argo-workflows

Helm chart version

0.33.1

To Reproduce

Just use the values file and run the template; you don't need to create all of the infra I specify in the first ticket.

Expected behavior

I would expect the template to respect my values.

agilgur5 commented 7 months ago

Helm chart version

0.33.1

authModes:
    - sso

I only took a quick look (I'm not a maintainer of the Charts), but it looks like you're trying to use the latest values with an old version of the Chart. The values for 0.33.1 are different from the current values.

For instance, authModes was added in #2336, which was released as 0.39.0.

Richard-Barrett commented 7 months ago

It doesn't matter which one I do, the SSO Values are not respected. You can try changing the version to 0.39.0 and/or 0.40.0 and it will still not get injected into the server container manifest.

Richard-Barrett commented 7 months ago

Basically here is what happens:

13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] time="2024-02-13T17:15:06.463Z" level=info msg="Generating Self Signed TLS Certificates for Secure Mode"
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] Error: 403 Forbidden: <html>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] <head><title>403 Forbidden</title></head>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] <body>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] <center><h1>403 Forbidden</h1></center>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] </body>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] </html>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] Usage:
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   argo server [flags]
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] Examples:
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] See https://argoproj.github.io/argo-workflows/argo-server/
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] Flags:
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --access-control-allow-origin string   Set Access-Control-Allow-Origin header in HTTP responses.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --allowed-link-protocol stringArray    Allowed link protocol in configMap. Used if the allowed configMap links protocol are different from http,https. Defaults to the environment variable ALLOWED_LINK_PROTOCOL (default [http,https])
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --api-rate-limit uint                  Set limit per IP for api ratelimiter (default 1000)
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --auth-mode stringArray                API server authentication mode. Any 1 or more length permutation of: client,server,sso (default [client])
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --basehref string                      Value for base href in index.html. Used if the server is running behind reverse proxy under subpath different from /. Defaults to the environment variable BASE_HREF. (default "/")
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -b, --browser                              enable automatic launching of the browser [local mode]
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --configmap string                     Name of K8s configmap to retrieve workflow controller configuration (default "workflow-controller-configmap")
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --event-async-dispatch                 dispatch event async
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --event-operation-queue-size int       how many events operations that can be queued at once (default 16)
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --event-worker-count int               how many event workers to run (default 4)
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -h, --help                                 help for server
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --hsts                                 Whether or not we should add a HTTP Secure Transport Security header. This only has effect if secure is enabled. (default true)
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --kube-api-burst int                   Burst to use while talking with kube-apiserver. (default 30)
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --kube-api-qps float32                 QPS to use while talking with kube-apiserver. (default 20)
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --log-format string                    The formatter to use for logs. One of: text|json (default "text")
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --managed-namespace string             namespace that watches, default to the installation namespace
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --namespaced                           run as namespaced mode
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -p, --port int                             Port to listen on (default 2746)
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -e, --secure                               Whether or not we should listen on TLS. (default true)
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --tls-certificate-secret-name string   The name of a Kubernetes secret that contains the server certificates
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --x-frame-options string               Set X-Frame-Options header in HTTP responses. (default "DENY")
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] Global Flags:
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --argo-base-href string          An path to use with HTTP client (e.g. due to BASE_HREF). Defaults to the ARGO_BASE_HREF environment variable.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --argo-http1                     If true, use the HTTP client. Defaults to the ARGO_HTTP1 environment variable.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -s, --argo-server host:port          API server host:port. e.g. localhost:2746. Defaults to the ARGO_SERVER environment variable.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --as string                      Username to impersonate for the operation
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --as-group stringArray           Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --as-uid string                  UID to impersonate for the operation
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --certificate-authority string   Path to a cert file for the certificate authority
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --client-certificate string      Path to a client certificate file for TLS
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --client-key string              Path to a client key file for TLS
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --cluster string                 The name of the kubeconfig cluster to use
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --context string                 The name of the kubeconfig context to use
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --gloglevel int                  Set the glog logging level
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -H, --header strings                 Sets additional header to all requests made by Argo CLI. (Can be repeated multiple times to add multiple headers, also supports comma separated headers) Used only when either ARGO_HTTP1 or --argo-http1 is set to true.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --insecure-skip-tls-verify       If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -k, --insecure-skip-verify           If true, the Argo Server's certificate will not be checked for validity. This will make your HTTPS connections insecure. Defaults to the ARGO_INSECURE_SKIP_VERIFY environment variable.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --instanceid string              submit with a specific controller's instance id label. Default to the ARGO_INSTANCEID environment variable.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --kubeconfig string              Path to a kube config. Only required if out-of-cluster
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --loglevel string                Set the logging level. One of: debug|info|warn|error (default "info")
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] 403 Forbidden: <html>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] <head><title>403 Forbidden</title></head>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] <body>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] <center><h1>403 Forbidden</h1></center>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] </body>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6] </html>
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -n, --namespace string               If present, the namespace scope for this CLI request
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --password string                Password for basic authentication to the API server
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --proxy-url string               If provided, this URL will be used to connect via proxy
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --request-timeout string         The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --server string                  The address and port of the Kubernetes API server
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --tls-server-name string         If provided, this name will be used to validate server certificate. If this is not provided, hostname used to contact the server is used.
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --token string                   Bearer token for authentication to the API server
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --user string                    The name of the kubeconfig user to use
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]       --username string                Username for basic authentication to the API server
13-Feb-2024,11:15 [release-name-argo-workflows-server-f8ff4c57f-pw6j6]   -v, --verbose                        Enabled verbose logging, i.e. --loglevel debug

These are the logs from the Server Container that goes into a CrashLoopBackOff.

Richard-Barrett commented 7 months ago

Furthermore, I have tried secure: false and it still messes up.

Richard-Barrett commented 7 months ago

The container args also come out wrong:

spec:
  containers:
    - args:
        - server
        - '--configmap=release-name-argo-workflows-workflow-controller-configmap'
        - '--auth-mode=sso'
        - '--secure=true'
        - '--loglevel'
        - info
        - '--gloglevel'
        - '0'
        - '--log-format'
        - text

I specify the log format to be JSON, and it goes to text (AKA Default)

agilgur5 commented 7 months ago

"I specify the log format to be JSON, and it goes to text (AKA Default)"

controller:
  logging:
    format: json

In your values above, you only set the format for the Controller, and not the Server. So that would be expected behavior.

Both the --secure and --auth-mode flags are set as well.

So far, in the examples you have given, the templated output does correctly match up with your values.
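
As a sketch of the Controller/Server distinction above: the Server's --log-format arg is driven by its own values block, separate from controller.logging. Assuming the chart exposes server.logging.format alongside controller.logging.format (an assumption worth confirming against the values of the chart version in use), a values fragment that requests JSON logs from both components could look like this:

```yaml
# Assumption: server.logging.format exists in the chart version in use.
controller:
  logging:
    format: json   # already present in the values file above
server:
  logging:
    format: json   # expected to change the Server's --log-format arg from text to json
```

Re-rendering with helm template and checking the Server container args would show whether the key was picked up.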

agilgur5 commented 7 months ago

"It doesn't matter which one I do, the SSO Values are not respected. You can try changing the version to 0.39.0 and/or 0.40.0 and it will still not get injected into the server container manifest."

To be clear, the sso values are primarily in the ConfigMap, and not in the container args. That's why the Server has the --configmap reference to be able to pull from it.
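
For orientation, Argo Workflows reads its SSO settings from an sso block in the workflow controller ConfigMap. The following is an illustrative sketch of roughly what to look for in the rendered manifest, not actual chart output. The ConfigMap name is taken from the --configmap arg shown earlier, the field values are copied from the values file at the top of this issue, and the exact layout under data is an assumption that may differ between chart versions.

```yaml
# Illustrative shape only, not real `helm template` output.
# Assumption: the chart renders the SSO settings under data.config.
apiVersion: v1
kind: ConfigMap
metadata:
  name: release-name-argo-workflows-workflow-controller-configmap
data:
  config: |
    sso:
      issuer: <some_url>
      clientId:
        name: argo-server-sso
        key: client_id
      clientSecret:
        name: argo-server-sso
        key: client_secret
      redirectUrl: <some_url>/oauth2/callback
      sessionExpiry: 1h
```

If a block like this is present in the rendered ConfigMap, the SSO values were applied even though they do not appear in the container args.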

Richard-Barrett commented 7 months ago

So then is my values file wrong?

What's weird is when I deploy it to two different environments I get two different results:

[Screenshot 2024-02-14 at 11:09:44 AM] [Screenshot 2024-02-14 at 11:09:54 AM]

Richard-Barrett commented 7 months ago

I mean, I am deploying the same values file, with some different variables, to two different clusters that are configured in the same manner. In the past I had this working, but for some odd reason it isn't working anymore. But still if there isn't anything wrong with the Helm Values files, why do I have two different results?

agilgur5 commented 7 months ago

"What's weird is when I deploy it to two different environments I get two different results:"

That would suggest a problem between the configuration of your two environments, no?

"But still if there isn't anything wrong with the Helm Values files, why do I have two different results?"

Per above, that would suggest it's not the Helm values that are the problem. Otherwise, I do not have access to your environments and cannot debug and root cause for you. Your Helm values are producing the expected result. Based on the examples you have given, the Chart is working as intended.

Whether that is "right" or "wrong" depends on what your initial intent was. For instance, if you wanted to configure the Server to have JSON logging, you were missing that configuration, so that would be "wrong".

"These are the logs from the Server Container that goes into a CrashLoopBackOff."

In your upstream issue in Workflows, this is one of the details I asked for. The 403 is in the Server's own logs -- that means the Server is making a request that is getting a 403. If your RBAC is correct and it's accessing your ConfigMap and other k8s resources properly, then it might be your SSO issuer that's giving a 403. I'm not sure; unfortunately, the log doesn't say which request is failing. Debug logging might help.

Richard-Barrett commented 7 months ago

I mean, the only things that really change are the annotation values, the URLs used for the ingress endpoints, the WAFs, and the subnets; other than that, nothing really changes. We set up the Okta application as an OAuth application, with a bookmark for the redirect tile in Okta upon login.

agilgur5 commented 7 months ago

"the URLs used for the ingress endpoints, the WAFs, and the subnets"

Those can have impact, as can environmental VPCs, ACLs, SGs, NetworkPolicies, etc. But this does not seem to be a firewall issue, as that would usually result in a dropped connection or timeout. A 403 suggests it made it past the firewall, got to the authenticator, and was subsequently denied access. It gave HTML as well, and not just a response, so that would suggest it is something that serves HTML. I am being careful in wording here as, in all debugging, it's all hypotheses until you know the root cause -- that's just what my knowledge and experience would point to.

Have you tried debug logging, as I mentioned before? That may help reveal which request is failing and get closer to root cause from there. Network introspection can as well (e.g. if you have Hubble or other tools installed). If it's Okta and you have admin access, you could check Okta's logs as well.
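
A minimal sketch of turning on debug logging through the values file, assuming the chart exposes server.logging.level (an assumption; the rendered args earlier in this thread show --loglevel info, and the Server's help text lists debug as an accepted level):

```yaml
# Assumption: server.logging.level exists in the chart version in use.
server:
  logging:
    level: debug   # help text above: --loglevel is one of debug|info|warn|error
```

After redeploying, the Server logs around the 403 may then name the request that is being rejected.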

Richard-Barrett commented 7 months ago

I will try the debug logging this coming week. Unfortunately, I don't have access to Okta Admin, so I will need to reach out to one of my organization's administrators to get the Okta logs.
