OpenUnison / openunison-k8s

Access portal for Kubernetes
Apache License 2.0
92 stars 5 forks

Orchestra pods started giving error "ERROR K8sSessionStore - Could not search k8s" and URL started giving Tremolo Error #83

Closed shnigam2 closed 6 months ago

shnigam2 commented 11 months ago

The error details we observed in the orchestra pod logs are below:


[2023-06-30 06:03:01,087][Thread-8] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:01,091][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x3eca7c10-9267-4dba-b15f-7feca5cd6b28x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,104][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xbcd3b484-a5ef-4c24-bc1f-2c4a9fd24123x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,113][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xa821d927-c905-4691-9a33-ba69b300edb2x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,122][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xcdb7de08-d2ed-4a54-b00d-e2e80fdde73fx' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,135][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x42afb20e-a193-4a40-90ab-1b586cd220bfx' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,148][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x33b35ec8-7dd5-4ca6-b80e-213c610f9034x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,160][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x782dab93-41c8-4f94-a9ce-61c4a4062a55x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,172][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x1ae33343-476c-4077-927f-c8a2151f56b7x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,173][Thread-15] WARN  SessionManagerImpl - Clearing 7 sessions
[2023-06-30 06:03:01,960][Thread-9] INFO  K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/orgs?watch=true&timeoutSecond=25
[2023-06-30 06:03:01,972][Thread-9] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:03,162][XNIO-1 task-4] INFO  AccessLog - [AzSuccess] - CheckAlive - https://127.0.0.1:8443/check_alive - uid=Anonymous,o=Tremolo -  [127.0.0.1] - [f7b58659eceeaa2589cc99c52a5aefe5417d809fa]
[2023-06-30 06:03:03,178][XNIO-1 task-4] INFO  AccessLog - [AzSuccess] - k8sIdp - https://127.0.0.1:8443/auth/idp/k8sIdp/.well-known/openid-configuration - uid=Anonymous,o=Tremolo - NONE [127.0.0.1] - [f9d91ca5e7b85abf8b18de00820e76cbe0023929b]
[2023-06-30 06:03:04,037][Thread-18] INFO  K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/trusts?watch=true&timeoutSecond=25
[2023-06-30 06:03:04,063][Thread-18] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:04,175][Thread-14] INFO  K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/authmechs?watch=true&timeoutSecond=25
[2023-06-30 06:03:04,188][Thread-14] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
shnigam2 commented 11 months ago

Tremolo error screenshot attached; we get it whenever these errors start appearing. To fix it, we just restart the openunison object by adding a dummy annotation. Looking for the root cause and how we can fix it permanently.
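For reference, the dummy-annotation restart described above can be sketched as a one-liner. The resource kind, name, namespace, and annotation key below are assumptions based on a typical orchestra install; the command is wrapped in `echo` so the sketch is runnable without a cluster:

```shell
# Sketch of the "dummy annotation" workaround: touching the OpenUnison CR
# makes the operator redeploy orchestra, which then starts with fresh state.
# Drop the leading `echo` to actually run it against a cluster; the
# annotation key is hypothetical -- any changing annotation would do.
stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo kubectl annotate openunison orchestra -n openunison \
  "tremolo.io/restarted-at=${stamp}" --overwrite
```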

(Screenshot attached: 2023-07-27 at 2:20:37 PM)
shnigam2 commented 11 months ago

@mlbiam Hi Marc, could you please take a look at this? It is happening frequently these days.

mlbiam commented 11 months ago

Please provide:

  1. OpenUnison version - You can get this from the beginning of the logs
  2. Kubernetes version and platform (i.e. EKS, kubeadm, etc.)
  3. Your values.yaml
  4. Installation method
shnigam2 commented 9 months ago

@mlbiam Please find the main error observed in the orchestra pod logs:

[2023-10-13 22:08:00,011][local_Worker-3] ERROR K8sSessionStore - Could not search k8s
java.lang.NullPointerException: null
    at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:284) [unison-applications-k8s-1.0.24.jar:?]
    at com.tremolosecurity.idp.providers.OpenIDConnectIdP.clearExpiredSessions(OpenIDConnectIdP.java:2215) [unison-idp-openidconnect-1.0.24.jar:?]
    at com.tremolosecurity.idp.providers.oidc.model.jobs.ClearSessions.execute(ClearSessions.java:47) [unison-idp-openidconnect-1.0.24.jar:?]
    at com.tremolosecurity.provisioning.scheduler.UnisonJob.execute(UnisonJob.java:57) [unison-sdk-1.0.24.jar:?]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.2.jar:?]
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.2.jar:?]
[2023-10-13 22:08:00,011][local_Worker-3] ERROR OpenIDConnectIdP - Could not clear sessions
java.lang.Exception: Error searching kubernetes
    at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:302) ~[unison-applications-k8s-1.0.24.jar:?]
    at com.tremolosecurity.idp.providers.OpenIDConnectIdP.clearExpiredSessions(OpenIDConnectIdP.java:2215) [unison-idp-openidconnect-1.0.24.jar:?]
    at com.tremolosecurity.idp.providers.oidc.model.jobs.ClearSessions.execute(ClearSessions.java:47) [unison-idp-openidconnect-1.0.24.jar:?]
    at com.tremolosecurity.provisioning.scheduler.UnisonJob.execute(UnisonJob.java:57) [unison-sdk-1.0.24.jar:?]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.2.jar:?]
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.2.jar:?]
Caused by: java.lang.NullPointerException
    at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:284) ~[unison-applications-k8s-1.0.24.jar:?]
    ... 5 more
[2023-10-13 22:08:01,237][Thread-11] INFO  K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/resultgroups?watch=true&timeoutSecond=25
[2023-10-13 22:08:01,252][Thread-11] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
shnigam2 commented 9 months ago

OpenUnison version

k logs openunison-orchestra-d7d9dc9fd-c7dgb   -n openunison|grep -i version
[2023-10-14 01:26:32,137][main] INFO  xnio - XNIO version 3.8.4.Final
[2023-10-14 01:26:32,246][main] INFO  nio - XNIO NIO Implementation Version 3.8.4.Final
  Version: V3
[2023-10-14 01:26:44,747][main] INFO  StdSchedulerFactory - Quartz scheduler version: 2.3.2
[2023-10-14 01:26:57,641][main] INFO  threads - JBoss Threads version 2.3.3.Final

Kubernetes version and platform (i.e. EKS, kubeadm, etc.) - kubeadm & EKS 1.24.10

Your values.yaml:

     source:
        repoURL: https://nexus.tremolo.io/repository/helm
        targetRevision: 2.3.34
        chart: orchestra-login-portal-argocd
        helm:
          releaseName: openunison
          values: |
            image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s:sdfedgtjhkrghkghdft
 
            operator:
              image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-kubernetes-operator:dfghktyirtyritg
              validators: []
              mutators: []
 
            network:
              openunison_host: "login-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              dashboard_host: "dashboard-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              api_server_host: "ou-api-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              session_inactivity_timeout_seconds: 36000
              k8s_url: ''
              force_redirect_to_tls: false
              createIngressCertificate: false
              ingress_type: none
              ingress_certificate: ou-tls-main-certificate
              ingress_annotations:
                certmanager.k8s.io/cluster-issuer: letsencrypt
                kubernetes.io/ingress.class: nginx
 
            cert_template:
              ou: "Kubernetes"
              o: "MyOrg"
              l: "My Cluster"
              st: "State of Cluster"
              c: "MyCountry"
 
            myvd_config_path: "WEB-INF/myvd.conf"
            k8s_cluster_name: "{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
            enable_impersonation: true
            cert_update_image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-kubernetes-operator:dfghktyirtyritg
 
            impersonation:
              jetstack_oidc_proxy_image: our-repo-cngccp-docker-k8s.jfrog.io/kube-oidc-proxy:eretrtyi45864856834e
              use_jetstack: true
              explicit_certificate_trust: true
              ca_secret_name: ou-tls-certificate
 
            dashboard:
              namespace: "kubernetes-dashboard"
              cert_name: "kubernetes-dashboard-certs"
              label: "k8s-app=kubernetes-dashboard"
              service_name: kubernetes-dashboard
              require_session: true
 
            certs:
              use_k8s_cm: false
 
            trusted_certs: []
 
            monitoring:
              prometheus_service_account: system:serviceaccount:monitoring:prometheus-k8s
 
            oidc:
              client_id: "{{metadata.annotations.oidc_client_id}}"
              issuer: https://e52416c3-mckid-us.okta.com
              user_in_idtoken: false
              domain: ""
              scopes: openid email profile groups
              claims:
                sub: sub
                email: email
                given_name: given_name
                family_name: family_name
                display_name: name
                groups: groups
 
            network_policies:
              enabled: false
              ingress:
                enabled: false
              monitoring:
                enabled: false
              apiserver:
                enabled: false
 
            services:
              pullSecret: "jfrog-auth"
              enable_tokenrequest: false
              token_request_audience: api
              token_request_expiration_seconds: 14400
              node_selectors: []
              resources:
                limits:
                  cpu: 500m
                  memory: 2050Mi
                requests:
                  cpu: 200m
                  memory: 1024Mi
 
            openunison:
              replicas: 2
              non_secret_data:
                K8S_DB_SSO: oidc
                PROMETHEUS_SERVICE_ACCOUNT: system:serviceaccount:monitoring:prometheus-k8s
              secrets: []
              html:
                image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s-html:d38662bde4ea41efab695a41cb4fc9766a39ece99f9ea4fed2a9ffac7670c0a2
                prefix: openunison
              enable_provisioning: false
     source:
        repoURL: https://nexus.tremolo.io/repository/helm/
        targetRevision: 1.0.24
        chart: openunison-k8s-login-oidc
        helm:
          releaseName: orchestra
          values: |
            deployment_data:
              pull_secret: jfrog-auth
            enable_impersonation: true
            image: "our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s-login-oidc:sderfdserfd"
            impersonation:
              ca_secret_name: ou-tls-certificate
              explicit_certificate_trust: true
              jetstack_oidc_proxy_image: our-repo-cngccp-docker-k8s.jfrog.io/kube-oidc-proxy:swedfrty
              use_jetstack: true
            k8s_cluster_name: "{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
            myvd_configmap: ''
            network:
              api_server_host: "ou-api-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              createIngressCertificate: false
              dashboard_host: "dashboard-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              ingress_annotations:
                certmanager.k8s.io/cluster-issuer: letsencrypt
                kubernetes.io/ingress.class: openunison
              k8s_url: ''
              openunison_host: "login-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              session_inactivity_timeout_seconds: 36000
              ingress_type: none
            oidc:
              auth_url: https://ourorg-orgid-us.okta.com/oauth2/v1/authorize
              client_id: "{{metadata.annotations.oidc_client_id}}"
              token_url: https://ourorg-orgid-us.okta.com/oauth2/v1/token
              user_in_idtoken: false
              userinfo_url: https://ourorg-orgid-us.okta.com/oauth2/v1/userinfo
            openunison:
              replicas: 2
            services:
              pullSecret: jfrog-auth
              resources:
                limits:
                  cpu: 500m
                  memory: 2050Mi
                requests:
                  cpu: 200m
                  memory: 1024Mi
              token_request_expiration_seconds: 14400
            trusted_certs: []

Installation method - Helm

shnigam2 commented 9 months ago

@mlbiam Could you please help us with this issue? We are seeing it frequently on our clusters, and it is causing unnecessary user noise.

mlbiam commented 9 months ago

I need the version of openunison. If you're not using one of our versioned images, please get it from the first line of the logs. It will look like OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.38-2023072501
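The requested version line can be isolated with a grep over the first lines of the pod log. A self-contained sketch (the log text below is a fabricated sample matching the format Marc describes; on a live cluster you would pipe `kubectl logs` into the same grep):

```shell
# Isolate the OpenUnison version from the startup log line.
# The `log` variable holds a fabricated sample for illustration.
log='[2023-10-14 01:26:30,000][main] INFO  OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.24-2021110502'
version=$(printf '%s\n' "$log" | grep -o 'Starting OpenUnison on Undertow .*' | awk '{print $NF}')
echo "$version"   # 1.0.24-2021110502
```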

The stack trace looks like it's associated with an old openunison image. Can you also give the original source images' labels? The references to your internal jfrog don't really give me any information.

Thanks

shnigam2 commented 9 months ago

@mlbiam It is as below:

OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.24-2021110502
mlbiam commented 8 months ago

That's almost two years old. I doubt it would work with the 2.3.34 orchestra-login-portal-argocd helm chart. Based on the error message I think what's happening is that the token in the container is expiring. It works on restart because you're getting a new token.
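One way to test the expiring-token theory is to decode the `exp` claim from the token the pod is actually using and compare it to the current time. This is a sketch, not an official diagnostic: on a live pod you would read `/var/run/secrets/kubernetes.io/serviceaccount/token`; here a fabricated sample JWT keeps the snippet self-contained:

```shell
# Decode the claims segment of a JWT to inspect its `exp` (expiry) claim.
# `token` is a fabricated sample; on a pod, read it from
# /var/run/secrets/kubernetes.io/serviceaccount/token instead.
token='header.eyJleHAiOjE2ODgwOTk5OTl9.signature'

payload=$(printf '%s' "$token" | cut -d. -f2)   # middle segment = claims
# base64url payloads may lack padding; restore it before decoding
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
claims=$(printf '%s' "$payload" | tr '_-' '/+' | base64 -d)
echo "$claims"   # {"exp":1688099999}
```

If `exp` is in the past while the pod keeps presenting the token, every API call will return 401 (matching the `OpenShiftTarget` warnings above) until a restart mounts a fresh token.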

Since you're already using a modern chart, I'd suggest using the 1.0.37 version of the container - ghcr.io/openunison/openunison-k8s:1.0.37

shnigam2 commented 8 months ago

Hi @mlbiam, one correction: we are still using the old helm charts and are working on testing the upgraded ones. Can you please look at the configuration below and check whether we can fix this issue on the existing version we are using?

We are using multiple Application objects for openunison and orchestra. Below are the ArgoCD objects with helm values.

Openunison

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: '-2'
  name: openunison
  namespace: argocd
spec:
  destination:
    namespace: openunison
    server: 'https://xxxxxxxx'
  project: cnt
  source:
    chart: openunison-operator
    helm:
      releaseName: openunison
      values: |-
        {
          "image": "xxxxxxxx/openunison-k8s-operator:xxxxxxxx",
          "services": {
            "pullSecret": "jfrog-auth"
          }
        }
    repoURL: 'https://nexus.tremolo.io/repository/helm/'
    targetRevision: 2.0.6
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ApplyOutOfSyncOnly=true

Orchestra

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orchestra
  namespace: argocd
spec:
  destination:
    namespace: openunison
    server: 'https://xxxxxxxx'
  project: cnt
  source:
    chart: openunison-k8s-login-oidc
    helm:
      releaseName: orchestra
      values: |-
        {
          "cert_template": {
            "c": "xxxxxxxx",
            "l": "xxxxxxxx",
            "o": "dev",
            "ou": "xxxxxxxx",
            "st": "xxxxxxxx"
          },
          "deployment_data": {
            "pull_secret": "jfrog-auth"
          },
          "enable_impersonation": true,
          "image": "xxxxxxxx/openunison-k8s-login-oidc:xxxxxxxx",
          "impersonation": {
            "ca_secret_name": "xxxxxxxx",
            "explicit_certificate_trust": true,
            "jetstack_oidc_proxy_image": "xxxxxxxx/kube-oidc-proxy:xxxxxxxx",
            "oidc_tls_secret_name": "tls-certificate",
            "use_jetstack": true
          },
          "k8s_cluster_name": "xxxxxxxx",
          "myvd_configmap": "",
          "network": {
            "api_server_host": "dev-ou-api.com",
            "createIngressCertificate": false,
            "dashboard_host": "dev-dashboard.com",
            "ingress_annotations": {
              "certmanager.k8s.io/cluster-issuer": "letsencrypt",
              "kubernetes.io/ingress.class": "openunison"
            },
            "ingress_certificate": "",
            "ingress_type": "none",
            "k8s_url": "",
            "openunison_host": "dev-login.com",
            "session_inactivity_timeout_seconds": xxxxxxxx
          },
          "oidc": {
            "auth_url": "https://xxxxxxxx",
            "client_id": "xxxxxxxx",
            "token_url": "https://xxxxxxxx",
            "user_in_idtoken": xxxxxxxx,
            "userinfo_url": "https://xxxxxxxx"
          },
          "openunison": {
            "replicas": 2
          },
          "services": {
            "pullSecret": "jfrog-auth",
            "resources": {
              "limits": {
                "cpu": "500m",
                "memory": "2048Mi"
              },
              "requests": {
                "cpu": "200m",
                "memory": "1024Mi"
              }
            },
            "token_request_expiration_seconds": xxxxxxxx
          },
          "trusted_certs": [
            {
              "name": "xxxxxxxx",
              "pem_b64": "xxxxxxxx"
            }
          ]
        }
    repoURL: 'https://nexus.tremolo.io/repository/helm/'
    targetRevision: 1.0.24
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ApplyOutOfSyncOnly=true

Could you please take a look at this and share whether we can fix this in the existing version without upgrading the whole helm chart?

Version of Openunison

OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.24-2021110502
mlbiam commented 8 months ago

Could you please take a look at this and share whether we can fix this in the existing version without upgrading the whole helm chart?

There's no fix for a version that's two years old or charts that are end-of-life. The version of OpenUnison you're using contains a static configuration that is embedded into the container, so there's nothing that can be fixed via helm. The new version is configured via CRDs which provides much more flexibility. Back-porting a fix is a complex process that requires a tremendous amount of QA to reproduce and validate. It's only something we'll do for customers with commercial support contracts.

mlbiam commented 6 months ago

closing due to inactivity