AzBuilder / terrakube-helm-chart

Helm chart to install Terrakube in any Kubernetes cluster
Apache License 2.0

Issues with API pod starting #146

Closed BenjaminDecreusefond closed 1 month ago

BenjaminDecreusefond commented 1 month ago

Hi !

I've been trying to deploy a Terrakube instance for my company behind a corporate proxy (AWS ALB). Every pod seems to be running fine except the executor, registry and API pods. The executor and registry pod logs show no errors, but when I describe the pods I get the following error (registry container):

  Warning  Unhealthy  15m   kubelet            Startup probe failed: Get "http://XX.XX.XX.XX:8075/actuator/health": dial tcp XX.XX.XX.XX:8075: connect: connection refused

From my understanding it is caused by the fact that no application is running on that port. The same error appears for executor container.

However, for the API container I have the following errors

2024-10-04T13:06:14.561Z  INFO 1 --- [           main] com.amazonaws.http.AmazonHttpClient      : Configuring Proxy. Proxy Host: https://my-proxy/ Proxy Port: 443
2024-10-04T13:06:16.164Z  INFO 1 --- [           main] org.quartz.impl.StdSchedulerFactory      : Using default implementation for ThreadExecutor
2024-10-04T13:06:16.234Z  INFO 1 --- [           main] org.quartz.core.SchedulerSignalerImpl    : Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2024-10-04T13:06:16.235Z  INFO 1 --- [           main] org.quartz.core.QuartzScheduler          : Quartz Scheduler v2.5.0-rc1 created.
2024-10-04T13:06:16.249Z  INFO 1 --- [           main] o.s.s.quartz.LocalDataSourceJobStore     : Using db table-based data access locking (synchronization).
2024-10-04T13:06:16.252Z  INFO 1 --- [           main] o.s.s.quartz.LocalDataSourceJobStore     : JobStoreCMT initialized.
2024-10-04T13:06:16.253Z  INFO 1 --- [           main] org.quartz.core.QuartzScheduler          : Scheduler meta-data: Quartz Scheduler (v2.5.0-rc1) 'schedulerFactoryBean' with instanceId 'terrakube-api-6f499d54f4-fmdz91728047176166'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
  Using job-store 'org.springframework.scheduling.quartz.LocalDataSourceJobStore' - which supports persistence. and is clustered.

2024-10-04T13:06:16.253Z  INFO 1 --- [           main] org.quartz.impl.StdSchedulerFactory      : Quartz scheduler 'schedulerFactoryBean' initialized from an externally provided properties instance.
2024-10-04T13:06:16.253Z  INFO 1 --- [           main] org.quartz.impl.StdSchedulerFactory      : Quartz scheduler version: 2.5.0-rc1
2024-10-04T13:06:16.253Z  INFO 1 --- [           main] org.quartz.core.QuartzScheduler          : JobFactory set to: org.terrakube.api.plugin.scheduler.configuration.QuartzAutoConfiguration$AutowireCapableBeanJobFactory@4ae19926
2024-10-04T13:06:16.340Z  INFO 1 --- [           main] org.quartz.core.QuartzScheduler          : Scheduler schedulerFactoryBean_$_terrakube-api-6f499d54f4-fmdz91728047176166 started.
2024-10-04T13:06:16.733Z  INFO 1 --- [           main] a.p.s.j.t.e.e.K8sClientAutoConfiguration : Ephemeral Executor Configuration Image azbuilder/api-server:2.22.0, Namespace: terrakube, NodeSelector: null
2024-10-04T13:06:17.768Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : Setup job to cancelled inactive jobs
2024-10-04T13:06:17.772Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : Create Schedule for inactive jobs: DEFAULT.TerrakubeV2_InactiveJobs_394c1618-3f1c-4a87-9166-bbb3f19ca6b0
2024-10-04T13:06:17.942Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : jobDetail is null false
2024-10-04T13:06:17.943Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : Delete Old Quartz Job for inactive jobs
2024-10-04T13:06:18.151Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : Create Schedule Job Trigger for inactive jobs DEFAULT.TerrakubeV2_InactiveJobs
2024-10-04T13:06:18.342Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Run module index scan
2024-10-04T13:06:18.343Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Create Schedule Module Refresh: DEFAULT.TerrakubeV2_ModuleRefresh_6343d22a-069d-4d16-a3f9-5e6512ebc156
2024-10-04T13:06:18.434Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Disable Old Module Refresh
2024-10-04T13:06:18.453Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : jobDetail is null false
2024-10-04T13:06:18.453Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Delete Old Quartz Job
2024-10-04T13:06:18.761Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Reschedule with new frequency
2024-10-04T13:06:18.764Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Create Schedule Job Trigger DEFAULT.TerrakubeV2_ModuleRefresh
2024-10-04T13:06:20.143Z ERROR 1 --- [ryBean_Worker-2] o.terrakube.api.rs.module.GitTagsCache   : https://github.com/Azure/terraform-azurerm-compute.git: cannot open git-upload-pack
2024-10-04T13:06:20.344Z ERROR 1 --- [ryBean_Worker-2] o.terrakube.api.rs.module.GitTagsCache   : https://github.com/Azure/terraform-azurerm-cosmosdb.git: cannot open git-upload-pack
2024-10-04T13:06:20.346Z ERROR 1 --- [ryBean_Worker-2] o.terrakube.api.rs.module.GitTagsCache   : https://github.com/Azure/terraform-azurerm-database.git: cannot open git-upload-pack

As it says "Exposing 1 endpoint beneath base path '/actuator'", I tested connectivity to that endpoint and everything works fine, but when I try any other endpoint nothing works.

The main issue I have is with the Terrakube UI. I managed to reach the login page; however, when I click the login button nothing happens, and the browser inspector shows the following error:

Request URL: https://my-proxy/dex/.well-known/openid-configuration
Request Method: GET
Status Code: 401 Unauthorized
Remote Address: 34.250.194.57:443
Referrer Policy: strict-origin-when-cross-origin

My understanding is that the API, since it has not completely started, does not provide the /dex endpoint, and therefore the UI fails when it tries to connect to it.

As mentioned in the documentation, I tried to add the following:

  - name: JAVA_TOOL_OPTIONS
    value: "-Dhttps.proxyPort=8080 -Dhttps.proxyHost=https://my-proxy/"

But I still get the error

2024-10-04T13:06:20.143Z ERROR 1 --- [ryBean_Worker-2] o.terrakube.api.rs.module.GitTagsCache   : https://github.com/Azure/terraform-azurerm-compute.git: cannot open git-upload-pack
2024-10-04T13:06:20.344Z ERROR 1 --- [ryBean_Worker-2] o.terrakube.api.rs.module.GitTagsCache   : https://github.com/Azure/terraform-azurerm-cosmosdb.git: cannot open git-upload-pack

I don't know if this is what is preventing the API from starting properly, but it's the only error I found when looking at the 3 pods.

Here's my values.yml, if it is of any help

## API properties
api:
  #loadSampleData: true
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings
  - name: JAVA_OPTS
    value: "-Xmx512m -XX:MaxMetaspaceSize=1024m"
  - name: JAVA_TOOL_OPTIONS
    value: "-Dhttps.proxyPort=8080 -Dhttps.proxyHost=https://terrakube-api.my-proxy.net/"
  properties:
    databaseType: "H2"
  resources:
    requests:
      memory: "2Gi"
      cpu: "500m"
    limits:
      memory: "4Gi"
      cpu: "1"

executor:
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings
  - name: JAVA_TOOL_OPTIONS
    value: "-Dhttps.proxyPort=8080 -Dhttps.proxyHost=https://terrakube-api.my-proxy.net/"
  podSecurityContext:
    runAsUser: 0
    runAsGroup: 1000
    fsGroup: 2000
  securityContext:
    runAsUser: 0

## Registry properties
registry:
  enabled: true
  replicaCount: "1"
  serviceType: "ClusterIP"
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings
  - name: JAVA_TOOL_OPTIONS
    value: "-Dhttps.proxyPort=8080 -Dhttps.proxyHost=https://terrakube-api.my-proxy.net/"

dex:
  config:
    issuer: https://terrakube-api.my-proxy.net/dex 

    storage:
      type: memory
    web:
      http: 0.0.0.0:5556
      allowedOrigins: ['*']
      skipApprovalScreen: true
    oauth2:
      responseTypes: ["code", "token", "id_token"]

    connectors:
    - type: ldap
      name: OpenLDAP
      id: ldap
      config:
        # The following configurations seem to work with OpenLDAP:
        #
        # 1) Plain LDAP, without TLS:
        host: terrakube-openldap-service:1389
        insecureNoSSL: true
        #
        # 2) LDAPS without certificate validation:
        #host: localhost:636
        #insecureNoSSL: false
        #insecureSkipVerify: true
        #
        # 3) LDAPS with certificate validation:
        #host: YOUR-HOSTNAME:636
        #insecureNoSSL: false
        #insecureSkipVerify: false
        #rootCAData: 'CERT'
        # ...where CERT="$( base64 -w 0 your-cert.crt )"

        # This would normally be a read-only user.
        bindDN: cn=admin,dc=example,dc=org
        bindPW: admin

        usernamePrompt: Email Address

        userSearch:
          baseDN: ou=users,dc=example,dc=org
          filter: "(objectClass=person)"
          username: mail
          # "DN" (case sensitive) is a special attribute name. It indicates that
          # this value should be taken from the entity's DN not an attribute on
          # the entity.
          idAttr: DN
          emailAttr: mail
          nameAttr: cn

        groupSearch:
          baseDN: ou=Groups,dc=example,dc=org
          filter: "(objectClass=groupOfNames)"

          userMatchers:
            # A user is a member of a group when their DN matches
            # the value of a "member" attribute on the group entity.
          - userAttr: DN
            groupAttr: member

          # The group name should be the "cn" value.
          nameAttr: cn

    staticClients:
    - id: keycloak-app
      redirectURIs:
      - 'terrakube.my-proxy.net'
      - '/device/callback'
      name: 'keycloak-app'
      public: true

## Ingress properties
ingress:
  useTls: true
  includeTlsHosts: true
  ui:
    enabled: false
    domain: "terrakube.my-proxy.net"
    path: "/"
    pathType: "Prefix"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-ui-terrakube

  api:
    enabled: false
    domain: "terrakube-api.my-proxy.net"
    path: "/"
    pathType: "Prefix"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-api-terrakube

  registry:
    enabled: false
    domain: "terrakube-reg.my-proxy.net"
    path: "/"
    pathType: "Prefix"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-reg-terrakube

  dex:
    enabled: false
    path: "/dex/"
    pathType: "Prefix"
    ingressClassName: "alb"

Thanks for your help !

alfespa17 commented 1 month ago

When startup completes, the API logs should look like the output below. You could try giving the API pod more CPU as a test; I think it is not finishing its startup, and there is a timeout of 2 or 3 minutes (I don't remember the exact value) before the pod is restarted.

2024-10-04T14:00:29.157Z  INFO 1 --- [           main] o.s.s.quartz.LocalDataSourceJobStore     : JobStoreCMT initialized.
2024-10-04T14:00:29.158Z  INFO 1 --- [           main] org.quartz.core.QuartzScheduler          : Scheduler meta-data: Quartz Scheduler (v2.5.0-rc1) 'schedulerFactoryBean' with instanceId 'terrakube-api-5bc7d76675-cswcp1728050429103'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
  Using job-store 'org.springframework.scheduling.quartz.LocalDataSourceJobStore' - which supports persistence. and is clustered.

2024-10-04T14:00:29.160Z  INFO 1 --- [           main] org.quartz.impl.StdSchedulerFactory      : Quartz scheduler 'schedulerFactoryBean' initialized from an externally provided properties instance.
2024-10-04T14:00:29.160Z  INFO 1 --- [           main] org.quartz.impl.StdSchedulerFactory      : Quartz scheduler version: 2.5.0-rc1
2024-10-04T14:00:29.161Z  INFO 1 --- [           main] org.quartz.core.QuartzScheduler          : JobFactory set to: org.terrakube.api.plugin.scheduler.configuration.QuartzAutoConfiguration$AutowireCapableBeanJobFactory@2c3f9229
2024-10-04T14:00:29.219Z  INFO 1 --- [           main] o.s.s.quartz.LocalDataSourceJobStore     : ClusterManager: detected 1 failed or restarted instances.
2024-10-04T14:00:29.219Z  INFO 1 --- [           main] o.s.s.quartz.LocalDataSourceJobStore     : ClusterManager: Scanning for instance "terrakube-api-7f7fd4f7f9-tmdp71727703675799"'s failed in-progress jobs.
2024-10-04T14:00:29.229Z  INFO 1 --- [           main] org.quartz.core.QuartzScheduler          : Scheduler schedulerFactoryBean_$_terrakube-api-5bc7d76675-cswcp1728050429103 started.
2024-10-04T14:00:29.306Z  INFO 1 --- [_MisfireHandler] o.s.s.quartz.LocalDataSourceJobStore     : Handling 2 trigger(s) that missed their scheduled fire-time.
2024-10-04T14:00:29.472Z  INFO 1 --- [           main] a.p.s.j.t.e.e.K8sClientAutoConfiguration : Ephemeral Executor Configuration Image azbuilder/api-server:2.22.0, Namespace: terrakube, NodeSelector: null
2024-10-04T14:00:29.887Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : Setup job to cancelled inactive jobs
2024-10-04T14:00:29.889Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : Create Schedule for inactive jobs: DEFAULT.TerrakubeV2_InactiveJobs_d06f00fb-ab36-48f2-877c-a485bcd3cfef
2024-10-04T14:00:29.920Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : jobDetail is null false
2024-10-04T14:00:29.920Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : Delete Old Quartz Job for inactive jobs
2024-10-04T14:00:29.977Z  INFO 1 --- [           main] o.t.a.p.s.inactive.InactiveJobsService   : Create Schedule Job Trigger for inactive jobs DEFAULT.TerrakubeV2_InactiveJobs
2024-10-04T14:00:30.045Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Run module index scan
2024-10-04T14:00:30.046Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Create Schedule Module Refresh: DEFAULT.TerrakubeV2_ModuleRefresh_e90fe294-8565-4f8a-8be0-002c9ee802b4
2024-10-04T14:00:30.068Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Disable Old Module Refresh
2024-10-04T14:00:30.084Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : jobDetail is null false
2024-10-04T14:00:30.084Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Delete Old Quartz Job
2024-10-04T14:00:30.155Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Reschedule with new frequency
2024-10-04T14:00:30.161Z  INFO 1 --- [           main] o.t.a.p.scheduler.module.CacheService    : Create Schedule Job Trigger DEFAULT.TerrakubeV2_ModuleRefresh
2024-10-04T14:00:31.533Z  INFO 1 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 1 endpoint beneath base path '/actuator'
2024-10-04T14:00:31.670Z  INFO 1 --- [           main] o.t.a.p.s.a.dex.DexWebSecurityAdapter    : Loading CORS https://terrakube-ui.minikube.net
2024-10-04T14:00:37.346Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class java.time.Instant
2024-10-04T14:00:37.382Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class java.time.OffsetDateTime
2024-10-04T14:00:37.382Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class java.util.TimeZone
2024-10-04T14:00:37.383Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class java.net.URL
2024-10-04T14:00:37.384Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class java.util.Date
2024-10-04T14:00:37.384Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class java.sql.Date
2024-10-04T14:00:37.384Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class java.sql.Time
2024-10-04T14:00:37.384Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class java.sql.Timestamp
2024-10-04T14:00:37.390Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Day
2024-10-04T14:00:37.393Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Hour
2024-10-04T14:00:37.395Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.ISOWeek
2024-10-04T14:00:37.398Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Minute
2024-10-04T14:00:37.399Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Month
2024-10-04T14:00:37.401Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Quarter
2024-10-04T14:00:37.403Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Second
2024-10-04T14:00:37.418Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Time
2024-10-04T14:00:37.420Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Week
2024-10-04T14:00:37.421Z  INFO 1 --- [           main] com.yahoo.elide.Elide                    : Registering serde for type : class com.yahoo.elide.datastores.aggregation.timegrains.Year
2024-10-04T14:00:38.149Z  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2024-10-04T14:00:38.460Z  INFO 1 --- [           main] org.terrakube.api.ServerApplication      : Started ServerApplication in 19.136 seconds (process running for 19.639)
2024-10-04T14:00:38.570Z  INFO 1 --- [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2024-10-04T14:00:38.570Z  INFO 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2024-10-04T14:00:38.572Z  INFO 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 2 ms
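
As a minimal sketch of the CPU test suggested above: the values file already has an api.resources block, so the CPU request and limit can be raised there. The numbers below are illustrative assumptions only, not recommended values.

```yaml
api:
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"      # raised from "500m" (illustrative value)
    limits:
      memory: "4Gi"
      cpu: "2"      # raised from "1" (illustrative value)
```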

For this part, it looks like you loaded the sample data and the API is trying to refresh the GitHub tags for the repo https://github.com/Azure/terraform-azurerm-compute but is not able to connect to GitHub. Maybe it is related to your proxy settings or to something in your network connectivity.

2024-10-04T13:06:20.143Z ERROR 1 --- [ryBean_Worker-2] o.terrakube.api.rs.module.GitTagsCache   : https://github.com/Azure/terraform-azurerm-compute.git: cannot open git-upload-pack
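
If the proxy settings are involved, one thing that may be worth checking: the standard Java https.proxyHost property expects a bare hostname (no https:// scheme and no trailing slash), and https.proxyPort should be the port of the outbound proxy itself. The ALB in front of the cluster handles incoming traffic, so the ingress domain is probably not the right value here. A minimal sketch, where proxy.example.internal, port 8080 and the nonProxyHosts patterns are placeholders to replace with your own values; whether this resolves the git-upload-pack error is not confirmed:

```yaml
api:
  env:
  - name: JAVA_TOOL_OPTIONS
    # proxy.example.internal:8080 is a hypothetical outbound proxy, not a value from this issue.
    # http.nonProxyHosts is also honoured for HTTPS requests; entries are separated by "|".
    value: "-Dhttps.proxyHost=proxy.example.internal -Dhttps.proxyPort=8080 -Dhttp.nonProxyHosts=localhost|*.svc|*.cluster.local"
```

The same env entry would go under executor and registry if they also need outbound access.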

I think for AWS the service type should be NodePort, not ClusterIP, so your Helm values should be as follows (I don't use AWS, but I remember seeing somewhere that you need to use NodePort):

api:
  serviceType: "NodePort"

registry:
  serviceType: "NodePort"

executor:
  serviceType: "NodePort"

ui:
  serviceType: "NodePort"

Dex is an independent component, so you should see all of these pods when you install Terrakube with the default values:

user@pop-os:~/git/terrakube-helm-chart$ kubectl get pods -n terrakube
NAME                                  READY   STATUS    RESTARTS   AGE
terrakube-api-5bc7d76675-cswcp        1/1     Running   0          9m59s
terrakube-dex-5c9c55ff4d-s675v        1/1     Running   0          17m
terrakube-executor-64bb4dbd8d-wxxf2   1/1     Running   0          17m
terrakube-minio-575b9db646-m6t72      1/1     Running   0          17m
terrakube-openldap-767798bd9d-jhqz5   1/1     Running   0          17m
terrakube-postgresql-0                1/1     Running   0          17m
terrakube-redis-master-0              1/1     Running   0          17m
terrakube-registry-7b6579df9b-64c85   1/1     Running   0          17m
terrakube-ui-6b5b59c48c-vzbcp         1/1     Running   0          17m

You should review your Dex values: in the connectors you mention OpenLDAP, but in the static clients you have Keycloak. You can read more about Dex configuration here
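
As a rough sketch of what consistent Dex values might look like, assuming the LDAP connector is kept: the static client gets an ID that matches the setup, the UI redirect URI is written as a full URL, and security.dexClientId (a chart value, assumed here to be the client ID that Terrakube uses) points at the same ID. The example-app ID and the domains are placeholders.

```yaml
security:
  dexClientId: example-app        # assumed to need the same value as staticClients[].id
  useOpenLDAP: true

dex:
  config:
    issuer: https://terrakube-api.my-proxy.net/dex
    # connectors: keep the ldap connector from the values above
    staticClients:
    - id: example-app             # placeholder client ID
      redirectURIs:
      - 'https://terrakube.my-proxy.net'   # full URL of the UI, including the scheme
      - '/device/callback'
      name: 'example-app'
      public: true
```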

You also need to enable the ingress settings; you have them set to false and they should be true:

ingress:
  ui:
    enabled: true

  api:
    enabled: true

  registry:
    enabled: true

  dex:
    enabled: true

I hope that can help

BenjaminDecreusefond commented 1 month ago

Thanks for your quick answer !

I tried to give more power to my API pod, but unfortunately I still get the same error. I managed to get all the pods into the Running state like you, but it's still impossible to get any further than the login page :( I will try to update my values.yml with the proper Keycloak setup and let you know if it gets any better!

Have a good weekend!

BenjaminDecreusefond commented 1 month ago

Hi !

I'm sorry to come back to you, but there is something about Terrakube that I don't really understand!

I managed to get all pods running and to reach the login UI. However, when I click the login button, sadly nothing happens :(

When I look at the network tab, I see a 401 on the API endpoint https://terrakube-api.net/dex/.well-known/openid-configuration. I have no problem getting a 200 on other endpoints such as actuator/health.

From my point of view, it looks like the api/dex endpoint does not exist? But I think the issue lies somewhere else. I had the same issue with minikube and I'm getting a little lost :(

I'll take any thoughts you have,

Regards ! Benjamin

alfespa17 commented 1 month ago

Just to clarify, dex and the api are 2 different components.

When we deploy Terrakube, we simply publish the two services under the same domain, to simplify the deployment a little for testing purposes.

The API is deployed to terrakube-api.minikube.net and we publish Dex at terrakube-api.minikube.net/dex; you can see this in these lines.

https://github.com/AzBuilder/terrakube-helm-chart/blob/efd3585024cd7236cabfd389ed090d3d2add721f/charts/terrakube/templates/ingress-api.yaml#L34

https://github.com/AzBuilder/terrakube-helm-chart/blob/efd3585024cd7236cabfd389ed090d3d2add721f/charts/terrakube/templates/ingress-api.yaml#L46

The above shows one domain exposing 2 different Kubernetes services.

I hope this can help you.

BenjaminDecreusefond commented 1 month ago

I understand better now, thank you! If I follow you correctly, I should also expose my Dex service so that it can be reached through my UI?

alfespa17 commented 1 month ago

@BenjaminDecreusefond you can have terrakube-api.minikube.net and terrakube-dex.minikube.net, and it will work as long as you set the correct Dex configuration so that Terrakube knows where the Dex endpoint is exposed

dex:
  enabled: false
  config:
    issuer: https://terrakube-dex.minikube.net

The above will be like having an external DEX service instead of using the default one.

BenjaminDecreusefond commented 1 month ago

I do agree with you ! It's what I've done (I think ?)

dex:
  enabled: true
  config:
    issuer: https://terrakube-api.minikube.net/dex

    storage:
      type: memory
    web:
      http: 0.0.0.0:5556
      allowedOrigins: ['*']
      skipApprovalScreen: true
    oauth2:
      responseTypes: ["code", "token", "id_token"]

      connectors:
      - type: ldap
        name: OpenLDAP
        id: ldap
        config:
          host: terrakube-openldap-service:389
          insecureNoSSL: true

          bindDN: cn=admin,dc=example,dc=org
          bindPW: admin
          usernamePrompt: Email Address
          userSearch:
            baseDN: ou=People,dc=example,dc=org
            filter: "(objectClass=person)"
            username: mail

            idAttr: DN
            emailAttr: mail
            nameAttr: cn
          groupSearch:
            baseDN: ou=Groups,dc=example,dc=org
            filter: "(objectClass=groupOfNames)"
            userMatchers:
              # A user is a member of a group when their DN matches
              # the value of a "member" attribute on the group entity.
            - userAttr: DN
              groupAttr: member
            # The group name should be the "cn" value.
            nameAttr: cn
      staticClients:
      - id: example-app
        redirectURIs:
        - 'https://terrakube.minikube.net'
        - '/device/callback'
        - 'http://localhost:10000/login'
        - 'http://localhost:10001/login'
        name: 'example-app'
        public: true

I kept the default config for Dex, but when I click the login button I get a 401 on https://terrakube-api.minikube.net/dex/.well-known/openid-configuration

alfespa17 commented 1 month ago

Can you share the complete yaml? (without sensitive data)

BenjaminDecreusefond commented 1 month ago

Sure ! there you go !

## API properties
api:
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings
  - name: JAVA_OPTS
    value: "-Xmx512m -XX:MaxMetaspaceSize=1024m"
  properties:
    databaseType: "H2"
  resources:
    requests:
      memory: "4Gi"
      cpu: "2"

executor:
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings
  podSecurityContext:
    runAsUser: 0
    runAsGroup: 1000
    fsGroup: 2000
  securityContext:
    runAsUser: 0

## Registry properties
registry:
  enabled: true
  replicaCount: "1"
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings

security:
  dexClientId: ${openid_user}
  useOpenLDAP: true

dex:
  enabled: true
  config:
    issuer: https://terrakube-api.minikube.net/dex

    storage:
      type: memory
    web:
      http: 0.0.0.0:5556
      allowedOrigins: ['*']
      skipApprovalScreen: true
    oauth2:
      responseTypes: ["code", "token", "id_token"]

      connectors:
      - type: ldap
        name: OpenLDAP
        id: ldap
        config:
          host: terrakube-openldap-service:389
          insecureNoSSL: true

          bindDN: cn=admin,dc=example,dc=org
          bindPW: admin
          usernamePrompt: Email Address
          userSearch:
            baseDN: ou=People,dc=example,dc=org
            filter: "(objectClass=person)"
            username: mail

            idAttr: DN
            emailAttr: mail
            nameAttr: cn
          groupSearch:
            baseDN: ou=Groups,dc=example,dc=org
            filter: "(objectClass=groupOfNames)"
            userMatchers:
              # A user is a member of a group when their DN matches
              # the value of a "member" attribute on the group entity.
            - userAttr: DN
              groupAttr: member
            # The group name should be the "cn" value.
            nameAttr: cn
      staticClients:
      - id: example-app
        redirectURIs:
        - 'https://terrakube.minikube.net'
        - '/device/callback'
        - 'http://localhost:10000/login'
        - 'http://localhost:10001/login'
        name: 'example-app'
        public: true

## Ingress properties
ingress:
  useTls: true
  includeTlsHosts: true

  ui:
    enabled: true
    domain: "terrakube.minikube.net"
    path: "/"
    pathType: "Prefix"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-ui-terrakube

  api:
    enabled: true
    domain: "terrakube-api.minikube.net"
    path: "/"
    pathType: "Prefix"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-api-terrakube

  registry:
    enabled: true
    domain: "terrakube-reg.minikube.net"
    path: "/"
    pathType: "Prefix"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-reg-terrakube

  dex:
    enabled: true
    path: "/dex/"
    pathType: "Prefix"
    ingressClassName: "alb"

Compared to my previous example, I switched back to LDAP for simplicity. Also, my Terrakube is running on an EKS cluster behind an AWS ALB. We use HTTPS up to the ALB, and the requests are then forwarded to the different services over HTTP. It just feels like the /dex endpoint is not exposed through the Dex service, which is a little strange.

Regards !

alfespa17 commented 1 month ago

@BenjaminDecreusefond let's go step by step.

For the API configuration I saw you have SERVICE_BINDING_ROOT. I guess you want to add some internal certificates, so it should look like this:

security:
  caCerts:
    terrakubeDemo1.pem: |
      -----BEGIN CERTIFICATE-----

      CERTIFICATE DATA

      -----END CERTIFICATE-----
    terrakubeDemo2.pem: |
      -----BEGIN CERTIFICATE-----

      CERTIFICATE DATA

      -----END CERTIFICATE-----

## API properties
api:
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings
  - name: JAVA_OPTS
    value: "-Xmx512m -XX:MaxMetaspaceSize=1024m"
  properties:
    databaseType: "H2"
  resources:
    requests:
      memory: "4Gi"
      cpu: "2"
  volumes:
    - name: ca-certs
      secret:
        secretName: terrakube-ca-secrets
        items:
        - key: "terrakubeDemo1.pem"
          path: "terrakubeDemo1.pem"
        - key: "terrakubeDemo2.pem"
          path: "terrakubeDemo2.pem"
        - key: "type"
          path: "type"
  volumeMounts:
  - name: ca-certs
    mountPath: /mnt/platform/bindings/ca-certificates
    readOnly: true

The same should happen with the Registry and Executor configuration if you want to add a custom CA certificate.

executor:
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings
  podSecurityContext:
    runAsUser: 0
    runAsGroup: 1000
    fsGroup: 2000
  securityContext:
    runAsUser: 0
  volumes:
    - name: ca-certs
      secret:
        secretName: terrakube-ca-secrets
        items:
        - key: "terrakubeDemo1.pem"
          path: "terrakubeDemo1.pem"
        - key: "terrakubeDemo2.pem"
          path: "terrakubeDemo2.pem"
        - key: "type"
          path: "type"
  volumeMounts:
  - name: ca-certs
    mountPath: /mnt/platform/bindings/ca-certificates
    readOnly: true

## Registry properties
registry:
  enabled: true
  replicaCount: "1"
  env:
  - name: SERVICE_BINDING_ROOT
    value: /mnt/platform/bindings
  volumes:
    - name: ca-certs
      secret:
        secretName: terrakube-ca-secrets
        items:
        - key: "terrakubeDemo1.pem"
          path: "terrakubeDemo1.pem"
        - key: "terrakubeDemo2.pem"
          path: "terrakubeDemo2.pem"
        - key: "type"
          path: "type"
  volumeMounts:
  - name: ca-certs
    mountPath: /mnt/platform/bindings/ca-certificates
    readOnly: true

I don't work with AWS, but I saw in another issue that you need to use ImplementationSpecific (see the comments here):

## Ingress properties
ingress:
  useTls: true
  includeTlsHosts: true

  ui:
    enabled: true
    domain: "terrakube.YOUR-CUSTOM-DOMAIN.net"
    path: "/*"
    pathType: "ImplementationSpecific"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-ui-terrakube

  api:
    enabled: true
    domain: "terrakube-api.YOUR-CUSTOM-DOMAIN.net"
    path: "/*"
    pathType: "ImplementationSpecific"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-api-terrakube

  registry:
    enabled: true
    domain: "terrakube-reg.YOUR-CUSTOM-DOMAIN.net"
    path: "/*"
    pathType: "ImplementationSpecific"
    ingressClassName: "alb"
    tlsSecretName: tls-secret-reg-terrakube

  dex:
    enabled: true
    path: "/dex/"
    pathType: "ImplementationSpecific"
    ingressClassName: "alb"

Not sure if you are using the OpenLDAP sample data, but from what I can see you need to update the values to use your custom domain.

You can check the Dex documentation to learn more about the connectors.

security:
  dexClientId: example-app # From the static client configuration below
  useOpenLDAP: true

dex:
  enabled: true
  config:
    issuer: https://terrakube-api.YOUR-CUSTOM-DOMAIN.net/dex

    storage:
      type: memory
    web:
      http: 0.0.0.0:5556
      allowedOrigins: ['*']
      skipApprovalScreen: true
    oauth2:
      responseTypes: ["code", "token", "id_token"]

    connectors:
    - type: ldap
      name: OpenLDAP
      id: ldap
      config:
        host: terrakube-openldap-service:389
        insecureNoSSL: true

        bindDN: cn=admin,dc=example,dc=org
        bindPW: admin
        usernamePrompt: Email Address
        userSearch:
          baseDN: ou=People,dc=example,dc=org
          filter: "(objectClass=person)"
          username: mail

          idAttr: DN
          emailAttr: mail
          nameAttr: cn
        groupSearch:
          baseDN: ou=Groups,dc=example,dc=org
          filter: "(objectClass=groupOfNames)"
          userMatchers:
            # A user is a member of a group when their DN matches
            # the value of a "member" attribute on the group entity.
          - userAttr: DN
            groupAttr: member
          # The group name should be the "cn" value.
          nameAttr: cn
    staticClients:
    - id: example-app
      redirectURIs:
      - 'https://terrakube.YOUR-CUSTOM-DOMAIN.net'
      - '/device/callback'
      - 'http://localhost:10000/login'
      - 'http://localhost:10001/login'
      name: 'example-app'
      public: true

For your issue with the Dex endpoint, you can check this file to see how the Helm chart generates the ingress configuration and validate whether you need to make some updates.

https://github.com/AzBuilder/terrakube-helm-chart/blob/main/charts/terrakube/templates/ingress-api.yaml

I hope this can help.

BenjaminDecreusefond commented 1 month ago

Actually, after looking at the comments on #93, I managed to get past the login page! The one last detail I'm struggling with is that it tells me the user admin@example.com does not exist. I'm investigating it!

alfespa17 commented 1 month ago

> Actually, after looking at the comments on #93, I managed to get past the login page! The one last detail I'm struggling with is that it tells me the user admin@example.com does not exist. I'm investigating it!

admin@example.com is inside the openldap sample data here

BenjaminDecreusefond commented 1 month ago

I have one last issue with the API. Does this look like a misconfiguration to you?

2024-10-08T13:57:43.158Z ERROR 1 --- [nio-8080-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception

java.lang.IllegalArgumentException: Unable to resolve the Configuration with the provided Issuer of "https://terrakube-api.MY-DOMAIN/dex"

alfespa17 commented 1 month ago

> I have one last issue with the API. Does this look like a misconfiguration to you?
>
> 2024-10-08T13:57:43.158Z ERROR 1 --- [nio-8080-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception
>
> java.lang.IllegalArgumentException: Unable to resolve the Configuration with the provided Issuer of "https://terrakube-api.MY-DOMAIN/dex"

You need to use a real DNS name, not terrakube-api.my-domain.

BenjaminDecreusefond commented 1 month ago

What do you mean ?

I created a domain terrakube-api.my-company for the API. Should I create another one?

alfespa17 commented 1 month ago

> What do you mean ?

If you are deploying Terrakube to AWS, you need to use a real DNS name, not the sample one ("minikube.net"). You will need to update your DNS provider and use your own domain.

BenjaminDecreusefond commented 1 month ago

I already created a Route 53 record for the API at terrakube-api.mycompany.net, so it should work if it tries to connect to terrakube-api.mycompany.net/dex/, no?

alfespa17 commented 1 month ago

> I already created a Route 53 record for the API at terrakube-api.mycompany.net, so it should work if it tries to connect to terrakube-api.mycompany.net/dex/, no?

If that is the case, it should work. You will need to check why the API pod is not able to resolve your Route 53 record for terrakube-api.mycompany.net.
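One way to check name resolution from inside the API pod's network is a small resolver probe. The sketch below is only an illustration and assumes a Python interpreter is available somewhere in the same network namespace (for example in a throwaway debug pod); the helper name `resolve_host` is hypothetical and not part of Terrakube.

```python
import socket


def resolve_host(hostname: str, port: int = 443) -> list[str]:
    """Return the sorted, de-duplicated IP addresses the local resolver
    gives for `hostname`, or an empty list if the name does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Resolution failed (unknown name, DNS server unreachable, etc.).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_host("terrakube-api.mycompany.net")` from the pod's network would return an empty list if the record cannot be resolved there, which would point at cluster DNS or VPC resolver settings rather than at Terrakube itself.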

BenjaminDecreusefond commented 1 month ago

Do we agree that the endpoint https://terrakube-api.mycompany.net/api/v1/organization is supposed to hit the API service? Or am I missing a rule on my ALB to redirect it to another service?

alfespa17 commented 1 month ago

> Do we agree that the endpoint https://terrakube-api.mycompany.net/api/v1/organization is supposed to hit the API service ?

Yes, that is correct. Requests from the UI will hit that endpoint, but internally the API validates the JWT token using the Dex endpoint, which is why you see that message in the API pod.
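The "Unable to resolve the Configuration with the provided Issuer" message suggests the API could not fetch the issuer's discovery document. As a rough sketch, assuming Dex serves the standard OpenID Connect discovery document under the issuer path, the same lookup can be reproduced by hand. The helper names below are hypothetical, and the issuer value is the placeholder from this thread.

```python
import json
import urllib.request


def discovery_url(issuer: str) -> str:
    """Build the standard OIDC discovery URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def issuer_matches(issuer: str, document: dict) -> bool:
    """True when the discovery document advertises exactly the expected issuer."""
    return document.get("issuer") == issuer


def fetch_discovery(issuer: str, timeout: float = 5.0) -> dict:
    """Fetch and parse the discovery document.

    Raises on DNS, TLS, proxy, or HTTP errors, which is the useful signal here.
    """
    with urllib.request.urlopen(discovery_url(issuer), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_discovery("https://terrakube-api.mycompany.net/dex")` should return a document for which `issuer_matches` is true. Note that `urllib` honours proxy environment variables, so the result may differ from what the JVM sees if the proxy settings are not the same.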

BenjaminDecreusefond commented 1 month ago

Well... this kind of makes me sad :( I'll try to figure out why 🤔

BenjaminDecreusefond commented 1 month ago

Do you know why there are two ports exposed on the Dex service? Do I need to use both of them, or is 5556 enough?

alfespa17 commented 1 month ago

The default deployment only uses 5556 for Dex and 8080 for the API.
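To confirm that those ports are actually reachable from another pod, a plain TCP connect test is usually enough. This is a minimal sketch: `port_open` is a hypothetical helper, and the service host names you pass to it depend on your release and are not taken from the chart.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For instance, probing the Dex service on 5556 and the API service on 8080 from a debug pod separates "service not listening" from ingress or ALB routing problems.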

BenjaminDecreusefond commented 1 month ago

I'm getting really confused. Everything seems to be configured properly. I added the record for the registry just in case, but nothing changed. I started to wonder if I had to add a security group rule to allow ingress between different pods in the same SG, but it did not work :(

alfespa17 commented 1 month ago

> I'm getting really confused. Everything seems to be configured properly. I added the record for the registry just in case, but nothing changed. I started to wonder if I had to add a security group rule to allow ingress between different pods in the same SG, but it did not work :(

I guess you will need to do some network troubleshooting to find out why the hostname is not being resolved.

BenjaminDecreusefond commented 1 month ago

Finally made it work !! Thank you !

alfespa17 commented 1 month ago

> Finally made it work !! Thank you !

If it is working for you now, I will close this one. Feel free to open a new issue if you find another problem.