eclipse-archived / codewind

The official repository of the Eclipse Codewind project
https://codewind.dev
Eclipse Public License 2.0

Getting 502 Bad Gateway from Gatekeeper when deployed remotely #1900

Closed bsteinfeld closed 4 years ago

bsteinfeld commented 4 years ago

Codewind version: Built cwctl from master commit: 6038b72
OS: Ubuntu 18.04.3 (based on the theia-full docker image)
Che version: na
IDE extension version: na
IDE version: na
Kubernetes cluster: IBM Kubernetes Service (client v1.17.0, server v1.15.8+IKS)

Description: I'm trying to deploy Codewind on the IBM Cloud Kubernetes Service using cwctl, following the guide at https://www.eclipse.org/codewind/remoteconfiguring.html. I've been able to deploy Keycloak (after adding a local StorageClass to get around JBoss directory permission errors) and the rest of the Codewind stack (performance, pfe, gatekeeper), and all their logs look good. Also of note: I am passing --ingress, which seems to work (Keycloak is up and accessible through that ingress). However, cwctl fails to complete because it is unable to access the Codewind Gatekeeper service, which returns a 502 Bad Gateway error.

Steps to reproduce:

  1. Get access to an IKS (IBM Cloud Kubernetes Service)
  2. Deploy theia-full:next to the cluster (https://hub.docker.com/r/theiaide/theia-full)
  3. Connect to theia container (e.g. via kubectl exec)
  4. git pull latest codewind code to container
  5. Install cwctl as defined here: https://github.com/eclipse/codewind/blob/master/start.sh#L35-L57
  6. (to fix JBoss volume permission bug that happens with ibmfs k8s StorageClasses) Install local storage class and add a 10GB PV.
  7. Start keycloak via:
    cwctl --insecure install remote \
    --namespace codewind  \
    --kadminuser keycloak \
    --kadminpass <pwd>  \
    --krealm codewind \
    --kclient codewind  \
    --kdevuser keycloak \
    --kdevpass <pwd> \
    --konly \
    --ingress <someingresspathfromiks>.containers.appdomain.cloud
  8. Start codewind services (get keycloak url from previous step)
    cwctl --insecure install remote \
    --namespace codewind  \
    --kadminuser keycloak \
    --kadminpass <pwd>  \
    --krealm codewind \
    --kclient codewind  \
    --kdevuser keycloak \
    --kdevpass <pwd> \
    --kurl https://codewind-keycloak-<somehash>.<someingresspathfromiks>.containers.appdomain.cloud \
    --ingress <someingresspathfromiks>.containers.appdomain.cloud

Keycloak will now be accessible; however, the cwctl command will fail to connect to the Codewind Gatekeeper with a 502.

All logs show that all services have started and are running.

For example:

Workaround:

markcor11 commented 4 years ago

/assign @markcor11

markcor11 commented 4 years ago
  1. Do you see the ingress created for the gatekeeper service (kubectl get ingress -n <your_namespace>)?

  2. Are you able to open the gatekeeper page https://codewind-gatekeeper-<somehash>.<someingresspathfromiks>.containers.appdomain.cloud/health and see an OK?

  3. From a browser, does https://codewind-gatekeeper-<somehash>.<someingresspathfromiks>.containers.appdomain.cloud/api/pfe/ready return an OK?

  4. Does the request appear in the Gatekeeper logs (look for req.originalUrl = /api/pfe/ready)?
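
For anyone following along, the four checks above can be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and the namespace, Gatekeeper host, and Gatekeeper pod name are placeholders you pass in yourself (find the pod with kubectl get pods). The -k flag on curl skips certificate verification, on the assumption that the install used cwctl --insecure with self-signed certificates.

```shell
#!/usr/bin/env bash
# Sketch of the four checks above. The namespace, host and pod name are
# placeholders supplied by the caller, not values produced by cwctl.

# Build the URL for a path on the Gatekeeper ingress host.
gatekeeper_url() {
  printf 'https://%s%s\n' "$1" "$2"
}

# Print only the HTTP status code for a URL.
# -k skips certificate verification (assumes self-signed certificates).
http_status() {
  curl -k -s -o /dev/null -w '%{http_code}' "$1"
}

# Run all four checks against a live cluster (needs kubectl and curl).
run_gatekeeper_checks() {
  local namespace="$1" host="$2" pod="$3"
  # Check 1: is there an ingress for the Gatekeeper service?
  kubectl get ingress -n "$namespace"
  # Checks 2 and 3: a 502 here reproduces the problem in this issue.
  echo "/health: $(http_status "$(gatekeeper_url "$host" /health)")"
  echo "/api/pfe/ready: $(http_status "$(gatekeeper_url "$host" /api/pfe/ready)")"
  # Check 4: did the request reach the Gatekeeper at all?
  kubectl logs -n "$namespace" "$pod" | grep -F 'req.originalUrl'
}
```

Call it as run_gatekeeper_checks <your_namespace> <gatekeeper-host> <gatekeeper-pod>. If checks 2 and 3 print 502 and check 4 prints nothing, the request is probably being rejected between the ingress controller and the Gatekeeper pod rather than inside the Gatekeeper itself.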

bsteinfeld commented 4 years ago
  1. Yes I see the ingress:
    $ kubectl get ingress -n codewind
    NAME                           HOSTS                                                                                                                       ADDRESS         PORTS     AGE
    codewind-gatekeeper-k5y9bjox   codewind-gatekeeper-k5y9bjox.labs-dev-tor01-500955-a45631dc5778dc6371c67d206ba9ae5c-0000.tor01.containers.appdomain.cloud   158.85.85.234   80, 443   18h
    codewind-keycloak-k5y9ai1m     codewind-keycloak-k5y9ai1m.labs-dev-tor01-500955-a45631dc5778dc6371c67d206ba9ae5c-0000.tor01.containers.appdomain.cloud     158.85.85.234   80, 443   18h

Moreover here is the description:

Name:             codewind-gatekeeper-k5y9bjox
Namespace:        codewind
Address:          158.85.85.234
Default backend:  default-http-backend:80 (<none>)
TLS:
  secret-codewind-tls-k5y9bjox terminates codewind-gatekeeper-k5y9bjox.labs-dev-tor01-500955-a45631dc5778dc6371c67d206ba9ae5c-0000.tor01.containers.appdomain.cloud
Rules:
  Host                                                                                                                       Path  Backends
  ----                                                                                                                       ----  --------
  codewind-gatekeeper-k5y9bjox.labs-dev-tor01-500955-a45631dc5778dc6371c67d206ba9ae5c-0000.tor01.containers.appdomain.cloud
                                                                                                                             /   codewind-gatekeeper-k5y9bjox:9096 (172.30.31.185:9096)
Annotations:
  nginx.ingress.kubernetes.io/backend-protocol:    HTTPS
  nginx.ingress.kubernetes.io/force-ssl-redirect:  true
  nginx.ingress.kubernetes.io/rewrite-target:      /
  kubernetes.io/ingress.class:                     nginx
Events:                                            <none>
  2. No, I get a 502 Bad Gateway (https://codewind-gatekeeper-k5y9bjox.labs-dev-tor01-500955-a45631dc5778dc6371c67d206ba9ae5c-0000.tor01.containers.appdomain.cloud/health).

  3. Same 502 Bad Gateway.

  4. As I didn't connect, I see nothing in its logs, just:

    
    > gatekeeper@1.0.0 start /usr/src/app
    > node server.js

    Gatekeeper k5y9bjox with route authentication and UI Socket pass-through
    Gatekeeper configuration:
    PFE Service: https://172.21.224.0:9191
    SESSION_SECRET:
    REALM: codewind
    CLIENT_ID: codewind-k5y9bjox
    AUTH_URL: https://codewind-keycloak-k5y9ai1m.labs-dev-tor01-500955-a45631dc5778dc6371c67d206ba9ae5c-0000.tor01.containers.appdomain.cloud
    CLIENT_SECRET:
    GATEKEEPER_HOST: CODEWIND-GATEKEEPER-K5Y9BJOX.LABS-DEV-TOR01-500955-A45631DC5778DC6371C67D206BA9AE5C-0000.TOR01.CONTAINERS.APPDOMAIN.CLOUD
    Access role : codewind-k5y9bjox
    Added environment route to : /api/v1/gatekeeper/environment
    Gatekeeper listening on port 9096!

markcor11 commented 4 years ago

I think the ingress annotations IKS expects are different from the ones we are using. I can add them in, but could you please try editing the Gatekeeper ingress and adding these two additional annotations:

ingress.bluemix.net/redirect-to-https: "True"
ingress.bluemix.net/ssl-services: ssl-service=codewind-gatekeeper-k5y9bjox

bsteinfeld commented 4 years ago

Thanks @markcor11.

I applied the ingress annotations (via kubectl edit) while cwctl was waiting for the Codewind Gatekeeper to start, and it then connected and completed the installation.

All I've tested so far is that https://<gatekeeper-host>/health and https://<gatekeeper-host>/api/pfe/ready work; they both return OK, which I assume is good.
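
As a non-interactive alternative to kubectl edit, the two annotations suggested above could be applied with kubectl annotate, roughly as sketched here. The function name and the KUBECTL override are illustrative assumptions, not part of cwctl; the annotation keys and values are the ones quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch: add the two IKS annotations without opening an editor.
# KUBECTL may be overridden (for example KUBECTL=echo) to preview the command.

annotate_gatekeeper_ingress() {
  local namespace="$1" ingress="$2" service="$3"
  # --overwrite makes the call repeatable if the annotations already exist.
  "${KUBECTL:-kubectl}" annotate ingress "$ingress" \
    --namespace "$namespace" \
    --overwrite \
    'ingress.bluemix.net/redirect-to-https=True' \
    "ingress.bluemix.net/ssl-services=ssl-service=${service}"
}
```

In this thread the ingress and the backing service share a name, so the call would be annotate_gatekeeper_ingress codewind codewind-gatekeeper-k5y9bjox codewind-gatekeeper-k5y9bjox. Running it with KUBECTL=echo first prints the arguments without touching the cluster.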

rtaniwa commented 4 years ago

@micgibso once this work closes, please work with @markcor11 to ensure we have all the necessary steps captured in the documentation to support this configuration (see comments above: I've been able to deploy keycloak after adding a local StorageClass to get around jboss directory permission errors). Running CW in this configuration against both IKS and OpenShift on IKS are two key scenarios we want to support now that hybrid is enabled.

bsteinfeld commented 4 years ago

@rtaniwa @micgibso @markcor11 Let me add more details about the storage issue (thanks for bringing it up again).

First of all, the error occurs during the keycloak installation step of running a cwctl install remote command against an IKS cluster.

Keycloak errors out with the following:

Added 'admin' to '/opt/jboss/keycloak/standalone/configuration/keycloak-add-user.json', restart server to load user
-b 0.0.0.0
=========================================================================

  Using Embedded H2 database

=========================================================================

=========================================================================

  JBoss Bootstrap Environment

  JBOSS_HOME: /opt/jboss/keycloak

  JAVA: java

  JAVA_OPTS:  -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true  --add-exports=java.base/sun.nio.ch=ALL-UNNAMED --add-exports=jdk.unsupported/sun.misc=ALL-UNNAMED --add-exports=jdk.unsupported/sun.reflect=ALL-UNNAMED

=========================================================================

16:48:07,971 INFO  [org.jboss.modules] (main) JBoss Modules version 1.9.1.Final
java.lang.IllegalStateException: WFLYSRV0126: Could not create server content directory: /opt/jboss/keycloak/standalone/data/content
    at org.jboss.as.server@10.0.0.Final//org.jboss.as.server.ServerEnvironment.<init>(ServerEnvironment.java:482)
    at org.jboss.as.server@10.0.0.Final//org.jboss.as.server.Main.determineEnvironment(Main.java:388)
    at org.jboss.as.server@10.0.0.Final//org.jboss.as.server.Main.main(Main.java:96)
    at org.jboss.modules.Module.run(Module.java:352)
    at org.jboss.modules.Module.run(Module.java:320)
    at org.jboss.modules.Main.main(Main.java:593)
16:48:08,563 FATAL [org.jboss.as.server] (main) WFLYSRV0239: Aborting with exit code 1

This issue is related to the default StorageClasses IKS provides:

NAME                         PROVISIONER                    AGE
ibmc-file-bronze (default)   ibm.io/ibmc-file               8d
ibmc-file-bronze-gid         ibm.io/ibmc-file               8d
ibmc-file-custom             ibm.io/ibmc-file               8d
ibmc-file-gold               ibm.io/ibmc-file               8d
ibmc-file-gold-gid           ibm.io/ibmc-file               8d
ibmc-file-retain-bronze      ibm.io/ibmc-file               8d
ibmc-file-retain-custom      ibm.io/ibmc-file               8d
ibmc-file-retain-gold        ibm.io/ibmc-file               8d
ibmc-file-retain-silver      ibm.io/ibmc-file               8d
ibmc-file-silver             ibm.io/ibmc-file               8d
ibmc-file-silver-gid         ibm.io/ibmc-file               8d

I haven't dug in too deeply, but these StorageClasses mount PVs with more restrictive permissions than expected.

The way I got around this issue was to add a local storage StorageClass:

Name:            local-storage
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"local-storage"},"provisioner":"kubernetes.io/no-provisioner","volumeBindingMode":"WaitForFirstConsumer"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           kubernetes.io/no-provisioner
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>

Then I set this as the default StorageClass and manually created a PV:

---
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: keycloak-local
  spec:
    capacity:
      storage: 10Gi
    accessModes:
      - ReadWriteOnce
    storageClassName: local-storage
    local:
      path: /tmp
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: our.domain/node-purpose
            operator: In
            values:
            - service

We get Keycloak to bind to this PV (which has better permissions).
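
Switching the default StorageClass, as described above, might look like the sketch below. It relies on the standard storageclass.kubernetes.io/is-default-class annotation that also appears in the StorageClass description above; the function names and the KUBECTL override are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: move the "default" marker from one StorageClass to another.
# KUBECTL may be overridden (for example KUBECTL=echo) to preview the commands.

# Print the patch body that sets or clears the default-class annotation.
default_class_patch() {
  printf '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"%s"}}}\n' "$1"
}

switch_default_storageclass() {
  local old="$1" new="$2"
  # Clear the marker on the old class first so only one class is default.
  "${KUBECTL:-kubectl}" patch storageclass "$old" -p "$(default_class_patch false)"
  "${KUBECTL:-kubectl}" patch storageclass "$new" -p "$(default_class_patch true)"
}
```

With the classes listed above, that would be switch_default_storageclass ibmc-file-bronze local-storage, followed by kubectl get storageclass to confirm which class is marked (default).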

I do not think this is the correct solution.

I believe a more correct solution would use an init container to fix the directory permissions (on IBM ibm.io/ibmc-file storage) before the Keycloak pod runs.
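
To make the init container idea concrete, here is a rough sketch that prints an initContainers fragment for the Keycloak pod spec. Nothing in it comes from cwctl: the volume name, the busybox image, and the 1000:1000 owner are all assumptions, and the mount path is only inferred from the WFLYSRV0126 error above. Whether chown is permitted on an ibmc-file volume has not been verified here.

```shell
#!/usr/bin/env bash
# Sketch: print an initContainers fragment that fixes ownership of the
# Keycloak data directory before the main container starts.
# Assumptions (not from this thread): volume name, busybox image, uid:gid.

keycloak_init_container() {
  local volume="${1:-keycloak-data}" owner="${2:-1000:1000}"
  cat <<EOF
initContainers:
  - name: fix-data-permissions
    image: busybox
    command: ["sh", "-c", "chown -R ${owner} /opt/jboss/keycloak/standalone/data"]
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: ${volume}
        mountPath: /opt/jboss/keycloak/standalone/data
EOF
}
```

The output would be merged into the Keycloak pod template under spec, with the volume name replaced by whatever name the pod already uses for its persistent volume claim.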

johnmcollier commented 4 years ago

@bsteinfeld FWIW, if you want to try it, IBM Block Storage should work with Keycloak. See https://cloud.ibm.com/docs/containers?topic=containers-block_storage to install it on IKS.

markcor11 commented 4 years ago

PR: Add IKS ingress annotations to cwctl: https://github.com/eclipse/codewind-installer/pull/356

I think once this PR gets merged we can probably close this issue. I'll create a new follow-up issue for the IBM Cloud storage problem that affects both IKS and OpenShift on IKS.

markcor11 commented 4 years ago

Followup issue for storage problem : https://github.com/eclipse/codewind/issues/1964

tobespc commented 4 years ago

closing as the storage issue #1964 is the next required step