eclipse-che / che

Kubernetes based Cloud Development Environments for Enterprise Teams
http://eclipse.org/che
Eclipse Public License 2.0

Support for CustomResource `openshift` & `kubernetes` devfile components is not working #22137

Closed: cgruver closed this issue 6 months ago

cgruver commented 1 year ago

Describe the bug

I am attempting to create a devfile which will deploy a Kafka cluster and Kafka topics in the workspace along with the other workspace components.

Following the documentation at https://devfile.io/docs/2.2.0/adding-a-kubernetes-or-openshift-component has not produced the expected results.

This feature appears to have been enabled by: https://github.com/devfile/devworkspace-operator/pull/961

However, variations on a devfile to implement it have failed.

Che version

7.63@latest

Steps to reproduce

Example Devfile #1:

With this devfile, the workspace silently excludes all of the included components; no errors are surfaced.

schemaVersion: 2.2.0
attributes:
  controller.devfile.io/storage-type: per-workspace
metadata:
  name: che-test-workspace
components:
- name: dev-tools
  container: 
    image: image-registry.openshift-image-registry.svc:5000/eclipse-che-images/quarkus:latest
    memoryRequest: 1Gi
    memoryLimit: 6Gi
    cpuRequest: 500m
    cpuLimit: 2000m
    mountSources: true
    sourceMapping: /projects
    args:
      - '-f'
      - /dev/null
    command:
      - tail
    env:
    - name: SHELL
      value: "/bin/zsh"
    volumeMounts:
    - name: m2
      path: /home/user/.m2
- name: ubi
  container:
    args:
      - '-f'
      - /dev/null
    command:
      - tail
    image: registry.access.redhat.com/ubi9/ubi-minimal
    memoryLimit: 64M
    mountSources: true
    sourceMapping: /projects
- volume:
    size: 4Gi
  name: projects
- volume:
    size: 2Gi
  name: m2
- name: kafka-cluster
  openshift:
    deployByDefault: true
    inlined: |
      apiVersion: kafka.strimzi.io/v1beta2
      kind: Kafka
      metadata:
        name: che-demo
        labels:
          app: che-demo
      spec:
        kafka:
          config:
            offsets.topic.replication.factor: 1
            transaction.state.log.replication.factor: 1
            transaction.state.log.min.isr: 1
            inter.broker.protocol.version: '3.4'
          version: 3.4.0
          storage:
            size: 1Gi
            deleteClaim: true
            type: persistent-claim
          replicas: 1
          listeners:
            - name: plain
              port: 9092
              type: internal
              tls: false
            - name: tls
              port: 9093
              type: internal
              tls: true
        entityOperator:
          topicOperator: {}
          userOperator: {}
        zookeeper:
          storage:
            deleteClaim: true
            size: 1Gi
            type: persistent-claim
          replicas: 1
- name: kafka-topic
  openshift:
    deployByDefault: true
    inlined: |
      apiVersion: kafka.strimzi.io/v1beta2
      kind: KafkaTopic
      metadata:
        name: che-demo
        labels:
          strimzi.io/cluster: che-demo
      spec:
        config:
          retention.ms: 604800000
          segment.bytes: 1073741824
        partitions: 10
        replicas: 1
        topicName: che-demo
commands:
- exec:
    commandLine: "cp /home/user/.kube/config /projects/config"
    component: dev-tools
    group:
      kind: run
    label: Copy Kubeconfig
    workingDir: '/'
  id: copy-kubeconfig

Example Devfile #2:

With this devfile, the workspace deploys with the correct container components, but there is no obvious way to run the apply commands. Furthermore, the apply commands cannot be assigned to a deploy group, because that group does not appear to be implemented. Note: you have to remove the group entries from the apply commands below for this example not to throw an error.

schemaVersion: 2.2.0
attributes:
  controller.devfile.io/storage-type: per-workspace
metadata:
  name: che-test-workspace
components:
- name: dev-tools
  container: 
    image: image-registry.openshift-image-registry.svc:5000/eclipse-che-images/quarkus:latest
    memoryRequest: 1Gi
    memoryLimit: 6Gi
    cpuRequest: 500m
    cpuLimit: 2000m
    mountSources: true
    sourceMapping: /projects
    args:
      - '-f'
      - /dev/null
    command:
      - tail
    env:
    - name: SHELL
      value: "/bin/zsh"
    volumeMounts:
    - name: m2
      path: /home/user/.m2
- name: ubi
  container:
    args:
      - '-f'
      - /dev/null
    command:
      - tail
    image: registry.access.redhat.com/ubi9/ubi-minimal
    memoryLimit: 64M
    mountSources: true
    sourceMapping: /projects
- volume:
    size: 4Gi
  name: projects
- volume:
    size: 2Gi
  name: m2
- name: kafka-cluster
  openshift:
    inlined: |
      apiVersion: kafka.strimzi.io/v1beta2
      kind: Kafka
      metadata:
        name: che-demo
        labels:
          app: che-demo
      spec:
        kafka:
          config:
            offsets.topic.replication.factor: 1
            transaction.state.log.replication.factor: 1
            transaction.state.log.min.isr: 1
            inter.broker.protocol.version: '3.4'
          version: 3.4.0
          storage:
            size: 1Gi
            deleteClaim: true
            type: persistent-claim
          replicas: 1
          listeners:
            - name: plain
              port: 9092
              type: internal
              tls: false
            - name: tls
              port: 9093
              type: internal
              tls: true
        entityOperator:
          topicOperator: {}
          userOperator: {}
        zookeeper:
          storage:
            deleteClaim: true
            size: 1Gi
            type: persistent-claim
          replicas: 1
- name: kafka-topic
  openshift:
    inlined: |
      apiVersion: kafka.strimzi.io/v1beta2
      kind: KafkaTopic
      metadata:
        name: che-demo
        labels:
          strimzi.io/cluster: che-demo
      spec:
        config:
          retention.ms: 604800000
          segment.bytes: 1073741824
        partitions: 10
        replicas: 1
        topicName: che-demo
commands:
- exec:
    commandLine: "cp /home/user/.kube/config /projects/config"
    component: dev-tools
    group:
      kind: run
    label: Copy Kubeconfig
    workingDir: '/'
  id: copy-kubeconfig
- apply:
    component: kafka-cluster
    group:
      kind: deploy
    label: deploy-kafka-cluster
  id: kafka-cluster
- apply:
    component: kafka-topic
    group:
      kind: deploy
    label: kafka-topic
  id: kafka-topic

Expected behavior

Workspace deployed with Kafka cluster and Topic

Runtime

OpenShift

Screenshots

No response

Installation method

OperatorHub

Environment

macOS

Eclipse Che Logs

No response

Additional context

The Strimzi Operator is installed with cluster scope.

l0rd commented 1 year ago

@amisevsk can you please have a look?

amisevsk commented 1 year ago

I'm looking into the first example (with deployByDefault: true). For the second example, I believe it's expected that the editor will provide some way of applying the resources, e.g. via oc apply, as it's an interactive action and not something we can do with the DevWorkspace Operator. Perhaps we need an issue for supporting this in editors?
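
For illustration, applying one of the inlined manifests by hand from a workspace terminal would look roughly like this (a sketch, assuming the workspace user has permission to create KafkaTopic resources in the workspace namespace; the manifest mirrors the kafka-topic component above):

# Hypothetical manual apply from the workspace terminal
cat <<'EOF' | oc apply -f -
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: che-demo
  labels:
    strimzi.io/cluster: che-demo
spec:
  partitions: 10
  replicas: 1
  topicName: che-demo
EOF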

amisevsk commented 1 year ago

For the first Devfile sample, the dashboard hits a 403 error when attempting to patch the DevWorkspace, but ignores it and does not show it to the user (created issue: https://github.com/eclipse/che/issues/22145)

Attempting to manually apply the same patch as the dashboard gives a more useful message:

Error from server (devworkspace controller serviceaccount does not have permissions 
to manage kind Kafka defined in component kafka-cluster -- an administrator needs 
to grant the devworkspace operator permissions ('*') kafka.strimzi.io/v1beta1, 
Kind=Kafka to use this DevWorkspace): admission webhook "mutate.devworkspace-controller.svc" 
denied the request: devworkspace controller serviceaccount does not have permissions to 
manage kind Kafka defined in component kafka-cluster -- an administrator needs to grant
the devworkspace operator permissions ('*') kafka.strimzi.io/v1beta1, Kind=Kafka to use 
this DevWorkspace

The basic explanation here is that in order to allow the DevWorkspace Operator to manage CRs from this operator, it needs to be granted * permissions on that CR via a clusterrole/clusterrolebinding. This is currently necessary because the operator needs to create, update, patch, list, watch, etc. the resources in question. We might be able to improve this in a future release (to scope the required permissions down somewhat). I've created issue https://github.com/devfile/devworkspace-operator/issues/1083 to track this.

There may also be an issue with the dashboard (as DWO verifies the user applying the patch has permissions to get/create/update/delete the CR in question) but I cannot verify this at the moment.

@cgruver For this issue in specific, could you try again after creating the following resources on the cluster?

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: devworkspace-controller-admin-kafka
rules:
- apiGroups:
  - kafka.strimzi.io
  resources:
  - kafkas
  - kafkatopics
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: devworkspace-controller-admin-kafka
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: devworkspace-controller-admin-kafka
subjects:
- kind: ServiceAccount
  name: devworkspace-controller-serviceaccount
  namespace: dw # Or wherever the DevWorkspace Operator is installed
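
As a sanity check, an admin could verify the grant with an impersonated auth check. This is a sketch that assumes the operator's serviceaccount lives in the dw namespace, as in the example above:

# Hypothetical verification; adjust the namespace to wherever the
# DevWorkspace Operator is actually installed.
oc auth can-i create kafkas \
  --as=system:serviceaccount:dw:devworkspace-controller-serviceaccount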

cgruver commented 1 year ago

@amisevsk Yes, I'll apply that RBAC and update here.

amisevsk commented 1 year ago

Testing (very) briefly on OpenShift, I suspect the RBAC will not fix the issue as the user is not granted admin permissions in their namespace:

❯ oc auth can-i create kafkas -n user1-che
no
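
For completeness, here is a minimal sketch of the namespace-scoped grant that would make the check above pass; user1 and user1-che are placeholders for the actual user and workspace namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kafka-editor          # hypothetical name
  namespace: user1-che        # hypothetical workspace namespace
rules:
- apiGroups:
  - kafka.strimzi.io
  resources:
  - kafkas
  - kafkatopics
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kafka-editor
  namespace: user1-che
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kafka-editor
subjects:
- kind: User
  name: user1                 # hypothetical OpenShift user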

amisevsk commented 1 year ago

Note: I've updated the clusterrole/clusterrolebinding in the comment above -- I had the incorrect API group for the clusterrole.

Tested on OpenShift with a cluster-admin user and a regular user.

cgruver commented 1 year ago

@amisevsk I had the wrong API version in the CRs. I failed to notice that the Strimzi API had recently moved to v1beta2, so your first error above is legit: that API version (v1beta1) does not exist.

cgruver commented 1 year ago

Never mind. I get the same error after the correction:

Error provisioning workspace Kubernetes components: could not process component kafka-cluster: no kind "Kafka" is registered for version "kafka.strimzi.io/v1beta2" in scheme "pkg/runtime/scheme.go:100"
Workspace stopped due to error

cgruver commented 1 year ago

If I grant my user edit permissions to the Che provisioned namespace, then I can successfully create the Kafka resources manually within the workspace.

I'd prefer not to do that, though, because I would like the resources to be managed by the workspace, i.e. shut down and/or removed when the workspace stops or is deleted.

amisevsk commented 1 year ago

Yeah, the v1beta1 issue was a red herring, the real problem is that DWO doesn't know how to transmit Kafka CRs to the API server. We might need additional handling for custom resources, as this is an issue that will impact any CR on the cluster, not just Kafka.

I think our hands may be tied within the operator here, at least for the time being. I'll try to look into it more when I have some time.

The second flow (devfile no. 2) is still something that should be supported via the editor, though.
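
Until that editor support exists, one possible stopgap is a devfile postStart event that runs the apply from the dev-tools container at startup. This is only a sketch: it assumes the workspace user has permission to create the Kafka resources, and that the manifests are checked into the project sources at a hypothetical path deploy/kafka.yaml. Resources created this way are also not cleaned up when the workspace is deleted.

commands:
- id: apply-kafka-resources
  exec:
    component: dev-tools
    # deploy/kafka.yaml is a hypothetical path in the project sources
    commandLine: "oc apply -f deploy/kafka.yaml"
    workingDir: ${PROJECT_SOURCE}
events:
  postStart:
  - apply-kafka-resources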

cgruver commented 1 year ago

Update:

I validated that the creation of OpenShift resources works if it's something that the service account has permission to create:

- openshift:
    deployByDefault: true
    inlined: |
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: test-config
      data:
        test: "value"
  name: config-map

l0rd commented 1 year ago

@amisevsk can you suggest a new title / description for the issue on the DW side please?

amisevsk commented 1 year ago

I've updated the title to more precisely define the issue (custom resources are not supported in devfile components). Currently, the problem is that within the controller, we require the golang specs for custom resource objects in order to apply them and cache them within the reconcile loop.

However, standard Kubernetes objects should be supported. @cgruver let me know if this is accurate.

cgruver commented 1 year ago

@amisevsk That is accurate

cgruver commented 1 year ago

@amisevsk @l0rd

Is this still a backlog item? Or do we need some dependent work to enable CRDs in Dev Spaces?

amisevsk commented 1 year ago

@cgruver We're at an impasse on this issue; Go-based Kubernetes operators can only manage resources they "understand" (which basically means the Go structs are included in the project at build time). As a result, supporting arbitrary CR kinds in the operator is not possible using the standard controller-runtime library -- it doesn't know how to compare them, how to apply them, etc.

We may be able to ultimately find a solution that works in general, but it would likely require an entirely different way of dealing with these components. It's technically on the backlog, but near the bottom.

che-bot commented 7 months ago

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.