devfile / api

Kube-native API for cloud development workspaces specification
Apache License 2.0

Not possible to deploy the devfile registry via CR on OpenShift 4.12 #1092

Closed ibuziuk closed 1 year ago

ibuziuk commented 1 year ago

Which area this feature is related to?

/kind bug

Which area this bug is related to?

/area registry

What versions of software are you using?

latest from main

Go project

Operating System and version:

OpenShift 4.12

Go Pkg Version:

Node.js project

Operating System and version:

Node.js version:

Yarn version:

Project.json:

Web browser

Operating System and version:

Browser name and version:

Bug Summary

Describe the bug:

To Reproduce:

Install the registry operator on OpenShift 4.12, create a devfile-registry namespace, and create a CR in it based on quay.io/devfile/devfile-index:next

ERROR: the pod cannot be started:

Error creating: pods "devfile-registry-cd79f67d9-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{3001}: 3001 is not an allowed group, spec.containers[0].securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1000600000, 1000609999], spec.containers[1].securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1000600000, 1000609999], spec.containers[2].securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1000600000, 1000609999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
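The runAsUser part of the rejection above comes down to a range check: the restricted-v2 provider only admits UIDs inside the range assigned to the namespace, and 1001 lies outside it. A minimal Go sketch of that check follows, using the range quoted in the error message. The `uidRange` type and its `allows` method are hypothetical names for illustration and are not part of any OpenShift API.

```go
package main

import "fmt"

// uidRange is a hypothetical stand-in for the UID range that the
// restricted-v2 SCC enforces for a namespace. It is not an OpenShift type.
type uidRange struct {
	Min, Max int64
}

// allows reports whether uid falls inside the inclusive range.
func (r uidRange) allows(uid int64) bool {
	return uid >= r.Min && uid <= r.Max
}

func main() {
	// Range copied from the error message in this issue.
	allowed := uidRange{Min: 1000600000, Max: 1000609999}

	// 1001 is the runAsUser the registry containers requested.
	for _, uid := range []int64{1001, 1000600000} {
		fmt.Printf("runAsUser %d allowed: %t\n", uid, allowed.allows(uid))
	}
}
```

The fsGroup value 3001 is rejected for the same kind of reason: it is not among the groups the provider permits.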

Expected behavior

Any logs, error output, screenshots etc? Provide the devfile that sees this bug, if applicable

Additional context

image

Any workaround?

N / A

Suggestion on how to fix the bug

related docs - https://devfile.io/docs/2.2.0/deploying-a-devfile-registry#deploying-a-devfile-registry-with-operator-lifecycle-manager

Target Date: 05-02-2023

michael-valdron commented 1 year ago

We'll use this issue for re-investigating our security context setup for OCP 4.12 support.

@ibuziuk Thank you for addressing this!

michael-valdron commented 1 year ago

After further investigation, we have found a few items to address:

  1. Under no circumstances is the default namespace to be used as the deployment target. This namespace does not respect any security contexts set and is not recommended for use with OpenShift deployments. To properly address this:
    1. We will state this requirement in our README instructions and/or our documentation
    2. Add validation in the webhooks to not allow the default namespace to be used for deployment
  2. Our security contexts for the registry operator need some adjusting to match the setups for our other templates, such as the helm chart. This is due to needing specific user/group definitions to allow a storage-enabled volume for the OCI registry to work, which is enabled by default for the registry operator. This feature is no longer needed for the devfile registry service, so we will disable it by default to address this, then handle its retirement across all deployment mechanisms in https://github.com/devfile/api/issues/1093.

michael-valdron commented 1 year ago

@ibuziuk In this thread, could you provide the linked issue(s) which are being blocked by this issue?

michael-valdron commented 1 year ago

The storage-enabled volume now defaults to false, which fixes the security context issues: https://github.com/devfile/registry-operator/pull/40

Additional PRs coming in the next sprint for updating the documentation and providing webhook validation.
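For readers unfamiliar with this kind of change, here is a minimal Go sketch of defaulting an optional boolean setting to false when it is left unset. The `storageSpec` type and `storageEnabled` function are hypothetical and simplified. They are not taken from the registry operator source, and the actual change in the PR above may be implemented differently.

```go
package main

import "fmt"

// storageSpec is a hypothetical, simplified stand-in for the storage
// section of a registry CR. A nil Enabled means the user did not set it.
type storageSpec struct {
	Enabled *bool
}

// storageEnabled returns the explicit setting when present and falls back
// to false otherwise, mirroring the "defaults to false" behavior.
func storageEnabled(s storageSpec) bool {
	if s.Enabled == nil {
		return false
	}
	return *s.Enabled
}

func main() {
	on := true
	fmt.Println(storageEnabled(storageSpec{}))             // field unset
	fmt.Println(storageEnabled(storageSpec{Enabled: &on})) // explicitly enabled
}
```

Using a pointer keeps "unset" distinguishable from an explicit false, so users who still want the storage volume can opt in.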

ibuziuk commented 1 year ago

just for the record, the issue happens in any namespace, not just default, e.g. image

could you provide the linked issue(s) which are being blocked by this issue.

well, I put the blocker label since it was not possible to use the operator for deploying registries via CR on 4.11 and 4.12

michael-valdron commented 1 year ago

@ibuziuk

just for the record, the issue happens in any namespace, not just default, e.g. image

could you provide the linked issue(s) which are being blocked by this issue.

Thanks for catching and addressing this. We also caught this during our deployment testing, and disabling the storage-enabled volume (https://github.com/devfile/api/issues/1092#issuecomment-1495075473) seems to have fixed it in additional tests with OCP 4.12 environments.

well, I put the blocker label since it was not possible to use the operator for deploying registries via CR on 4.11 and 4.12

It is understandable that any issue requiring the registry operator will be blocked by this one. With the blocker label, it is good to list the issue(s) being blocked that triggered the creation of this issue, along with any additional issues that others report in the thread. This helps us keep track of what is going on.

Side note: Officially, we are only supporting OCP 4.12 at this point. This will be added to our requirements in the documentation as one of the TODOs for this issue.

ibuziuk commented 1 year ago

thanks, I added some details in the spike we had on the Eclipse Che end - https://github.com/eclipse/che/issues/22075#issuecomment-1496281368

michael-valdron commented 1 year ago

In addition to the work completed last sprint, https://github.com/devfile/api/issues/1092#issuecomment-1495075473, two more tasks remain for this sprint:

  1. Modify the validation webhooks to prevent the default namespace from being used
  2. Document the requirements for deploying a registry operator

michael-valdron commented 1 year ago

The work in progress includes webhook validation, which prints the following error message when trying to deploy a devfile registry to the default namespace:

Error from server (devfile registry deployment namespace should never be 'default'.): error when creating "registry.yaml": admission webhook "vdevfileregistry.kb.io" denied the request: devfile registry deployment namespace should never be 'default'.

kim-tsao commented 1 year ago

The work in progress includes webhook validation, which prints the following error message when trying to deploy a devfile registry to the default namespace:

Error from server (devfile registry deployment namespace should never be 'default'.): error when creating "registry.yaml": admission webhook "vdevfileregistry.kb.io" denied the request: devfile registry deployment namespace should never be 'default'.

michael-valdron commented 1 year ago

Updating the webhook validation test cases to take these changes into account.

michael-valdron commented 1 year ago

The webhook changes that validate the namespace is not default are now ready for review: https://github.com/devfile/registry-operator/pull/41

An additional PR is being worked on for documentation changes.
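For illustration, the namespace check described in this thread can be sketched in Go as below. The `validateNamespace` function name is hypothetical. Only the error text is taken from the message quoted earlier in this issue, and the real webhook in the PR above may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// validateNamespace is a hypothetical helper that rejects the "default"
// namespace as a deployment target and accepts any other name.
func validateNamespace(ns string) error {
	if ns == "default" {
		// Message text copied from the webhook output quoted in this issue.
		return errors.New("devfile registry deployment namespace should never be 'default'.")
	}
	return nil
}

func main() {
	for _, ns := range []string{"default", "devfile-registry"} {
		if err := validateNamespace(ns); err != nil {
			fmt.Printf("%s: denied: %v\n", ns, err)
			continue
		}
		fmt.Printf("%s: allowed\n", ns)
	}
}
```

In an admission webhook, a check like this would run when the CR is created, so the request is denied before any pods are scheduled.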

kim-tsao commented 1 year ago

part of epic https://github.com/devfile/api/issues/1007

michael-valdron commented 1 year ago

The deployment requirements outlined in README.md are now ready for review: devfile/registry-operator#42

michael-valdron commented 1 year ago

The latest changes to the registry operator seem to have fixed the errors shown in this issue while using OCP 4.12 and Kubernetes 1.23.

@ibuziuk Can you confirm if this is fixed on your end?

michael-valdron commented 1 year ago

Additional time is needed for the review on this, so the target date has been updated. Confirmation of the PR fixes and PR revisions to provide correct Kubernetes version requirements are still needed to close this issue.

ibuziuk commented 1 year ago

tested on 4.12 nightly and it seems to work :+1:

image

michael-valdron commented 1 year ago

Now using issue #662 to track documentation additions related to the requirements these changes support.

Since it is confirmed that the changes completed in https://github.com/devfile/registry-operator/pull/40 and https://github.com/devfile/registry-operator/pull/41 have resolved this issue, I will be closing it now.