devfile / devworkspace-operator

feat: set defaults for ignoredUnrecoverableEvents operator config #1310

Closed mkuznyetsov closed 2 months ago

mkuznyetsov commented 3 months ago

What does this PR do?

Add the FailedScheduling event to the default ignoredUnrecoverableEvents list in the operator config.

(This PR is an alternative to https://github.com/devfile/devworkspace-operator/pull/1306)

The relevant docs should also be updated: https://eclipse.dev/che/docs/stable/administration-guide/configuring-machine-autoscaling/#_when_the_autoscaler_adds_a_new_node
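
For reference, here is a rough sketch of what setting the same value explicitly might look like, for clusters where the default is not yet available. It assumes the DevWorkspaceOperatorConfig resource is named devworkspace-operator-config and that the list lives under config.workspace.ignoredUnrecoverableEvents; both should be checked against the installed CRD. The namespace is a placeholder, not a value from this PR.

```yaml
# Sketch only: resource name and field path are assumptions to verify against the CRD.
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  # Placeholder: replace with the namespace the operator is installed in.
  namespace: <operator-namespace>
config:
  workspace:
    # Events listed here are not treated as unrecoverable, so the workspace
    # keeps waiting (for example, for an autoscaler to add a node) until the
    # start timeout is reached.
    ignoredUnrecoverableEvents:
      - FailedScheduling
```

With this PR the entry above would be the default, so an explicit setting should only be needed to override or extend the list.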

What issues does this PR fix or reference?

https://github.com/devfile/devworkspace-operator/issues/1280

Is it tested? How?

Create a workspace whose resource requests/limits exceed what the cluster can provide (a modified samples/plain.yaml):

apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspace
metadata:
  name: plain-devworkspace
spec:
  started: true
  routingClass: 'basic'
  template:
    components:
      - name: web-terminal
        container:
          image: quay.io/wto/web-terminal-tooling:next
          # Deliberately larger than any node can satisfy, to trigger FailedScheduling
          memoryRequest: 1000Gi
          memoryLimit: 1000Gi
          mountSources: true
          command:
            - "tail"
            - "-f"
            - "/dev/null"

Check the workspace status. The operator keeps trying to start the workspace until it times out after 5 minutes:

$ kdw get dw
NAME                 DEVWORKSPACE ID             PHASE    INFO
plain-devworkspace   workspace8e15dba59ab04607   Failed   DevWorkspace failed to progress past step 'Waiting for workspace deployment' for longer than timeout (5m). Ignored events: Detected unrecoverable event FailedScheduling: 0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod...

openshift-ci[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkuznyetsov

Once this PR has been reviewed and has the lgtm label, please assign dkwon17 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/devfile/devworkspace-operator/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.