devfile / devworkspace-operator

Deleted common/per-user PVC causes workspaces to be stuck until timeout #1258

Open AObuchow opened 2 months ago

AObuchow commented 2 months ago

Description

If the common/per-user PVC is deleted while workspaces are using the common/per-user storage strategy, the PVC gets stuck in the Terminating state until all associated workspace pods are terminated. The pods are eventually terminated by the workspace timeout, though you can delete the DevWorkspaces to speed up the process.

From my understanding, what's happening is: the PVC deletion triggers a reconcile -> the workspaces being reconciled see that the PVC is terminating -> each workspace waits for the PVC to finish terminating before attempting to re-create the common/per-user PVC.

The problem is that the PVC will not terminate until the associated workspace pods are terminated. I haven't yet had a chance to verify why the PVC is waiting in this state (maybe due to finalizers? not sure).
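
One way to check the finalizer theory, as a rough sketch: Kubernetes' storage-object-in-use protection puts the `kubernetes.io/pvc-protection` finalizer on PVCs and postpones their removal while a pod still uses the claim, so that is a likely candidate here. The helper function names below are hypothetical; the PVC name `claim-devworkspace` comes from the reproduction steps.

```shell
# Hypothetical helpers for inspecting a PVC that is stuck in Terminating.

# Print the finalizers set on a PVC. Seeing kubernetes.io/pvc-protection
# here would suggest that in-use protection is holding the deletion.
pvc_finalizers() {
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.metadata.finalizers}'
}

# Print the PVC's deletion timestamp. Non-empty output means a delete
# was requested and the PVC is waiting on its finalizers.
pvc_deletion_timestamp() {
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.metadata.deletionTimestamp}'
}
```

Example: `pvc_finalizers claim-devworkspace "$NAMESPACE"` followed by `pvc_deletion_timestamp claim-devworkspace "$NAMESPACE"`.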

How To Reproduce

  1. Create a workspace that uses the per-user PVC strategy:
cat <<'EOF' | kubectl apply -n $NAMESPACE -f - 
kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: dw-per-user
spec:
  started: true
  routingClass: 'basic'
  template:
    attributes:
      controller.devfile.io/storage-type: per-user
    components:
      - name: web-terminal
        container:
          image: quay.io/wto/web-terminal-tooling:next
          memoryRequest: 256Mi
          memoryLimit: 512Mi
          mountSources: true
          command:
           - "tail"
           - "-f"
           - "/dev/null"
EOF
  2. Wait for the workspace to start up
  3. Delete the workspace's associated PVC: kubectl delete pvc claim-devworkspace -n $NAMESPACE
  4. The workspace status changes to Starting with the message "Provisioning storage: Shared PVC is in terminating state", as shown by kubectl get dw -n $NAMESPACE
  5. The PVC is stuck in the Terminating state
  6. The workspace remains in the Starting state until it times out or is deleted
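
Until the operator handles this case, the manual workaround mentioned in the description (deleting the DevWorkspace so its pod goes away) can be sketched as below. The function name is hypothetical and the 120s timeout is an arbitrary choice. If several workspaces share the PVC, each of them would presumably need to be deleted before the PVC can finish terminating.

```shell
# Hypothetical helper: unblock a Terminating shared PVC by deleting the
# workspace that still mounts it, then wait for the PVC to be removed.
unblock_shared_pvc() {
  workspace="$1"; pvc="$2"; namespace="$3"
  # Deleting the DevWorkspace removes its pod, which releases the claim.
  kubectl delete dw "$workspace" -n "$namespace" &&
    # Block until the PVC is actually gone (or the timeout expires).
    kubectl wait --for=delete "pvc/$pvc" -n "$namespace" --timeout=120s
}
```

Example: `unblock_shared_pvc dw-per-user claim-devworkspace "$NAMESPACE"`. Note that this removes the DevWorkspace object itself, so it has to be re-applied afterwards.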

Expected behavior

Ideally, when the common/per-user PVC is deleted, all workspaces using the per-user/common PVC storage strategy should fail. This would allow the PVC termination to complete, after which the workspaces could be restarted and the per-user/common PVC re-provisioned.

Additional context

Discovered this while reviewing https://github.com/devfile/devworkspace-operator/pull/1233#issue-comment-box