eclipse-archived / codewind

The official repository of the Eclipse Codewind project
https://codewind.dev
Eclipse Public License 2.0

SVT: [ODO][Che] ODO projects remain intermittently bound to cluster volumes after deleting the workspace #1477

Open sujeilyfonseca opened 4 years ago

sujeilyfonseca commented 4 years ago

Codewind version: 0.7.0
OS: Red Hat Enterprise Linux 7 (64-bit)

Che version: 7.3.1
Kubernetes cluster: OKD/OpenShift

Description: I removed my projects and deleted my workspace in Codewind for Eclipse Che (0.7.0), and I still see the ODO projects bound to my cluster volumes:

[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Bound       che/postgres-data                                                  131d
che-workspace     1Gi        RWO            Recycle          Available                                                                      131d
che-workspace1    1Gi        RWO            Recycle          Bound       che/cw-cw-openshift-nodejs-app-s2idata                             131d
che-workspace10   1Gi        RWX            Recycle          Available                                                                      42d
che-workspace2    1Gi        RWO            Recycle          Available                                                                      131d
che-workspace3    1Gi        RWO            Recycle          Available                                                                      131d
che-workspace4    1Gi        RWO            Recycle          Failed      che/cw-odo-perl-app-s2idata                                        131d
che-workspace5    1Gi        RWO            Recycle          Available                                                                      131d
che-workspace6    1Gi        RWO            Recycle          Available                                                                      42d
che-workspace7    1Gi        RWO            Recycle          Available                                                                      42d
che-workspace8    1Gi        RWO            Recycle          Available                                                                      42d
che-workspace9    1Gi        RWX            Recycle          Available                                                                      42d
[root@sfonseca-okd-codewind ~]# oc get pods
NAME                                 READY     STATUS    RESTARTS   AGE
che-69964f4f9f-pfqfz                 1/1       Running   0          1d
che-operator-58b5f759bd-jtz6s        1/1       Running   0          2d
cw-cw-openshift-nodejs-app-1-thnld   1/1       Running   0          1d
devfile-registry-54974f5486-drjcs    1/1       Running   0          1d
keycloak-7858585559-m64rx            1/1       Running   0          2d
plugin-registry-5f8cddb958-5f6dg     1/1       Running   0          1d
postgres-6b46d76688-hnfc4            1/1       Running   0          2d
tiller-deploy-57bcbcbbc8-dxj5g       1/1       Running   0          1d

@jagraj

sujeilyfonseca commented 4 years ago

/priority stopship

GeekArthur commented 4 years ago

@sujeilyfonseca Can you also upload your PFE log here?

sujeilyfonseca commented 4 years ago

I don't have PFE logs since the workspace was also deleted.

GeekArthur commented 4 years ago

Okay, I figured this out. There are two different issues described under this issue:

  1. Dangling odo project PVC after project deletion
  2. Dangling odo project PVC after workspace deletion

I can reproduce the first issue but not the second. The second issue should already be resolved: we add an owner reference (owner: the workspace) to the odo project PVC when we create the odo project, so the kube garbage collector handles deleting the PVC once the workspace is deleted. Please try again to verify. If it still doesn't work, run kubectl edit pvc <dangling odo project pvc> to see whether it has an owner reference; if it does, check the owner reference uid to see which owner it belongs to. The garbage collector should not leave a dependent behind once its owner no longer exists.
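
As a quicker check than opening an editor, a generic kubectl jsonpath query can print the owners directly; this is just a sketch, where <dangling-odo-project-pvc> is a placeholder:

# Print kind/name/uid of every owner recorded on the PVC
kubectl get pvc <dangling-odo-project-pvc> \
  -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name} uid={.uid}{"\n"}{end}'
# Compare the printed uid with the uid of the workspace object that should own the PVC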

For the first issue, the culprit is this odo PR, merged 15 days ago: https://github.com/openshift/odo/pull/2125. odo now adds an owner reference (owner: the odo project DeploymentConfig) to the odo project PVC, secret, service, and route. When we create an odo project via Codewind, we add our own owner reference (owner: the workspace) and overwrite the one odo added. Since we rely on odo delete to remove the odo project, and the owner reference odo depends on has been overwritten by us, the odo project PVC is left dangling after project deletion.
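
To make the overwrite concrete, here is a hand-written sketch of the two competing owner references on an odo project PVC; the apiVersion/kind of odo's owner follow the PR above, while the names and uids are illustrative placeholders, not captured from a cluster:

# What odo sets, and what `odo delete` relies on to cascade the deletion:
kubectl get pvc cw-cw-odo-perl-app-s2idata -o jsonpath='{.metadata.ownerReferences}'
#   [{"apiVersion":"apps.openshift.io/v1","kind":"DeploymentConfig",
#     "name":"cw-cw-odo-perl-app","uid":"<deploymentconfig-uid>"}]
# After Codewind's overwrite only the workspace owner remains,
# so `odo delete` no longer cascades to the PVC:
#   [{"kind":"Deployment","name":"<workspace-deployment>","uid":"<workspace-uid>"}]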

I can try another approach to adding the owner reference on our side for odo projects to fix this.

malincoln commented 4 years ago

@sujeilyfonseca pls verify and close. Thanks

sujeilyfonseca commented 4 years ago

I have tested this problem several times and also asked @DavidG1011 to reproduce it. We both noticed that the volumes briefly reach a released state after removing the workspace, but then they fail and remain bound to their claims:

NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                               STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Bound       che/cw-cw-odo-nodejs-app-s2idata                                             133d
che-workspace     1Gi        RWO            Recycle          Bound       che/cw-cw-odo-perl-app-s2idata                                               134d
che-workspace1    1Gi        RWO            Recycle          Bound       che/postgres-data                                                            134d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                           45d
che-workspace2    1Gi        RWO            Recycle          Bound       che/cw-cw-odo-python-app-s2idata                                             134d
che-workspace3    1Gi        RWO            Recycle          Available                                                                                134d
che-workspace4    1Gi        RWO            Recycle          Bound       che/claim-che-workspace-workspace1m9y02d20d6xt7k8                            134d
che-workspace5    1Gi        RWO            Recycle          Available                                                                                134d
che-workspace6    1Gi        RWO            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                                      45d
che-workspace7    1Gi        RWO            Recycle          Available                                                                                45d
che-workspace8    1Gi        RWO            Recycle          Available                                                                                45d
[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                               STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Failed      che/cw-cw-odo-nodejs-app-s2idata                                             134d
che-workspace     1Gi        RWO            Recycle          Failed      che/cw-cw-odo-perl-app-s2idata                                               134d
che-workspace1    1Gi        RWO            Recycle          Bound       che/postgres-data                                                            134d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                           45d
che-workspace2    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-python-app-s2idata                                             134d
che-workspace3    1Gi        RWO            Recycle          Available                                                                                134d
che-workspace4    1Gi        RWO            Recycle          Failed      che/claim-che-workspace-workspace1m9y02d20d6xt7k8                            134d
che-workspace5    1Gi        RWO            Recycle          Available                                                                                134d
che-workspace6    1Gi        RWO            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                                      45d
che-workspace7    1Gi        RWO            Recycle          Available                                                                                45d
che-workspace8    1Gi        RWO            Recycle          Available                                                                                45d
che-workspace9    1Gi        RWX            Recycle          Available                                                                                45d
[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Failed      che/cw-cw-odo-nodejs-app-s2idata                                   134d
che-workspace     1Gi        RWO            Recycle          Available                                                                      134d
che-workspace1    1Gi        RWO            Recycle          Bound       che/postgres-data                                                  134d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                 45d
che-workspace2    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-python-app-s2idata                                   134d
che-workspace3    1Gi        RWO            Recycle          Available                                                                      134d
che-workspace4    1Gi        RWO            Recycle          Available                                                                      134d
che-workspace5    1Gi        RWO            Recycle          Available                                                                      134d
che-workspace6    1Gi        RWO            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                            45d
che-workspace7    1Gi        RWO            Recycle          Available                                                                      45d
che-workspace8    1Gi        RWO            Recycle          Available                                                                      45d
che-workspace9    1Gi        RWX            Recycle          Available                                                                      45d

Steps to reproduce:

  1. Install and start Codewind for Eclipse Che
  2. Create ODO projects
  3. Remove the workspace without deleting the projects

DavidG1011 commented 4 years ago

Some statuses from my 4.2 cluster:

Prior to deletion: (screenshot)

Post deletion. Seems to get to a released state and then quickly fails.

(screenshots)

GeekArthur commented 4 years ago

@sujeilyfonseca @DavidG1011 let's do two things:

  1. Get the SHA of the PFE image by running oc get po --selector=app=codewind-pfe -o jsonpath='{.items[*].status.containerStatuses[*].imageID}', so that we know we are testing the same PFE image and are on the same page.
  2. A PVC requests a PV resource, so if the behavior is that the PV tries to release but fails, we need to check the PVC as well. Run the following commands before and after project deletion (wrapped up as a sketch below):
     a. kubectl get pvc to get the odo project PVC name
     b. kubectl describe pvc <PVC name from step a>
     c. kubectl get pv to get the odo project PV name
     d. kubectl describe pv <PV name from step c>
     Note: If there is no PVC after project deletion, then just the PV information is fine.
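
For convenience, the same sequence as a small capture sketch (the loop, prompt, and file names are illustrative additions, not from this thread):

# Capture claim/volume state once before and once after deleting the project
for stage in before after; do
  read -r -p "Press enter to capture the '$stage' state..."
  {
    kubectl get pvc        # note the odo project PVC name
    kubectl describe pvc   # details for every PVC in the namespace
    kubectl get pv         # note the PV the claim is bound to
    kubectl describe pv    # details for every PV
  } > "odo-volume-state-$stage.txt"
done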
GeekArthur commented 4 years ago

One more note: we currently rely on odo to delete the odo project PVC and reclaim the corresponding PV, but odo doesn't set blockOwnerDeletion: true in the owner reference of its PVC. As a result, we may see that the workspace is gone while the PV is still there (sometimes in a Failed state); after some time the PV will be gone automatically.
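
For reference, blockOwnerDeletion is just a boolean field on the owner reference entry. A hypothetical patch like the following (the PVC name is a placeholder) shows what setting it looks like; note that on OpenShift, mutating blockOwnerDeletion is gated by the OwnerReferencesPermissionEnforcement admission plugin, so it may require extra permissions on the owner:

# Hypothetical: set blockOwnerDeletion on the first owner reference of an odo project PVC
kubectl patch pvc <odo-project-pvc> --type=json \
  -p='[{"op":"add","path":"/metadata/ownerReferences/0/blockOwnerDeletion","value":true}]'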

malincoln commented 4 years ago

@GeekArthur are we waiting on additional fixes, or is this good to close?

GeekArthur commented 4 years ago

@malincoln There are no additional fixes for this issue; let me confirm with Sujeily, then I can close it.

sujeilyfonseca commented 4 years ago

FYI, I have been working to provide @GeekArthur with the information he requested. However, I currently have a problem with the master node of my cluster. I have opened a ticket, and someone is working on it.

GeekArthur commented 4 years ago

@sujeilyfonseca Okay. When your cluster is back and you verify this issue, if you still see the failed PV, wait for some time and it will be gone automatically. The reason is here: https://github.com/eclipse/codewind/issues/1477#issuecomment-565624280

sujeilyfonseca commented 4 years ago

Before workspace deletion:

[root@sfonseca-okd-codewind ~]# oc get pod --selector=app=codewind-pfe -o jsonpath='{.items[*].status.containerStatuses[*].imageID}'
docker-pullable://docker.io/eclipse/codewind-pfe-amd64@sha256:b834c33e11f85ad1e8e2203a11ce5653e831f3a7401c695d2de25c2604f761d9
[root@sfonseca-okd-codewind ~]# kubectl get pvc
NAME                                            STATUS    VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
claim-che-workspace-workspacesr88ly1bxvpjtzrc   Bound     che-workspace8   1Gi        RWO                           4m
codewind-workspacesr88ly1bxvpjtzrc              Bound     che-workspace9   1Gi        RWX                           4m
cw-cw-odo-nodejs-app-s2idata                    Bound     che-data         1Gi        RWO                           2m
cw-cw-odo-perl-app-s2idata                      Bound     che-workspace3   1Gi        RWO                           2m
cw-cw-odo-python-app-s2idata                    Bound     che-workspace5   1Gi        RWO                           2m
postgres-data                                   Bound     che-workspace1   1Gi        RWO                           2d
[root@sfonseca-okd-codewind ~]# kubectl describe pvc cw-cw-odo-nodejs-app-s2idata
Name:          cw-cw-odo-nodejs-app-s2idata
Namespace:     che
StorageClass:  
Status:        Bound
Volume:        che-data
Labels:        app=app
               app.kubernetes.io/instance=cw-cw-odo-nodejs
               app.kubernetes.io/managed-by=odo
               app.kubernetes.io/managed-by-version=v1.0.2
               app.kubernetes.io/name=nodejs
               app.kubernetes.io/part-of=app
               app.openshift.io/runtime-version=latest
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
Events:        <none>

[root@sfonseca-okd-codewind ~]# kubectl describe pvc cw-cw-odo-perl-app-s2idata  
Name:          cw-cw-odo-perl-app-s2idata
Namespace:     che
StorageClass:  
Status:        Bound
Volume:        che-workspace3
Labels:        app=app
               app.kubernetes.io/instance=cw-cw-odo-perl
               app.kubernetes.io/managed-by=odo
               app.kubernetes.io/managed-by-version=v1.0.2
               app.kubernetes.io/name=perl
               app.kubernetes.io/part-of=app
               app.openshift.io/runtime-version=latest
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
Events:        <none>

[root@sfonseca-okd-codewind ~]# kubectl describe pvc cw-cw-odo-python-app-s2idata
Name:          cw-cw-odo-python-app-s2idata
Namespace:     che
StorageClass:  
Status:        Bound
Volume:        che-workspace5
Labels:        app=app
               app.kubernetes.io/instance=cw-cw-odo-python
               app.kubernetes.io/managed-by=odo
               app.kubernetes.io/managed-by-version=v1.0.2
               app.kubernetes.io/name=python
               app.kubernetes.io/part-of=app
               app.openshift.io/runtime-version=latest
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
Events:        <none>
[root@sfonseca-okd-codewind ~]# kubectl get pv 
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                               STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Bound       che/cw-cw-odo-nodejs-app-s2idata                                             136d
che-workspace     1Gi        RWO            Recycle          Available                                                                                136d
che-workspace1    1Gi        RWO            Recycle          Bound       che/postgres-data                                                            136d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                           48d
che-workspace2    1Gi        RWO            Recycle          Available                                                                                136d
che-workspace3    1Gi        RWO            Recycle          Bound       che/cw-cw-odo-perl-app-s2idata                                               136d
che-workspace4    1Gi        RWO            Recycle          Available                                                                                136d
che-workspace5    1Gi        RWO            Recycle          Bound       che/cw-cw-odo-python-app-s2idata                                             136d
che-workspace6    1Gi        RWO            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                                      48d
che-workspace7    1Gi        RWO            Recycle          Available                                                                                48d
che-workspace8    1Gi        RWO            Recycle          Bound       che/claim-che-workspace-workspacesr88ly1bxvpjtzrc                            48d
che-workspace9    1Gi        RWX            Recycle          Bound       che/codewind-workspacesr88ly1bxvpjtzrc                                       48d
[root@sfonseca-okd-codewind ~]# kubectl describe pv che-data
Name:            che-data
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Bound
Claim:           che/cw-cw-odo-nodejs-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/data
    ReadOnly:  false
Events:        <none>

[root@sfonseca-okd-codewind ~]# kubectl describe pv che-workspace3              
Name:            che-workspace3
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Bound
Claim:           che/cw-cw-odo-perl-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace3
    ReadOnly:  false
Events:        <none>

[root@sfonseca-okd-codewind ~]# kubectl describe pv che-workspace5
Name:            che-workspace5
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Bound
Claim:           che/cw-cw-odo-python-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace5
    ReadOnly:  false
Events:        <none>


After workspace deletion:

[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Failed      che/cw-cw-odo-nodejs-app-s2idata                                   136d
che-workspace     1Gi        RWO            Recycle          Available                                                                      136d
che-workspace1    1Gi        RWO            Recycle          Bound       che/postgres-data                                                  136d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                 48d
che-workspace2    1Gi        RWO            Recycle          Available                                                                      136d
che-workspace3    1Gi        RWO            Recycle          Released    che/cw-cw-odo-perl-app-s2idata                                     136d
che-workspace4    1Gi        RWO            Recycle          Available                                                                      136d
che-workspace5    1Gi        RWO            Recycle          Released    che/cw-cw-odo-python-app-s2idata                                   136d
che-workspace6    1Gi        RWO            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                            48d
che-workspace7    1Gi        RWO            Recycle          Available                                                                      48d
che-workspace8    1Gi        RWO            Recycle          Available                                                                      48d
che-workspace9    1Gi        RWX            Recycle          Bound       che/codewind-workspacesr88ly1bxvpjtzrc                             48d
pv1               6Gi        RWO            Delete           Failed      eclipse-che/che-data-volume                                        137d
pv2               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspaceq56c6zye                            137d
pv3               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspace8ovr8k77                            137d
pv4               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspacei0o33x09                            137d
pv5               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspace8fbc32tx                            137d
pv6               6Gi        RWO            Delete           Failed      eclipse-che/che-data-volume                                        137d
[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Failed      che/cw-cw-odo-nodejs-app-s2idata                                   136d
che-workspace     1Gi        RWO            Recycle          Available                                                                      136d
che-workspace1    1Gi        RWO            Recycle          Bound       che/postgres-data                                                  136d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                 48d
che-workspace2    1Gi        RWO            Recycle          Available                                                                      136d
che-workspace3    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-perl-app-s2idata                                     136d
che-workspace4    1Gi        RWO            Recycle          Available                                                                      136d
che-workspace5    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-python-app-s2idata                                   136d
che-workspace6    1Gi        RWO            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                            48d
che-workspace7    1Gi        RWO            Recycle          Available                                                                      48d
che-workspace8    1Gi        RWO            Recycle          Available                                                                      48d
che-workspace9    1Gi        RWX            Recycle          Available                                                                      48d
pv1               6Gi        RWO            Delete           Failed      eclipse-che/che-data-volume                                        137d
pv2               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspaceq56c6zye                            137d
pv3               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspace8ovr8k77                            137d
pv4               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspacei0o33x09                            137d
pv5               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspace8fbc32tx                            137d
pv6               6Gi        RWO            Delete           Failed      eclipse-che/che-data-volume                                        137d
[root@sfonseca-okd-codewind ~]# oc get pvc
NAME            STATUS    VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-data   Bound     che-workspace1   1Gi        RWO                           2d


After workspace deletion (30 minutes):

[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Available                                                                      136d
che-workspace     1Gi        RWO            Recycle          Available                                                                      136d
che-workspace1    1Gi        RWO            Recycle          Bound       che/postgres-data                                                  136d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                 48d
che-workspace2    1Gi        RWO            Recycle          Available                                                                      136d
che-workspace3    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-perl-app-s2idata                                     136d
che-workspace4    1Gi        RWO            Recycle          Available                                                                      136d
che-workspace5    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-python-app-s2idata                                   136d
che-workspace6    1Gi        RWO            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                            48d
che-workspace7    1Gi        RWO            Recycle          Available                                                                      48d
che-workspace8    1Gi        RWO            Recycle          Available                                                                      48d
che-workspace9    1Gi        RWX            Recycle          Available                                                                      48d
pv1               6Gi        RWO            Delete           Failed      eclipse-che/che-data-volume                                        137d
pv2               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspaceq56c6zye                            137d
pv3               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspace8ovr8k77                            137d
pv4               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspacei0o33x09                            137d
pv5               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspace8fbc32tx                            137d
pv6               6Gi        RWO            Delete           Failed      eclipse-che/che-data-volume                                        137d
jagraj commented 4 years ago

@GeekArthur 30 minutes is a long time for ODO to hold on to volume resources. In most clouds, storage is a service and customers are charged based on usage, so paying for a volume for 30 minutes after they have stopped using it is hard to justify. We should open an issue against ODO and add the reference here; once they resolve it, we can verify and close this issue. Jingfu, let me know what you think, and whether you want us to open the issue against the ODO project.

jagraj commented 4 years ago

With the above fixes, this is no longer a stopship issue for the 0.7.0 release. I wonder whether we should document this in the troubleshooting page so users know that the volume eventually gets released, and that they need additional volumes available if they want to create more ODO projects in the meantime.

GeekArthur commented 4 years ago

@sujeilyfonseca can you describe the two failed PVs to see what they show and why they are still failed (after 30 mins)?

GeekArthur commented 4 years ago

@jagraj Agreed, 30 minutes is definitely too long for the user; my manual attempt took only about a minute. The weird thing is that the nodejs PV became available while only the perl and python PVs failed, even though from the Codewind perspective they all go through the same odo code path for releasing PVs. And yes, it's not a stopship: we can document that if a PV is intermittently not released, the user can either create more available PVs or wait for the PV to become available.

GeekArthur commented 4 years ago

Given that the odo project deletion issue has been fixed, the only remaining issue is that some environments intermittently take a long time to release the odo project PV, so I am removing the stopship tag.

GeekArthur commented 4 years ago

We need to investigate why some environments intermittently take a long time to release the odo project PV, and document it if needed.

sujeilyfonseca commented 4 years ago

@GeekArthur

[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace3
Name:            che-workspace3
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Failed
Claim:           che/cw-cw-odo-perl-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         Recycle failed: failed to recycle volume: pod failed, pod.Status.Message unknown.
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace3
    ReadOnly:  false
Events:
  Type     Reason               Age               From                         Message
  ----     ------               ----              ----                         -------
  Warning  VolumeFailedRecycle  55m                persistentvolume-controller  Recycle failed: failed to recycle volume: pod failed, pod.Status.Message unknown.
  Normal   RecyclerPod          54m (x2 over 55m)  persistentvolume-controller  Recycler pod: Successfully assigned openshift-infra/recycler-for-che-workspace3 to sfonseca-okd-worker1
  Normal   RecyclerPod          54m (x2 over 55m)  persistentvolume-controller  Recycler pod: Container image "docker.io/openshift/origin-recycler:v3.11" already present on machine
  Normal   RecyclerPod          54m (x2 over 55m)  persistentvolume-controller  Recycler pod: Created container
  Normal   RecyclerPod          54m (x2 over 55m)  persistentvolume-controller  Recycler pod: Started container
[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace5
Name:            che-workspace5
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Failed
Claim:           che/cw-cw-odo-python-app-s2idata  
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         Recycle failed: failed to recycle volume: pod failed, pod.Status.Message unknown.
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace5
    ReadOnly:  false
Events:
  Type     Reason               Age                From                         Message
  ----     ------               ----               ----                         -------
  Warning  VolumeFailedRecycle  54m                persistentvolume-controller  Recycle failed: failed to recycle volume: pod failed, pod.Status.Message unknown.
  Normal   RecyclerPod          55m (x2 over 54m)  persistentvolume-controller  Recycler pod: Successfully assigned openshift-infra/recycler-for-che-workspace5 to sfonseca-okd-worker1
  Normal   RecyclerPod          55m (x2 over 54m)  persistentvolume-controller  Recycler pod: Container image "docker.io/openshift/origin-recycler:v3.11" already present on machine
  Normal   RecyclerPod          55m (x2 over 54m)  persistentvolume-controller  Recycler pod: Created container
  Normal   RecyclerPod          55m (x2 over 54m)  persistentvolume-controller  Recycler pod: Started container
  Normal   RecyclerPod          55m (x2 over 54m)  persistentvolume-controller  Recycler pod: Killing container with id docker://recyler-container:Need to kill Pod
GeekArthur commented 4 years ago

@sujeilyfonseca From your log, the issue in your cluster looks like the recycler pod problem in OpenShift: the volume recycling work is done by the recycler pod, so if that pod has a problem, the PV won't become available. According to this Red Hat Portal article, it appears to be a known OpenShift issue, and the temporary workaround is to delete the recycler pod, then delete and re-create the failed PV: https://access.redhat.com/solutions/2143161. I can try to reproduce this to see why the recycler pod fails intermittently (the cause may be OpenShift, odo, or Codewind).
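
For anyone stuck in the Failed state, the workaround from that article looks roughly like this sketch (che-workspace3 and the recycler pod name are taken from the event logs above):

# Temporary workaround: remove the stuck recycler pod, then delete and re-create the failed PV
oc delete pod recycler-for-che-workspace3 -n openshift-infra
oc get pv che-workspace3 -o yaml > che-workspace3-pv.yaml
# Strip spec.claimRef (and status) from the saved file so the re-created PV
# comes back as Available instead of Released
oc delete pv che-workspace3
oc create -f che-workspace3-pv.yaml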

malincoln commented 4 years ago

@GeekArthur @sujeilyfonseca is this resolved, or should this bug move back to in progress if it's still being worked on?

GeekArthur commented 4 years ago

Currently I don't see this issue; we can reopen it if we hit it again.

sujeilyfonseca commented 4 years ago

/reopen
/priority stopship

Codewind version: 0.8.0
OS: Red Hat Enterprise Linux 7 (64-bit)

Che version: 7.5.1
Kubernetes cluster: OKD/OpenShift

I'm reopening this issue and marking it as "stopship" since I have seen it several times. In each of the following scenarios, I cleaned up my cluster, uninstalled Codewind for Eclipse Che, and deployed a new workspace. After that, I created ODO projects, waited for the "Running" status, and deleted my workspace. This problem doesn't occur with other project types.
(screenshot: Screen Shot 2020-01-16 at 11 16 57 AM)
First scenario:

[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Bound       che/postgres-data                                                  167d
che-workspace     1Gi        RWO            Recycle          Bound       che/cw-cw-odo-node-app-s2idata                                     167d
che-workspace1    1Gi        RWO            Recycle          Available                                                                      167d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                 79d
che-workspace2    1Gi        RWO            Recycle          Available                                                                      167d
che-workspace3    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-node-1-app-s2idata                                   167d
che-workspace4    1Gi        RWO            Recycle          Available                                                                      167d
che-workspace5    1Gi        RWO            Recycle          Available                                                                      167d
che-workspace6    1Gi        RWX            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                            79d
che-workspace7    1Gi        RWX            Recycle          Available                                                                      79d
che-workspace8    1Gi        RWX            Recycle          Available                                                                      79d
che-workspace9    1Gi        RWX            Recycle          Available                                                                      79d
[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace
Name:            che-workspace
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Bound
Claim:           che/cw-cw-odo-node-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace
    ReadOnly:  false
Events:        <none>
[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace3
Name:            che-workspace3
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Failed
Claim:           che/cw-cw-odo-node-1-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         Recycle failed: old recycler pod found, will retry later
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace3
    ReadOnly:  false
Events:
  Type     Reason               Age   From                         Message
  ----     ------               ----  ----                         -------
  Warning  VolumeFailedRecycle  6m    persistentvolume-controller  Recycle failed: old recycler pod found, will retry later
  Normal   RecyclerPod          6m    persistentvolume-controller  Recycler pod: Successfully assigned openshift-infra/recycler-for-che-workspace3 to sfonseca-okd-codewind
  Warning  RecyclerPod          6m    persistentvolume-controller  Recycler pod: MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-83314.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
  Warning  RecyclerPod  6m  persistentvolume-controller  Recycler pod: MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-83387.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
  Warning  RecyclerPod  6m  persistentvolume-controller  Recycler pod: MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-83444.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
  Warning  RecyclerPod  6m  persistentvolume-controller  Recycler pod: MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-83542.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
  Warning  RecyclerPod  6m  persistentvolume-controller  Recycler pod: MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-83668.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
  Warning  RecyclerPod  6m  persistentvolume-controller  Recycler pod: MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-83874.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
  Warning  RecyclerPod  6m  persistentvolume-controller  Recycler pod: MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-84288.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
  Warning  RecyclerPod  5m  persistentvolume-controller  Recycler pod: MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-85159.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
  Warning  RecyclerPod  4m  persistentvolume-controller  Recycler pod: Unable to mount volumes for pod "recycler-for-che-workspace3_openshift-infra(f3d64179-387a-11ea-8236-005056a000e8)": timeout expired waiting for volumes to attach or mount for pod "openshift-infra"/"recycler-for-che-workspace3". list of unmounted volumes=[vol]. list of unattached volumes=[vol pv-recycler-controller-token-w749x]
  Warning  RecyclerPod  4m  persistentvolume-controller  (combined from similar events): Recycler pod: (combined from similar events): MountVolume.SetUp failed for volume "vol" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol --scope -- mount -t nfs 9.37.222.128:/nfs/che/workspace3 /var/lib/origin/openshift.local.volumes/pods/f3d64179-387a-11ea-8236-005056a000e8/volumes/kubernetes.io~nfs/vol
Output: Running scope as unit run-86883.scope.
mount.nfs: access denied by server while mounting 9.37.222.128:/nfs/che/workspace3
The cw-cw-odo-node-app pod from this scenario is still running, and its PV remains bound:

cw-cw-odo-node-app-1-dknb8                                        1/1       Running   0          4h
che-workspace     1Gi        RWO            Recycle          Bound       che/cw-cw-odo-node-app-s2idata                                               167d


Second scenario:

Before workspace deletion:

[root@sfonseca-okd-codewind ~]# oc get pods
NAME                                                              READY     STATUS    RESTARTS   AGE
che-8d97c6d4d-xpz5n                                               1/1       Running   0          5h
che-operator-6945b8c446-lt5ks                                     1/1       Running   0          5h
codewind-performance-workspacel3sjxgfbqx5n3tp3-8496d4545d-8kbs7   1/1       Running   0          7m
codewind-workspacel3sjxgfbqx5n3tp3-b6d685b58-wrcl8                1/1       Running   0          7m
cw-cw-odo-node-0-8-0-app-1-c2cvv                                  1/1       Running   0          5m
cw-cw-odo-node-app-1-dknb8                                        1/1       Running   0          5h
cw-cw-odo-perl-0-8-0-app-1-mddvf                                  1/1       Running   0          4m
cw-cw-odo-python-0-8-0-app-1-nwk7d                                1/1       Running   0          5m
devfile-registry-7686996bb4-5n7mj                                 1/1       Running   0          5h
keycloak-5f98cc795f-fc5rl                                         1/1       Running   0          5h
plugin-registry-786bf868d-h5djt                                   1/1       Running   0          5h
postgres-675cdf8c86-zgmxz                                         1/1       Running   0          5h
workspacel3sjxgfbqx5n3tp3.che-jwtproxy-5d9ff78d79-4gqfm           1/1       Running   0          8m
workspacel3sjxgfbqx5n3tp3.che-workspace-pod-68896cbc46-x96q7      5/5       Running   0          8m
[root@sfonseca-okd-codewind ~]# oc get pvc
NAME                                            STATUS    VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
claim-che-workspace-workspacel3sjxgfbqx5n3tp3   Bound     che-workspace1   1Gi        RWO                           2h
codewind-workspacel3sjxgfbqx5n3tp3              Bound     che-workspace7   1Gi        RWX                           2h
cw-cw-odo-node-0-8-0-app-s2idata                Bound     che-workspace5   1Gi        RWO                           6m
cw-cw-odo-node-app-s2idata                      Bound     che-workspace    1Gi        RWO                           5h
cw-cw-odo-perl-0-8-0-app-s2idata                Bound     che-workspace4   1Gi        RWO                           5m
cw-cw-odo-python-0-8-0-app-s2idata              Bound     che-workspace3   1Gi        RWO                           6m
postgres-data                                   Bound     che-data         1Gi        RWO                           5h
[root@sfonseca-okd-codewind ~]# oc describe pvc cw-cw-odo-node-0-8-0-app-s2idata
Name:          cw-cw-odo-node-0-8-0-app-s2idata
Namespace:     che
StorageClass:  
Status:        Bound
Volume:        che-workspace5
Labels:        app=app
               app.kubernetes.io/instance=cw-cw-odo-node-0-8-0
               app.kubernetes.io/managed-by=odo
               app.kubernetes.io/managed-by-version=v1.0.3
               app.kubernetes.io/name=nodejs
               app.kubernetes.io/part-of=app
               app.openshift.io/runtime-version=latest
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
Events:        <none>

[root@sfonseca-okd-codewind ~]# oc describe pvc cw-cw-odo-perl-0-8-0-app-s2idata 
Name:          cw-cw-odo-perl-0-8-0-app-s2idata
Namespace:     che
StorageClass:  
Status:        Bound
Volume:        che-workspace4
Labels:        app=app
               app.kubernetes.io/instance=cw-cw-odo-perl-0-8-0
               app.kubernetes.io/managed-by=odo
               app.kubernetes.io/managed-by-version=v1.0.3
               app.kubernetes.io/name=perl
               app.kubernetes.io/part-of=app
               app.openshift.io/runtime-version=latest
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
Events:        <none>

[root@sfonseca-okd-codewind ~]# oc describe pvc cw-cw-odo-python-0-8-0-app-s2idata
Name:          cw-cw-odo-python-0-8-0-app-s2idata
Namespace:     che
StorageClass:  
Status:        Bound
Volume:        che-workspace3
Labels:        app=app
               app.kubernetes.io/instance=cw-cw-odo-python-0-8-0
               app.kubernetes.io/managed-by=odo
               app.kubernetes.io/managed-by-version=v1.0.3
               app.kubernetes.io/name=python
               app.kubernetes.io/part-of=app
               app.openshift.io/runtime-version=latest
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
Events:        <none>
[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                               STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Bound       che/postgres-data                                                            167d
che-workspace     1Gi        RWO            Recycle          Bound       che/cw-cw-odo-node-app-s2idata                                               168d
che-workspace1    1Gi        RWO            Recycle          Bound       che/claim-che-workspace-workspacel3sjxgfbqx5n3tp3                            168d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                           79d
che-workspace2    1Gi        RWO            Recycle          Available                                                                                168d
che-workspace3    1Gi        RWO            Recycle          Bound       che/cw-cw-odo-python-0-8-0-app-s2idata                                       168d
che-workspace4    1Gi        RWO            Recycle          Bound       che/cw-cw-odo-perl-0-8-0-app-s2idata                                         168d
che-workspace5    1Gi        RWO            Recycle          Bound       che/cw-cw-odo-node-0-8-0-app-s2idata                                         168d
che-workspace6    1Gi        RWX            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                                      79d
che-workspace7    1Gi        RWX            Recycle          Bound       che/codewind-workspacel3sjxgfbqx5n3tp3                                       79d
che-workspace8    1Gi        RWX            Recycle          Available                                                                                79d
che-workspace9    1Gi        RWX            Recycle          Available                                                                                79d
[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace5
Name:            che-workspace5
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Bound
Claim:           che/cw-cw-odo-node-0-8-0-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace5
    ReadOnly:  false
Events:        <none>

[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace4
Name:            che-workspace4
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Bound
Claim:           che/cw-cw-odo-perl-0-8-0-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace4
    ReadOnly:  false
Events:        <none>

[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace3
Name:            che-workspace3
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Bound
Claim:           che/cw-cw-odo-python-0-8-0-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace3
    ReadOnly:  false
Events:        <none>

After workspace deletion:

NAME                                READY     STATUS    RESTARTS   AGE
che-8d97c6d4d-xpz5n                 1/1       Running   0          5h
che-operator-6945b8c446-lt5ks       1/1       Running   0          5h
cw-cw-odo-node-app-1-dknb8          1/1       Running   0          5h
devfile-registry-7686996bb4-5n7mj   1/1       Running   0          5h
keycloak-5f98cc795f-fc5rl           1/1       Running   0          5h
plugin-registry-786bf868d-h5djt     1/1       Running   0          5h
postgres-675cdf8c86-zgmxz           1/1       Running   0          5h
[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Bound       che/postgres-data                                                  167d
che-workspace     1Gi        RWO            Recycle          Bound       che/cw-cw-odo-node-app-s2idata                                     168d
che-workspace1    1Gi        RWO            Recycle          Available                                                                      168d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                 79d
che-workspace2    1Gi        RWO            Recycle          Available                                                                      168d
che-workspace3    1Gi        RWO            Recycle          Available                                                                      168d
che-workspace4    1Gi        RWO            Recycle          Available                                                                      168d
che-workspace5    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-node-0-8-0-app-s2idata                               168d
che-workspace6    1Gi        RWX            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                            79d
che-workspace7    1Gi        RWX            Recycle          Bound       che/codewind-workspacel3sjxgfbqx5n3tp3                             79d
che-workspace8    1Gi        RWX            Recycle          Available                                                                      79d
[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace5
Name:            che-workspace5
Labels:          <none>
Annotations:     <none>
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Available
Claim:           
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace5
    ReadOnly:  false
Events:
  Type     Reason               Age                From                         Message
  ----     ------               ----               ----                         -------
  Warning  VolumeFailedRecycle  57s                persistentvolume-controller  Recycle failed: failed to recycle volume: recycler pod was deleted
  Warning  RecyclerPod          25s (x3 over 1m)   persistentvolume-controller  Recycler pod: Error: cannot find volume "vol" to mount into container "recyler-container"
  Normal   RecyclerPod          25s (x17 over 5d)  persistentvolume-controller  Recycler pod: Successfully assigned openshift-infra/recycler-for-che-workspace5 to sfonseca-okd-worker1
  Normal   RecyclerPod          25s (x5 over 5d)   persistentvolume-controller  Recycler pod: Killing container with id docker://recyler-container:Need to kill Pod
  Normal   RecyclerPod          22s (x32 over 9d)  persistentvolume-controller  Recycler pod: Started container
  Normal   RecyclerPod          22s (x35 over 9d)  persistentvolume-controller  Recycler pod: Container image "docker.io/openshift/origin-recycler:v3.11" already present on machine
  Normal   RecyclerPod          22s (x32 over 9d)  persistentvolume-controller  Recycler pod: Created container
  Normal   VolumeRecycled       16s (x14 over 9d)  persistentvolume-controller  Volume recycled


Third scenario:

[root@sfonseca-okd-codewind ~]# oc get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                     STORAGECLASS   REASON    AGE
che-data          1Gi        RWO            Recycle          Bound       che/postgres-data                                                  167d
che-workspace     1Gi        RWO            Recycle          Bound       che/cw-cw-odo-node-app-s2idata                                     168d
che-workspace1    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-python-0-8-0-app-s2idata                             168d
che-workspace10   1Gi        RWX            Recycle          Bound       codewind/codewind-pfe-pvc-k44jl5a3                                 79d
che-workspace2    1Gi        RWO            Recycle          Available                                                                      168d
che-workspace3    1Gi        RWO            Recycle          Available                                                                      168d
che-workspace4    1Gi        RWO            Recycle          Failed      che/cw-cw-odo-perl-0-8-0-app-s2idata                               168d
che-workspace5    1Gi        RWO            Recycle          Available                                                                      168d
che-workspace6    1Gi        RWX            Recycle          Bound       codewind/codewind-keycloak-pvc-k44jl5a3                            79d
che-workspace7    1Gi        RWX            Recycle          Available                                                                      79d
che-workspace8    1Gi        RWX            Recycle          Available                                                                      79d
che-workspace9    1Gi        RWX            Recycle          Available                                                                      79d
pv1               6Gi        RWO            Delete           Failed      eclipse-che/che-data-volume                                        168d
pv2               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspaceq56c6zye                            168d
pv3               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspace8ovr8k77                            168d
pv4               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspacei0o33x09                            168d
pv5               6Gi        RWO            Delete           Failed      eclipse-che/claim-che-workspace8fbc32tx                            168d
pv6               6Gi        RWO            Delete           Failed      eclipse-che/che-data-volume                                        168d
[root@sfonseca-okd-codewind ~]# oc get pods che-workspace1 
No resources found.
Error from server (NotFound): pods "che-workspace1" not found
[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace1 
Name:            che-workspace1
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Failed
Claim:           che/cw-cw-odo-python-0-8-0-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         Recycle failed: failed to delete recycler pod: pods "recycler-for-che-workspace2" not found
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace1
    ReadOnly:  false
Events:
  Type     Reason               Age                 From                         Message
  ----     ------               ----                ----                         -------
  Warning  VolumeFailedRecycle  11m (x3 over 5h)    persistentvolume-controller  Recycle failed: old recycler pod found, will retry later
  Normal   VolumeRecycled       11m (x10 over 14d)  persistentvolume-controller  Volume recycled
  Normal   RecyclerPod          4m (x19 over 14d)   persistentvolume-controller  Recycler pod: Started container
  Normal   RecyclerPod          4m (x7 over 5d)     persistentvolume-controller  Recycler pod: Successfully assigned openshift-infra/recycler-for-che-workspace1 to sfonseca-okd-worker1
  Normal   RecyclerPod          4m (x13 over 14d)   persistentvolume-controller  Recycler pod: Successfully assigned openshift-infra/recycler-for-che-workspace1 to sfonseca-okd-worker3
  Normal   RecyclerPod          4m (x21 over 14d)   persistentvolume-controller  Recycler pod: Container image "docker.io/openshift/origin-recycler:v3.11" already present on machine
  Normal   RecyclerPod          4m (x20 over 14d)   persistentvolume-controller  Recycler pod: Created container
[root@sfonseca-okd-codewind ~]# oc describe pv che-workspace4
Name:            che-workspace4
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller=yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    
Status:          Failed
Claim:           che/cw-cw-odo-perl-0-8-0-app-s2idata
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         Recycle failed: old recycler pod found, will retry later
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    9.37.222.128
    Path:      /nfs/che/workspace4
    ReadOnly:  false
Events:
  Type     Reason               Age               From                         Message
  ----     ------               ----              ----                         -------
  Normal   VolumeRecycled       15m (x7 over 5d)  persistentvolume-controller  Volume recycled
  Warning  VolumeFailedRecycle  4m (x2 over 5d)   persistentvolume-controller  Recycle failed: old recycler pod found, will retry later
  Normal   RecyclerPod          3m (x4 over 5d)   persistentvolume-controller  Recycler pod: Successfully assigned openshift-infra/recycler-for-che-workspace4 to sfonseca-okd-worker1
  Normal   RecyclerPod          3m (x12 over 5d)  persistentvolume-controller  Recycler pod: Container image "docker.io/openshift/origin-recycler:v3.11" already present on machine
  Normal   RecyclerPod          3m (x12 over 5d)  persistentvolume-controller  Recycler pod: Created container
  Normal   RecyclerPod          3m (x12 over 5d)  persistentvolume-controller  Recycler pod: Started container
  Normal   RecyclerPod          3m (x8 over 5d)   persistentvolume-controller  Recycler pod: Successfully assigned openshift-infra/recycler-for-che-workspace4 to sfonseca-okd-worker3
  Warning  RecyclerPod          1m                persistentvolume-controller  Recycler pod: Unable to mount volumes for pod "recycler-for-che-workspace4_openshift-infra(033d1a59-389a-11ea-8236-005056a000e8)": timeout expired waiting for volumes to attach or mount for pod "openshift-infra"/"recycler-for-che-workspace4". list of unmounted volumes=[vol pv-recycler-controller-token-w749x]. list of unattached volumes=[vol pv-recycler-controller-token-w749x]
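
The "old recycler pod found, will retry later" and mount-timeout events above suggest a stuck recycler pod in the openshift-infra namespace. A possible manual workaround (an assumption based on these events, not a verified fix) is to delete the stuck pod so the PV controller can create a fresh one:

# list recycler pods created by the PV controller
oc get pods -n openshift-infra | grep recycler-for-

# delete the stuck recycler pod; the controller retries the recycle with a new pod
oc delete pod recycler-for-che-workspace4 -n openshift-infra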
malincoln commented 4 years ago

@GeekArthur ^^^ this was reopened. I'll move it to the new pipeline for now.

GeekArthur commented 4 years ago

We discussed this in today's iterative-dev status meeting, and we don't think it's a stopship issue, for the following reasons:

@jagraj Please try this on your non-NFS storage or an OCP 4.x cluster to verify whether this issue is caused by an NFS storage problem in OCP 3.x.
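
A quick way to check which PVs are NFS-backed (and therefore go through the recycler) is a custom-columns query. This is a generic oc/kubectl sketch, not Codewind-specific; non-NFS volumes simply show <none> in the NFS columns:

oc get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,NFS_SERVER:.spec.nfs.server,NFS_PATH:.spec.nfs.path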

naveenkaratekid commented 4 years ago

After deleting the workspace, I still see the odo project pod running and the PVs bound (a manual cleanup sketch follows the output below):

NAME                                READY     STATUS    RESTARTS   AGE
che-55d54fdbbd-ljd9n                1/1       Running   0          7d
che-operator-58b5f759bd-vjsv6       1/1       Running   0          7d
cw-odo-perl-app-1-5f8x2             1/1       Running   0          21h
devfile-registry-54974f5486-xc9sk   1/1       Running   0          7d
keycloak-699597ff5-zrdsc            1/1       Running   0          7d
plugin-registry-5f8cddb958-k6f8l    1/1       Running   0          7d
postgres-677985cfcb-klb8s           1/1       Running   0          7d
[root@nk-openshift-master ~]# oc get pvc
NAME                                            STATUS    VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
claim-che-workspace-workspace7ye2khbv6g6oelg5   Bound     che-workspace5   1Gi        RWO                           2d
codewind-workspace7ye2khbv6g6oelg5              Bound     che-workspace1   1Gi        RWX                           2d
cw-odo-perl-app-s2idata                         Bound     che-workspace    1Gi        RWO                           21h
postgres-data                                   Bound     che-workspace3   1Gi        RWO                           7d
[root@nk-openshift-master ~]# oc get pv,pvc
NAME                              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                               STORAGECLASS   REASON    AGE
persistentvolume/che-data         1Gi        RWO            Recycle          Available                                                                                162d
persistentvolume/che-workspace    1Gi        RWO            Recycle          Bound       che/cw-odo-perl-app-s2idata                                                  162d
persistentvolume/che-workspace1   1Gi        RWX            Recycle          Bound       che/codewind-workspace7ye2khbv6g6oelg5                                       162d
persistentvolume/che-workspace2   1Gi        RWO            Recycle          Available                                                                                162d
persistentvolume/che-workspace3   1Gi        RWO            Recycle          Bound       che/postgres-data                                                            162d
persistentvolume/che-workspace4   1Gi        RWO            Recycle          Available                                                                                162d
persistentvolume/che-workspace5   1Gi        RWO            Recycle          Bound       che/claim-che-workspace-workspace7ye2khbv6g6oelg5                            162d
persistentvolume/pv1              6Gi        RWO            Delete           Available                                                                                8d
persistentvolume/pv2              6Gi        RWO            Delete           Available                                                                                8d
persistentvolume/pv3              6Gi        RWO            Delete           Available                                                                                8d
persistentvolume/pv4              6Gi        RWO            Delete           Available                                                                                8d
persistentvolume/pv5              6Gi        RWO            Delete           Available                                                                                8d
persistentvolume/pv6              6Gi        RWO            Delete           Available                                                                                8d

NAME                                                                  STATUS    VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/claim-che-workspace-workspace7ye2khbv6g6oelg5   Bound     che-workspace5   1Gi        RWO                           2d
persistentvolumeclaim/codewind-workspace7ye2khbv6g6oelg5              Bound     che-workspace1   1Gi        RWX                           2d
persistentvolumeclaim/cw-odo-perl-app-s2idata                         Bound     che-workspace    1Gi        RWO                           21h
persistentvolumeclaim/postgres-data                                   Bound     che-workspace3   1Gi        RWO                           7d
[root@nk-openshift-master ~]# oc get is
NAME              DOCKER REPO                                            TAGS      UPDATED
cw-odo-perl-app   docker-registry.default.svc:5000/che/cw-odo-perl-app             
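
For reference, a manual cleanup of the dangling odo resources would look roughly like the following. This is a sketch: it assumes the pod cw-odo-perl-app-1-5f8x2 is owned by a DeploymentConfig named cw-odo-perl-app (which its -1- revision suffix suggests); the PVC and image stream names are taken from the output above:

# remove the odo DeploymentConfig (and with it the running pod)
oc delete dc cw-odo-perl-app -n che

# remove the dangling s2i data claim, releasing the bound PV
oc delete pvc cw-odo-perl-app-s2idata -n che

# remove the leftover image stream
oc delete is cw-odo-perl-app -n che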
malincoln commented 4 years ago

@GeekArthur just confirming this is still on target for 0.9.0, which dcuts on Wednesday?

GeekArthur commented 4 years ago

I just tried to reproduce this issue with the latest master, and it works for me. @malincoln Based on the priority and occurrence frequency described above, we can remove the 0.9.0 tag, but I will keep an eye on this.

malincoln commented 4 years ago

@sujeilyfonseca should we update the title to add "intermittent" since Jinfu cannot consistently recreate it?

sujeilyfonseca commented 4 years ago

Sure, @malincoln! I agree that this problem cannot be recreated consistently, since it depends on the user's cluster. For example, it is reproduced more often on clusters with NFS storage, but the scenarios may vary. I'll update the title.

cbandy commented 4 years ago

I encountered these same symptoms and traced them to what I think is a bug in the PV controller: https://github.com/kubernetes/kubernetes/issues/92946
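
Until that is fixed upstream, a common manual workaround for a PV stuck in Failed is to clear its stale claimRef so it returns to Available. Note this side-steps the scrub that Recycle would normally perform, so only do it once the old data no longer matters:

# drop the stale claim reference; the PV becomes Available again
oc patch pv che-workspace1 --type=json -p='[{"op": "remove", "path": "/spec/claimRef"}]'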