Closed gattytto closed 4 years ago
Looks like it's related to disaster recovery - https://github.com/eclipse/che/issues/14240 @gattytto thanks for reporting; it looks like you did a pretty good analysis. Would you be interested in contributing a fix?
@ibuziuk yes, partially; I'm in the testing phase, but it can be done
I need some help, please. I will provide reproduction steps. First of all this is specific to minikube+chectl deployment of che.
so far I did code changes in https://github.com/gattytto/che-operator and started the deployment using:
chectl server:start -m -p minikube --che-operator-image=quay.io/gattytto/che-operator:latest -t /usr/local/lib/chectl/templates
One part of the change is to the controller code, adding the PersistentVolume. There is also a StorageClass in https://github.com/gattytto/che-operator/blob/master/deploy/storageclass.yaml, which I had to add to the cluster with the kubectl command line, because for some reason the dashboard doesn't accept it (but kubectl does). The storage class is hardcoded into both the PersistentVolumeClaim (PVC) and the PersistentVolume (PV), because a PVC created without an explicit storage class gets the standard one while the PV gets none, so they would not bind. I see the argument for using a configurable storage class, but for now I just hardcoded it.
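A minimal sketch of that kind of StorageClass (the name here is a placeholder for illustration, not the actual one in the linked storageclass.yaml):

```yaml
# Illustrative StorageClass for manually pre-provisioned hostPath volumes.
# "che-host-storage" is a placeholder name, not the one from the repository.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: che-host-storage
# "no-provisioner" tells Kubernetes not to provision volumes dynamically,
# so the PVC binds to the manually created PV instead of minikube's
# default hostpath provisioner.
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
```

Because there is no dynamic provisioner, both the PV and the PVC must name this class explicitly for the binding to happen, which is why it ends up hardcoded in both.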
The chectl yaml files role.yaml and cluster-role.yaml needed the persistentvolumes resource added. I edited the ones in https://github.com/gattytto/che-operator/blob/master/deploy/role.yaml and /cluster-role.yaml respectively and copied them to /usr/local/lib/chectl/templates/che-operator/ so chectl uses them when starting the deployment.
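The RBAC change amounts to adding persistentvolumes to the resource list, roughly like the rule below (the exact verbs are an assumption; the real files are the ones linked above):

```yaml
# Sketch of the extra rule added to role.yaml / cluster-role.yaml.
# PersistentVolumes live in the core ("") API group.
- apiGroups:
  - ""
  resources:
  - persistentvolumes
  verbs:
  - get
  - list
  - watch
  - create
  - delete
```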
I manually created the /data/minikube folder and set its permissions to 777. The operator startup process then creates the subfolder "userdata", which holds the postgres db files and has the expected ownership (UID=26, GID=26). THIS PART IS IMPORTANT: the PersistentVolume type is DirectoryOrCreate, and in the scenario where minikube uses vm-driver=none (running inside an LXC container), minikube runs as root, so the minikube directory inside /data would be created with root:root ownership. That is why I pre-created it and set the permissions to 777. This will be fixable from code once the minikube team implements the "mountOptions" property for persistentVolumes.
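Put together, the PersistentVolume the operator creates looks roughly like this sketch (names, size, and the storage class name are illustrative, not the exact values from my branch):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-data-pv   # illustrative name
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  # Must match the class hardcoded on the PVC so the two bind.
  # "che-host-storage" is a placeholder name.
  storageClassName: che-host-storage
  # Retain keeps the data across pod and node restarts.
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    # Paths under /data survive minikube reboots,
    # unlike /tmp/hostpath-provisioner.
    path: /data/minikube/userdata
    # DirectoryOrCreate creates the directory as root when minikube runs
    # with vm-driver=none, hence the manual pre-creation and chmod 777.
    type: DirectoryOrCreate
```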
Part of the process completes, but it gets stuck before deploying the plugin registry. I don't know why, and I also don't know how to further debug why the operator is stopping the deployment process. As seen in the screenshot, what I CAN be sure of is that both the keycloak and postgres pods are started and healthy; I have also accessed the keycloak-che URL and successfully logged in as admin:admin.
And it works after a hard reset of the LXC container: at least what was started comes back.
@gattytto Could you share the che-operator logs? AFAIK che-operator does some exec in keycloak; maybe that failed.
I have finished the code modifications to persist postgres data and it works.
After a hard reset of the LXC container, postgres, keycloak and che come back.
As for workspaces: they don't, because their storage was deleted by minikube.
It seems like PersistentVolumeClaim provisioning is split in half for the Kubernetes use case: che-operator provisions the postgres-data volume, while che-server follows the config values set in volumeclaimStrategy and uses Java code to create the volumes for the workspaces. Could this be moved to the che-operator Go code instead?
I am still facing the same issue: the Postgres data in the persistent volume is lost after minikube stops. Is there a solution for this problem? Please share. If this works in an earlier minikube version, please share which version. I am facing the issue with minikube v1.5.2.
@simha369 No, there's no fix yet, but I have filed a feature request: https://github.com/eclipse/che/issues/15157. You can patch the che-operator code to persist your postgres database and general info (like ssh keys?) from your dev env, but after a hard reset you would still need to recreate (delete and create again) the workspaces from your devfile registry or using factories. So depending on what you need to persist, there is a workaround or not (for the moment).
@gattytto Join to review, please https://github.com/eclipse/che-operator/pull/144
@gattytto Do you think we can close the issue?
I'm very happy to say yes
Describe the bug
After rebooting the minikube node hosting a Che environment, the postgres pod's /var/lib/pgsql/data is gone, and the postgres and keycloak pods go into BackOff.
Che version
Steps to reproduce
anything that causes the minikube node to reboot (whether gracefully or via a hard reset)
Expected behavior
I expect the Che context to be brought back up, with the postgres and keycloak pods loading the pre-existing database, until I decide to issue chectl:delete
Runtime
kubernetes (include kubectl version)
Openshift (include oc version)
minikube (include minikube version and kubectl version)
minishift (include minishift version and oc version)
docker-desktop (include docker version and kubectl version)
Screenshots
Installation method
Environment
Additional context
The PersistentVolume implemented by chectl to start postgres should use a path beginning with /data, to avoid minikube erasing its content upon a node hard reset.
Leaving the hostPath "path:" field empty when defining a PersistentVolume causes minikube's default StorageClass implementation to use /tmp/hostpath-provisioner/ as the folder, which gets emptied upon reboots according to https://minikube.sigs.k8s.io/docs/reference/persistent_volumes/
If this gets sorted out, I could go on and run test scenarios for the workspace pods too.
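On the claim side, pinning storageClassName keeps minikube's default hostpath provisioner (and its /tmp path) out of the picture. A sketch, assuming a pre-created class with the placeholder name che-host-storage:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: che
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # Without this, minikube's "standard" hostpath class provisions the
  # volume under /tmp/hostpath-provisioner, as the PV dump below shows.
  # "che-host-storage" is an assumed name for a manually created class.
  storageClassName: che-host-storage
```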
$ kubectl get pv pvc-90a86e5a-a7d8-43b5-9bae-9e1064f9df0b -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    hostPathProvisionerIdentity: 47e548c5-fca5-11e9-9417-02427d267bb8
    pv.kubernetes.io/provisioned-by: k8s.io/minikube-hostpath
  creationTimestamp: "2019-11-01T15:56:33Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-90a86e5a-a7d8-43b5-9bae-9e1064f9df0b
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: postgres-data
    namespace: che
    resourceVersion: "175266"
    uid: 90a86e5a-a7d8-43b5-9bae-9e1064f9df0b
  hostPath:
    path: /tmp/hostpath-provisioner/pvc-90a86e5a-a7d8-43b5-9bae-9e1064f9df0b
    type: ""
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
  volumeMode: Filesystem
status:
  phase: Bound