fabric8io / gofabric8

CLI used when working with fabric8 running on Kubernetes or OpenShift
https://fabric8.io/
Apache License 2.0

Fresh Deployment - unsupported volume type #129

Open kameshsampath opened 7 years ago

kameshsampath commented 7 years ago

When doing new deployments, typically when it takes a long time for the pods to be created, you see the following errors:

"2016-09-07 22:32:51 +0530 IST   2016-09-07 22:35:02 +0530 IST   11        gogs-1-1p11a   Pod                 Warning   FailedMount   {kubelet rhel-cdk}   Unable to mount volumes for pod "gogs-1-1p11a_default(e9593ef7-751c-11e6-a52e-525400825485)": unsupported volume type
2016-09-07 22:32:51 +0530 IST   2016-09-07 22:35:02 +0530 IST   11        gogs-1-1p11a   Pod                 Warning   FailedSync   {kubelet rhel-cdk}   Error syncing pod, skipping: unsupported volume type
2016-09-07 22:33:21 +0530 IST   2016-09-07 22:35:05 +0530 IST   9         nexus-1-pfdoj   Pod                 Warning   FailedMount   {kubelet rhel-cdk}   Unable to mount volumes for pod "nexus-1-pfdoj_default(fb04b43e-751c-11e6-a52e-525400825485)": unsupported volume type"

while oc get pods -w shows the following:

NAME                              READY     STATUS              RESTARTS   AGE
docker-registry-1-deploy          0/1       DeadlineExceeded    0          3h
docker-registry-2-t7gl6           1/1       Running             0          2h
exposecontroller-1-ugdby          1/1       Running             0          1h
fabric8-docker-registry-1-xula9   1/1       Running             0          1h
fabric8-forge-1-542av             1/1       Running             0          1h
fabric8-ovp6s                     1/1       Running             0          1h
gogs-1-1p11a                      0/1       ContainerCreating   0          2m
gogs-1-deploy                     1/1       Running             0          1h
jenkins-1-kn7h0                   1/1       Running             0          1h
nexus-1-deploy                    1/1       Running             0          1h
nexus-1-pfdoj                     0/1       ContainerCreating   0          1m
router-1-uwwlg                    2/2       Running             0          2h
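
For reference, the same FailedMount/FailedSync events can also be inspected per pod with oc describe (pod names taken from the listing above), which additionally shows which volumes the pod is waiting on; a minimal sketch:

# describe shows the pod's volumes alongside the Warning events quoted above
oc describe pod gogs-1-1p11a
oc describe pod nexus-1-pfdoj
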
rawlingsj commented 7 years ago

Can you provide some more details of the OpenShift environment you're deploying into? i.e. minishift with xhyve / virtualbox / bare metal / AWS / GKE etc

Also what's the output of oc get pv && oc get pvc?

kameshsampath commented 7 years ago

@rawlingsj I'm running OSE v3.2 on CDK v2 on RHEL 7.

The output of oc get pv; oc get pvc is as shown:

NAME                CAPACITY   ACCESSMODES   STATUS      CLAIM                       REASON    AGE
gogs-data           5Gi        RWO           Available                                         6m
gogs-repositories   5Gi        RWO           Available                                         6m
jenkins-jobs        5Gi        RWO           Available                                         6m
jenkins-workspace   5Gi        RWO           Available                                         6m
nexus-storage       5Gi        RWO           Available                                         6m
pv01                1Gi        RWO,RWX       Bound       default/gogs-data                     8m
pv02                2Gi        RWO,RWX       Bound       default/gogs-repositories             8m
pv03                3Gi        RWO,RWX       Bound       default/jenkins-jobs                  8m
NAME                STATUS    VOLUME              CAPACITY   ACCESSMODES   AGE
gogs-data           Pending   gogs-data           0                        6m
gogs-repositories   Pending   gogs-repositories   0                        6m
jenkins-jobs        Pending   jenkins-jobs        0                        6m
jenkins-workspace   Pending   jenkins-workspace   0                        6m
nexus-storage       Pending   nexus-storage       0                        6m
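
For reference, a quick way to see why a claim stays Pending is to describe it and compare its requested size/access modes with the Available PVs; a minimal sketch using the names from the listing above:

# the events on the claim usually explain why it has not bound yet
oc describe pvc gogs-data

# compare what the claim requests with what the matching PV offers
oc get pvc gogs-data -o yaml
oc get pv gogs-data -o yaml
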
jstrachan commented 7 years ago

@kameshsampath this looks like the same OpenShift bug we've seen trying to use PVCs on OpenShift dedicated; where the PV looks bound but the PVC is Pending. https://bugzilla.redhat.com/show_bug.cgi?id=1370312

kameshsampath commented 7 years ago

@jstrachan I have seen that many times now, and usually I would prefer to delete and reattach once the pods are up. I remember you saying that for non-minishift/minikube setups we don't create PVs automatically; in such cases shall we provide a few commands in gofabric8, like:

gofabric8 show attachable-pvs - this would list all the PVs that can be bound with respect to the fabric8 pods like jenkins, gogs etc.,

then the developer can run gofabric8 configure pv, which would create PVs and PVCs and bind them to the respective pods?

Personally I feel doing it after pod creation would avoid such issues.

Please let me know your thoughts.

PS: the commands here are a bit crude, just to put across my thoughts; we can discuss further to refine them.
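
Until such commands exist, something close to the proposed show attachable-pvs can be approximated with plain oc output filters; a rough sketch (jsonpath filter support depends on the oc/kubectl version):

# PVs still Available, i.e. candidates that a pending claim could bind to
oc get pv -o jsonpath='{range .items[?(@.status.phase=="Available")]}{.metadata.name}{"\t"}{.spec.capacity.storage}{"\n"}{end}'

# PVCs still Pending, i.e. the ones that need a matching PV
oc get pvc -o jsonpath='{range .items[?(@.status.phase=="Pending")]}{.metadata.name}{"\n"}{end}'
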

jstrachan commented 7 years ago

I agree we need more tooling to help folks get their PVs set up correctly.

@kameshsampath BTW when you say delete and reattach, you mean delete the PVCs? Or the PVs?

We shouldn't ever have to delete the PVCs; it's more a case that some PVs might not match, so we need to create new ones (or delete bad ones etc), right?

A first step could be to add a little check to the gofabric8 validate command to list all the PVCs which are not yet bound, recommending users create new PVs.

In Kubernetes 1.4 there's gonna be dynamic PV provisioning which, if enabled, should just do the right thing and make the PVs (hopefully).

Until then we could have a command which would generate sample PV metadata that users can then configure & customize for whichever particular PV implementation they wish? Or maybe we could parameterize it a little to generate most of the required metadata or something?

In terms of finding the missing PVs, the existing kubectl get pvc seems reasonable for diagnostics, but we could make it a bit more obvious which PVs are still required and why (e.g. which app needs them etc).
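
As a sketch of what such generated sample PV metadata could look like, here is a minimal HostPath PV that a user could customize for their own storage backend (the name, size and path are placeholders, and HostPath only really suits single-node setups such as the CDK or minishift):

oc create -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fabric8-pv-example           # placeholder name
spec:
  capacity:
    storage: 5Gi                     # match the size requested by the pending PVC
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/fabric8-pv-example   # placeholder path on the node
EOF
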

kameshsampath commented 7 years ago

@jstrachan, when I said delete and reattach I meant delete the PVs/PVCs - typically doing a graceful delete: scale the pods down to 0, then delete the PVs/PVCs, create new PVs/PVCs and then scale the pods back up. Not sure how good this approach is, just a thought :)

On the gofabric8 validate step I'm totally with you; as part of that validate step we could even ask users whether they want us to auto-create and attach, which we can do on their behalf.. maybe also some kind of Maven plugin command if possible, which would allow the user to edit the PVs/PVCs as part of the Maven project and run something like mvn fabric8:pv-attach
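
A rough sketch of that graceful delete/recreate cycle, assuming a deployment config named gogs and corrected PV/PVC manifests at hand (all names and files here are placeholders):

# scale the consuming pods down to zero
oc scale dc gogs --replicas=0

# remove the bad volume and claim, then recreate them from corrected manifests
oc delete pvc gogs-data
oc delete pv gogs-data
oc create -f gogs-data-pv.yaml -f gogs-data-pvc.yaml

# scale the pods back up once the claim shows Bound
oc scale dc gogs --replicas=1
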

jstrachan commented 7 years ago

Did you find you had to recreate the PVCs? I thought the PVCs could be left unchanged and you just delete and recreate the PVs if they were bad? That's worked for me on minikube / minishift so far. Scaling down/up doesn't really affect the PVCs / PVs.

Though if the PV was created with a HostPath and wasn't chmoded, then I've found you have to scale down & back up again to get the pods to be able to read/write to the volumes.

Note that PV creation depends on the ops side of things on kubernetes + openshift; so its not something a developer tool can/should mess with generally. The attaching is done automatically by kubernetes/openshift. All we can really do is help users setup the PVs really (e.g. helping them generate the manifest maybe); or ask the ops person who has karma on your cluster to setup some PVs for you. One day we hope to get kubernetes 1.4 with dynamic PV creation which should make this much less painful.