Open tdcox opened 8 years ago
Bah, pasting XML doesn't render properly. Above should read:
Setting the default volume size for a PV claim in the POM like so:
<fabric8.defaultPersistentVolumeClaimRequestsStorage>50M</fabric8.defaultPersistentVolumeClaimRequestsStorage>
OK, I think this is a combination of another bug and a poor error message. Setting defaultPersistentVolumeClaimRequestsStorage does appear to work if the value is less than the capacity of the PV. If the value is larger, the deployment fails with 'unsupported volume type', which is misleading, although a failure is to be expected in that case.
What has been throwing me is that once you have deployed a pod against a PV, it will fail to redeploy subsequently, also with the same unhelpful error 'unsupported volume type'. To get it to redeploy, you seem to have to delete the Pod, the PV and the PVC manually, then recreate the PV (it's NFS, so the data is still there). At this point, you can redeploy again.
So, looks like CI / CD is currently not possible with storage involved unless I'm missing something.
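For illustration, the size condition described above (a claim only works when its request does not exceed the PV capacity) can be sketched roughly like this. It is a minimal Python sketch, not the actual OpenShift matching code: `parse_quantity` and `claim_fits` are hypothetical helper names, and only whole numbers with the decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti) suffixes are handled.

```python
import re

# Multipliers for a small subset of Kubernetes-style quantity suffixes.
# Assumption: only whole numbers and these suffixes are needed here.
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_quantity(text):
    """Convert a string such as '50M' or '1Gi' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError("unsupported quantity: %r" % text)
    return int(match.group(1)) * _SUFFIXES[match.group(2)]


def claim_fits(request, capacity):
    """Return True when the requested size does not exceed the PV capacity."""
    return parse_quantity(request) <= parse_quantity(capacity)
```

With the values from this thread, `claim_fits("50M", "1G")` is true, while a 100M request against a 50M volume is not satisfiable.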
Indeed the error message is really not very helpful but it comes directly from OpenShift. Wonder whether we should open an issue there.
BTW, instead of setting the default claim size, in your example you could have set
<fabric8.volume.www.requestStorage>50M</fabric8.volume.www.requestStorage>
only for your specific volume (which is named www in your example).
Thanks. Full POM is here if you wish to reproduce - it's just a Java archetype, tweaked.
https://gist.github.com/tdcox/de9caa18436d833c7b5b11923b29b3f0
I can confirm that the same symptoms occur when using 'requestStorage'.
Looks like this is also flooding the system log with the following every few seconds:
Apr 4 17:33:23 vagrant openshift: E0404 17:33:23.589436 952 persistent_claim.go:74] The volume is not yet bound to the claim. Expected to find the bind on volume.Spec.ClaimRef: &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:www GenerateName: Namespace: SelfLink:/api/v1/persistentvolumes/www UID:14ac6902-fa8a-11e5-91bd-323534663334 ResourceVersion:109875 Generation:0 CreationTimestamp:2016-04-04 17:24:26 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[]} Spec:{Capacity:map[storage:{Amount:1000000000.000 Format:DecimalSI}] PersistentVolumeSource:{GCEPersistentDisk:<nil> AWSElasticBlockStore:<nil> HostPath:<nil> Glusterfs:<nil> NFS:0xc22cc8bec0 RBD:<nil> ISCSI:<nil> FlexVolume:<nil> Cinder:<nil> CephFS:<nil> FC:<nil> Flocker:<nil> AzureFile:<nil>} AccessModes:[ReadWriteMany] ClaimRef:<nil> PersistentVolumeReclaimPolicy:Retain} Status:{Phase:Available Message: Reason:}}
Apr 4 17:33:23 vagrant openshift: E0404 17:33:23.590163 952 kubelet.go:1716] Unable to mount volumes for pod "nginx-test-vol-me5z9_default(3b795bdd-fa8a-11e5-91bd-323534663334)": unsupported volume type; skipping pod
Apr 4 17:33:23 vagrant openshift: E0404 17:33:23.590193 952 pod_workers.go:138] Error syncing pod 3b795bdd-fa8a-11e5-91bd-323534663334, skipping: unsupported volume type
I just tried to reproduce the issue with the stripped-down YAML below, but could not.
@tdcox what type is your PV (hostPath?) and how did you create it?
apiVersion: v1
kind: PersistentVolume
metadata:
  name: www
spec:
  capacity:
    storage: 50M
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: /tmp
---
apiVersion: "v1"
kind: "List"
items:
- apiVersion: "v1"
  kind: "PersistentVolumeClaim"
  metadata:
    name: "www"
  spec:
    accessModes:
    - "ReadWriteMany"
    resources:
      requests:
        storage: "50M"
    volumeName: "www"
- apiVersion: "v1"
  kind: "ReplicationController"
  metadata:
    name: "nginx-test-vol"
  spec:
    replicas: 1
    selector:
      project: "nginx-test-vol"
      group: "experiments"
    template:
      metadata:
        labels:
          project: "nginx-test-vol"
          group: "experiments"
      spec:
        containers:
        - image: "nginx"
          name: "nginx-test-vol"
          volumeMounts:
          - mountPath: "/usr/share/nginx/html"
            name: "www"
            readOnly: false
        volumes:
        - name: "www"
          persistentVolumeClaim:
            claimName: "www"
            readOnly: false
However, when I increase the claim to 100M I get the type error:
oc describe pod nginx-test-vol-8d7yr
Name:           nginx-test-vol-8d7yr
Namespace:      default
Node:           172.28.128.4/172.28.128.4
Start Time:     Mon, 04 Apr 2016 20:40:04 +0200
Labels:         group=experiments,project=nginx-test-vol
Status:         Pending
IP:
Controllers:    ReplicationController/nginx-test-vol
Containers:
  nginx-test-vol:
    Container ID:
    Image:              nginx
    Image ID:
    Port:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Ready         False
Volumes:
  www:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  www
    ReadOnly:   false
  default-token-1bb89:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-1bb89
Events:
  FirstSeen  LastSeen  Count  From                    SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                    -------------  -------- ------       -------
  23s        23s       1      {default-scheduler }                   Normal   Scheduled    Successfully assigned nginx-test-vol-8d7yr to 172.28.128.4
  23s        2s        3      {kubelet 172.28.128.4}                 Warning  FailedMount  Unable to mount volumes for pod "nginx-test-vol-8d7yr_default(a5482050-fa94-11e5-ba36-080027b5c2f4)": unsupported volume type
  23s        2s        3      {kubelet 172.28.128.4}                 Warning  FailedSync   Error syncing pod, skipping: unsupported volume type
After some research, it turns out there are several open issues related to this bogus error message:
The periodic log entries come from repeated attempts to fulfil the PVC, so I consider this to be normal behaviour.
If you don't mind, I would like to close this issue. I recommend tracking the issues above for a fix, which will eventually be included in OpenShift, too.
PV is created using something like this:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: www
spec:
  capacity:
    storage: 1G
  accessModes:
    - ReadWriteMany
  nfs:
    path: /share/pv
    server: 192.168.50.2
  persistentVolumeReclaimPolicy: Retain
then executing with oc create -f pv.yml.
I'm using NFS in these tests.
I concur that the issues above are sufficient to cover the problem of obfuscated error messages, however I don't see anything in those relating to the fact that we are seemingly unable to re-use a PV marked as 'retain' when we redeploy a fabric8 app. That's somewhat of a showstopper for fabric8-devops, so I would hope that there were some ongoing issue tracking for this problem?
Ah sorry, I lost track of the real issue. I will continue tomorrow ...
No problem. Looking at the code, it appears that if you don't mark the claim as Read Only, it will be created as ReadWriteMany. This seems to mean that you can't use ReadWriteOnce volumes from the POM, but more interestingly, implies that the PV should support multiple containers binding to it. If that were the case, a new instance of a pod should bind straight to the volume, regardless of any existing binding?
Reading through the OpenShift docs again, it seems that 'retain' is described as 'manual recycling'. I can't see any description of a use case where a pod is undergoing a rolling upgrade and passing a PV on to a subsequent instance. I guess you could make arguments both for and against that scenario, so we should probably establish if this is a bug, or a prohibited activity to start with.
@tdcox your observation is correct: if the PVC is not marked as ReadOnly, it's configured as ReadWriteMany.
tbh, I don't even know all the possible modes for a PVC, but I will read up on this and come back :)
As a test, I've just created a second project that attempts to connect to the same PV in parallel. As expected, this fails with the same error.
Ah, but I do get a different log entry with a file name and line number:
kubelet.go:1910] volume "fb8cd148-fa8f-11e5-91bd-323534663334/www", still has a container running "fb8cd148-fa8f-11e5-91bd-323534663334", skipping teardown
@tdcox just learned a bit about how PVs and especially PVCs work.
The current plugin's design has the issue that it allows you to configure how a PVC is mounted into a Pod (readOnly or not, via PersistentVolumeClaimVolumeSource) and then infers from that how the PVC is created. However, there is no configuration option for what the accessModes should look like. The plugin maps readOnly on the mount option to the ReadOnly and ReadWriteMany access modes of the PVC.
However according to the source the following modes are available:
type PersistentVolumeAccessMode string

const (
	// can be mounted read/write mode to exactly 1 host
	ReadWriteOnce PersistentVolumeAccessMode = "ReadWriteOnce"
	// can be mounted in read-only mode to many hosts
	ReadOnlyMany PersistentVolumeAccessMode = "ReadOnlyMany"
	// can be mounted in read/write mode to many hosts
	ReadWriteMany PersistentVolumeAccessMode = "ReadWriteMany"
)
So it seems that ReadOnly doesn't even exist (should probably be ReadOnlyMany).
For non-readOnly PVCs I wonder whether the plugin should use ReadWriteOnce or ReadWriteMany.
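To make the access mode problem concrete, here is a rough Python sketch of the inference as described in this thread, with the non-existent ReadOnly mode swapped for ReadOnlyMany. It is not the plugin's real code: `inferred_access_mode` and `validate_access_modes` are hypothetical names, and only the three mode strings come from the source quoted above.

```python
# The three access modes defined in the Kubernetes source quoted above.
VALID_ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"}


def inferred_access_mode(read_only):
    """Infer a PVC access mode from the mount's readOnly flag.

    Assumption: this mirrors the inference described in the thread, but
    returns ReadOnlyMany where the plugin reportedly emits 'ReadOnly'.
    """
    return "ReadOnlyMany" if read_only else "ReadWriteMany"


def validate_access_modes(modes):
    """Reject any mode name that is not one of the three known constants."""
    unknown = sorted(set(modes) - VALID_ACCESS_MODES)
    if unknown:
        raise ValueError("unknown access modes: %s" % ", ".join(unknown))
    return list(modes)
```

A check like `validate_access_modes(["ReadOnly"])` would raise, which is the kind of early failure that would have pointed at the naming mismatch directly.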
Also, there is no way to specify the reclaim policy (which is Retain by default but could also be Recycle or Delete), and like @tdcox I wonder whether the default is the proper choice.
In the end I question whether managing PVCs should really be the task of f8-m-p, or whether it is better done by external tooling (gofabric8 for DevOps). (Of course, dealing with PersistentVolumeClaimVolumeSources for Volumes attached to Pods remains part of f8-m-p's business.)
@jdyson @jstrachan @rawlingsj wdyt ?
I believe that not all possible storage providers will be capable of fulfilling all Access Modes, so it may be necessary to specify the mode directly.
Note also that 'Many' connections only apply to containers in the same namespace.
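As a sketch of what specifying the mode directly could look like: an explicit setting takes precedence over the readOnly inference, and the result is checked against what a given storage provider can offer. This is Python for illustration only. `resolve_access_modes` and `provider_supports` are hypothetical names, and the subset check is an assumption about how matching should behave, not a quote of the OpenShift implementation.

```python
def resolve_access_modes(read_only, explicit=None):
    """Pick the access modes for a claim.

    Assumption: an explicitly configured list wins; otherwise fall back to
    inferring a single mode from the mount's readOnly flag.
    """
    if explicit:
        return list(explicit)
    return ["ReadOnlyMany"] if read_only else ["ReadWriteMany"]


def provider_supports(requested, supported):
    """Return True when every requested mode is offered by the provider."""
    return set(requested) <= set(supported)


# Hypothetical provider that can only offer single-host read/write access.
single_host_only = ["ReadWriteOnce"]

inferred = resolve_access_modes(read_only=False)
overridden = resolve_access_modes(read_only=False, explicit=["ReadWriteOnce"])
```

Here the inferred ReadWriteMany would not be satisfiable by the hypothetical single-host provider, while the explicit ReadWriteOnce would be.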
I think the important questions we need to know answers for are:
There is a continuum of potential answers that ranges from trying to solve all problems in the data management domain, to declaring that fabric8 is for stateless functions only. It would be helpful to understand where the current thinking has reached on this.
@jdyson @jstrachan @rawlingsj
Setting the default volume size for a PV claim in the POM like so:
Results in a JSON entry like:
However the deployment fails with:
There exists a matching PV like so:
Full JSON here: https://gist.github.com/tdcox/3fd826045c95513bdb905797026b35d9