@smileisak Could you check how many Azure disks are attached to your VM? We recently found a host cache setting issue: when more than 5 Azure data disks are attached to a node, a disk may become unavailable. Details: https://github.com/andyzhangx/demo/blob/master/issues/README.md#2-disk-unavailable-after-attachdetach-a-data-disk-on-a-node
@andyzhangx Is there some way to actually prevent the cluster from scheduling more than 5 data disk mounts onto a specific VM?
In numerous cases our clusters unfortunately schedule all of the pods that need mounts onto the same machine, not only exceeding the 5 mounts you mention but also attempting to exceed the 8-data-disk maximum that the specific VM size supports.
@andyzhangx Also, I couldn't find anything in the docs you linked that referenced issues related to "5 data disks" or anything along those lines.
Were you referring to the caching mode issues in the section entitled "2. disk unavailable after attach/detach a data disk on a node" (where the workaround is to set cachingmode: None explicitly)?
@peskybp correct, you should set cachingmode to None. Details: https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#2-disk-unavailable-after-attachdetach-a-data-disk-on-a-node
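For reference, a minimal sketch of a StorageClass with cachingmode set to None, assuming the in-tree kubernetes.io/azure-disk provisioner; the class name and skuName are placeholders, not values from this issue:

```yaml
# Hypothetical StorageClass illustrating the cachingmode: None workaround;
# the name and skuName are placeholders, not taken from this thread.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-disk-nocache
provisioner: kubernetes.io/azure-disk
parameters:
  skuName: Standard_LRS
  kind: Managed
  cachingmode: None   # avoids the host cache issue described in the linked doc
```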
Regarding the maximum number of data disks, there is already a design proposal in the k8s community ("Add a design proposal for dynamic volume limits"); this should be fixed in v1.11.
Even without my proposal, for the AzureDisk type it should be possible to set a limit in the latest k8s release. Can you try setting the KUBE_MAX_PD_VOLS environment variable to something like 5 and restarting the scheduler?
Fortunately, the scheduler already ships with some built-in intelligence for Azure disks - https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/algorithm/predicates/predicates.go#L106 - so you probably don't need to wait for the new design.
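As a rough sketch of the KUBE_MAX_PD_VOLS suggestion above, assuming the scheduler runs as a static pod; the manifest path, image tag, and flags below are placeholders and vary per cluster:

```yaml
# Hypothetical fragment of a kube-scheduler static pod manifest
# (e.g. under /etc/kubernetes/manifests/); adapt to your cluster setup.
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - name: kube-scheduler
    image: k8s.gcr.io/kube-scheduler-amd64:v1.10.0   # placeholder version
    command:
    - kube-scheduler
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    env:
    - name: KUBE_MAX_PD_VOLS   # caps how many data disk volumes the scheduler counts per node
      value: "5"
```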
@gnufied thanks for the solution!
Is this a request for help?:
This is a BUG REPORT.
Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm): Kubernetes
What happened: I'm using a Kubernetes cluster in production. I have some persistence issues when using PersistentVolumeClaims to provision PersistentVolumes with the default StorageClass. All my databases (all pods using PersistentVolumes) go into CrashLoopBackOff after a period of time. Looking at the logs, it appears that the PV mount changes to ReadOnly. Example of mongodb logs:
This happens with all pods that use PersistentVolumeClaims.
What you expected to happen:
All Azure disks should be mounted as ReadWrite and should not change during runtime.
How to reproduce it (as minimally and precisely as possible): Create a PersistentVolumeClaim from the default StorageClass and mount it in a pod. The time it takes to reproduce the problem is random.
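For illustration, a minimal reproduction sketch along those lines; the names, image, and storage size are placeholders, and storageClassName is omitted so that the default StorageClass is used:

```yaml
# Hypothetical PVC + pod for reproducing the issue; names, image,
# and size are placeholders, not taken from the original report.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi   # placeholder size; no storageClassName, so the default is used
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: mongo
    image: mongo:3.6   # placeholder database image
    volumeMounts:
    - name: data
      mountPath: /data/db
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
```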
Anything else we need to know: