concourse / hush-house

Concourse k8s-based environment
https://hush-house.pivotal.io

hh: make use of local disks #12

Closed (cirocosta closed 5 years ago)

cirocosta commented 5 years ago

Hey,

Right now we're using a patch that makes everything ephemeral by using only emptyDir. That works, but it has been painful for workloads that need a lot of disk I/O, because disk throughput ends up shared among all the pods on a machine that use ephemeral storage.

It's also known that the pd-ssd we use throttles us at a fairly low bandwidth:

[Screenshot 2019-02-28 at 10:28:51 AM]

With local SSDs we could get much better performance, but that would require making https://github.com/helm/charts/pull/9668 more versatile by adding the option to use StatefulSets (so we can leverage the local provisioner to give us access to the local SSDs via regular PVCs).

cirocosta commented 5 years ago

After trying local disks with the help of https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner, it became quite clear that two things were too hard: managing the lifecycle of nodes through autoscaling (you essentially can't, at least on GKE), and scheduling when VMs come and go (once a pod gets a PVC that is tied to a node, every new instance of that pod has a hard affinity to that node).
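To make the hard-affinity problem concrete, here is a minimal sketch of what a local PersistentVolume looks like and why a pod bound to it gets stuck once its node is gone. The manifest fields follow the Kubernetes `PersistentVolume` API for `local` volumes; the names (`local-ssd`, `node-a`, `/mnt/disks/ssd0`) and the `schedulable_nodes` helper are hypothetical, and the helper is a deliberately simplified model of the scheduler, not its real logic.

```python
def local_pv_manifest(name, node_name, path, size, storage_class="local-ssd"):
    """Build a PersistentVolume manifest for a disk that lives on one node.

    All argument values are illustrative; only the field layout is meant
    to mirror a local PersistentVolume.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Delete",
            "storageClassName": storage_class,
            "local": {"path": path},
            # Local volumes must declare which node physically holds the disk.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "kubernetes.io/hostname",
                                    "operator": "In",
                                    "values": [node_name],
                                }
                            ]
                        }
                    ]
                }
            },
        },
    }


def schedulable_nodes(pv, live_nodes):
    """Simplified model: return the live nodes a pod using this PV could run on.

    Only handles the hostname "In" expression produced above.
    """
    allowed = set()
    terms = pv["spec"]["nodeAffinity"]["required"]["nodeSelectorTerms"]
    for term in terms:
        for expr in term["matchExpressions"]:
            if expr["key"] == "kubernetes.io/hostname" and expr["operator"] == "In":
                allowed.update(expr["values"])
    return [node for node in live_nodes if node in allowed]
```

With a PV created for `node-a`, `schedulable_nodes(pv, ["node-a", "node-b"])` returns only `node-a`. Once the autoscaler removes `node-a`, the result is an empty list, which corresponds to the pod staying unschedulable until the claim is removed.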

After reading GCP's instructions on how to optimize the use of persistent disks (which can be re-mounted on other nodes at any time), I rolled back to using PDs (although at a much larger size). They're not as good as local disks, but they seem good enough for now.
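The reason a larger disk helps is that persistent disk throughput limits on GCP scale with the provisioned size, up to a per-instance cap. A rough sketch of that relationship follows. The function name is hypothetical, and the rate and cap used in the example are placeholders rather than GCP's published numbers, so check the current persistent disk documentation before sizing anything.

```python
def pd_throughput_limit(size_gb, mb_per_s_per_gb, per_instance_cap_mb_per_s):
    """Estimate the sustained throughput limit of a persistent disk.

    size_gb: provisioned disk size.
    mb_per_s_per_gb: throughput granted per provisioned GB (assumed value).
    per_instance_cap_mb_per_s: ceiling imposed by the VM (assumed value).
    """
    return min(size_gb * mb_per_s_per_gb, per_instance_cap_mb_per_s)


# Placeholder figures, for illustration only.
RATE = 0.48  # MB/s per GB, assumed
CAP = 400.0  # MB/s per instance, assumed

for size in (100, 500, 1000):
    print(size, "GB ->", pd_throughput_limit(size, RATE, CAP), "MB/s")
```

Under these assumed figures, growing the disk raises the limit linearly until the instance cap takes over, after which extra capacity buys no more bandwidth.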