artem-zinnatullin opened this issue 3 years ago
Hello,
We are considering adding support for dynamic provisioning of local storage volumes in DOKS, however it likely will not be implemented in this CSI driver.
The significant caveat to using node-local NVMe/SSD storage is that it is indeed node-local - we can't detach it from one node and attach it to another. This means it's really only useful for ephemeral purposes, since we expect nodes to be replaced in the course of normal cluster operations (e.g., due to health or for upgrade).
If you're able to share, I'd be interested to hear more about your use-case for local storage. We can connect over email if you'd rather discuss privately.
Thanks!
cc @bikram20
> We are considering adding support for dynamic provisioning of local storage volumes in DOKS

That's great news!

> however it likely will not be implemented in this CSI driver.

Interesting, how would it be exposed and mounted then?

> The significant caveat to using node-local NVMe/SSD storage is that it is indeed node-local - we can't detach it from one node and attach it to another. This means it's really only useful for ephemeral purposes, since we expect nodes to be replaced in the course of normal cluster operations (e.g., due to health or for upgrade).

We do understand this caveat. There are cases where it's fine: we want to run a distributed database and a distributed object store on NVMe storage. Due to performance requirements, we want to use the NVMe drives that DigitalOcean offers. In our case the applications are distributed, so a node shutting down (say, for an upgrade) is fine, since other nodes act as replicas. This is achieved with pod anti-affinity (`podAntiAffinity`) rules in the app deployments, so that pods of these apps are not scheduled onto nodes that already run one of them.
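A minimal sketch of the "one replica per node" rule described above, written as Python that emits a pod `affinity` block as JSON (`kubectl` also accepts JSON manifests). In Kubernetes this kind of spreading is usually expressed with `podAntiAffinity` keyed on the `kubernetes.io/hostname` node label. The `app: my-db` label is a hypothetical placeholder; the block would go under the pod template's `spec.affinity`.

```python
import json


def anti_affinity(app_label: str) -> dict:
    """Return a pod `affinity` block that forbids two pods labelled
    app=<app_label> from being scheduled onto the same node."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": app_label}},
                    # At most one matching pod per distinct value of this
                    # node label, i.e. at most one replica per node.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


if __name__ == "__main__":
    # Hypothetical label; paste the output under spec.template.spec.affinity.
    print(json.dumps(anti_affinity("my-db"), indent=2))
```

The `required...` variant refuses to schedule a replica rather than co-locate it; `preferredDuringSchedulingIgnoredDuringExecution` is the softer alternative if the node pool can be smaller than the replica count.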
> If you're able to share, I'd be interested to hear more about your use-case for local storage. We can connect over email if you'd rather discuss privately.

Let's continue publicly in this issue. There are very few public discussions on this topic, so I'd like to use this thread as an opportunity to put more information about using local NVMe drives with Kubernetes on the internet :)
>> We are considering adding support for dynamic provisioning of local storage volumes in DOKS
>
> That's great news!
>
>> however it likely will not be implemented in this CSI driver.
>
> Interesting, how would it be exposed and mounted then?

We would add an additional StorageClass with a separate provisioner, potentially leveraging an existing project like the direct-csi driver you linked. There's nothing DO-specific about node-local storage, so there's no need to add it to the DO CSI driver.
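For reference, the static `local` volume type in Kubernetes is normally paired with a StorageClass that has no dynamic provisioner and delays binding until a pod is scheduled, so the scheduler can pick a node that actually holds a matching local volume. The sketch below builds such a StorageClass as JSON; the name `local-nvme` is hypothetical, and a dynamic driver such as direct-csi would put its own provisioner name in place of `kubernetes.io/no-provisioner`.

```python
import json


def local_storage_class(name: str) -> dict:
    """StorageClass for statically created local PersistentVolumes."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # No dynamic provisioning: PVs must already exist (or be created
        # by a separate tool) before claims can bind to them.
        "provisioner": "kubernetes.io/no-provisioner",
        # Bind a PVC only once a pod using it is scheduled, so the chosen
        # node is one that actually holds the backing local volume.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    print(json.dumps(local_storage_class("local-nvme"), indent=2))
```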
Sounds good!
I submitted a related issue on partitioning NVMe drives for DOKS nodes: https://github.com/digitalocean/DOKS/issues/27. Basically, we can't repartition the NVMe drive right now.
This sort of provisioning is also useful for running your own database workloads on nodes when you need local NVMe performance. Yes, the storage is 'ephemeral', but that is something database management tools like Zalando or Stolon can take into account, especially when combined with things like pod disruption budgets.
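As a rough illustration of the pod disruption budget part, here is a Python sketch that emits a `policy/v1` PodDisruptionBudget allowing voluntary evictions (such as a node drain) to take down at most one replica of a database at a time. The names `my-db-pdb` and `app: my-db` are hypothetical, and a PDB only guards evictions that go through the Eviction API, not node crashes.

```python
import json


def pdb(name: str, app_label: str, max_unavailable: int = 1) -> dict:
    """PodDisruptionBudget limiting voluntary evictions of app=<app_label>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            # Evictions (e.g. from `kubectl drain`) are refused while they
            # would leave more than this many matching pods unavailable.
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(pdb("my-db-pdb", "my-db"), indent=2))
```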
You can meet that need today by running self-managed k8s clusters alongside a managed one, but the administration workload multiplies accordingly. Managed DOKS, as of 1.20 at least, is almost there with the ability to run `so1.5*` plan node pools. If you offered a way to let a node pool upgrade in place, an operator who needs to run a local datastore could run it entirely in managed DOKS.
In my particular use case, I have clients who need to run PostgreSQL with custom extensions and replication patterns, which disqualifies most managed SQL offerings as well; hence my interest in closing the feature gaps in managing ephemeral storage on cloud instances/Droplets.
Hm. Vultr has been offering NVMe by default in their managed Kubernetes solution for a while, at no additional cost. That's a big difference.
@kallisti5 What kind of workloads are you looking to run on NVMe local storage? Would you be okay with ephemeral nodes? Nodes are recycled during release upgrades.
@bikram20 Overall I'm trying to find a cost-effective way to leverage the standard DO instance sizes.
Running a reliable ReadWriteMany (RWX) storage model is pretty difficult on DigitalOcean. My solution was Longhorn (https://longhorn.io), since it maintains and grooms RWX replicas directly between all of the Kubernetes nodes. It uses the large amount of otherwise wasted disk space on each node pool Droplet, which saves costs: the 4 vCPU / 8 GiB nodes have over 100 GiB that will go unused for most people using DO's CSI driver. It also automatically backs up data to S3.
NVMe though would probably be the minimum requirement to maintain replicas within a reasonable timeframe.
DO really needs a managed storage solution that supports RWX, like Gluster or NFS.
The workload itself is 300 GiB+ of software packages for Haiku (https://haiku-os.org) plus some other infrastructure.
For others that are interested, a potential workaround is to mount file containers. Here's an example (original source):
You can then use https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner or any other local volume "provisioner" like normal.
The above DaemonSet creates a sparse file by default. To instead reserve the amount of space specified, try a syntax like `dd if=/dev/zero of="${img_path}" bs=1M count=${img_size_mb}` instead of `dd if=/dev/zero of="${img_path}" bs=1024 count=0 seek=10485760`.
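The difference between the two `dd` invocations can be sketched in Python (all helper names here are hypothetical). `bs=1024 count=0 seek=10485760` writes no data and just sets the file length to 1024 × 10485760 bytes = 10 GiB, leaving it sparse; writing zeros with `bs=1M count=N` actually allocates the blocks. In this sketch `truncate` stands in for the first command, and `os.posix_fallocate` (POSIX/Linux only) reserves the blocks up front instead of writing zeros, which is a swapped-in technique rather than what the DaemonSet does.

```python
import os


def dd_seek_size(bs: int, seek: int) -> int:
    """File size produced by `dd ... bs=<bs> count=0 seek=<seek>`."""
    return bs * seek


def make_sparse(path: str, size: int) -> None:
    # Like `dd ... count=0 seek=...`: set the length, allocate no data blocks.
    with open(path, "wb") as f:
        f.truncate(size)


def make_reserved(path: str, size: int) -> None:
    # Reserve real disk blocks for the whole file up front.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)


def allocated_bytes(path: str) -> int:
    # st_blocks counts 512-byte units on Linux.
    return os.stat(path).st_blocks * 512
```

For example, `make_sparse("disk.img", dd_seek_size(1024, 10485760))` creates a 10 GiB image whose `allocated_bytes` stays near zero until data is written, whereas `make_reserved` makes the filesystem account for the full size immediately.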
I benchmarked this on `s-2vcpu-4gb-120gb-intel` with a non-sparse file container mounted. Here's the fio config:
```ini
[read]
direct=1
bs=8k
size=1G
time_based=1
runtime=240
ioengine=libaio
iodepth=32
end_fsync=1
log_avg_msec=1000
directory=/data
rw=read
write_bw_log=read
write_lat_log=read
write_iops_log=read
```
And here are the results:

Storage Class | IOPS | Bandwidth |
---|---|---|
Local File Container | | |
DigitalOcean Block Storage | | |
The block storage benchmarks match what is currently listed on the Limits page (7,500 IOPS × 8 KB block size = 60 MB/s).
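The arithmetic behind that figure is throughput = IOPS × block size when the volume is IOPS-limited. A tiny Python sketch, taking "8k" as 8,000 bytes to match the 60 MB/s figure; with 8 KiB (8,192 bytes) the same cap works out to about 61.4 MB/s.

```python
def throughput_mb_per_s(iops: int, block_size_bytes: int) -> float:
    """Throughput implied by an IOPS cap at a fixed block size, in MB/s (10**6 bytes)."""
    return iops * block_size_bytes / 1_000_000


if __name__ == "__main__":
    print(throughput_mb_per_s(7500, 8000))  # 8 KB blocks
    print(throughput_mb_per_s(7500, 8192))  # 8 KiB blocks
```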
> If you're able to share, I'd be interested to hear more about your use-case for local storage. We can connect over email if you'd rather discuss privately.

Not OP, but I'm interested in this for use with CloudNative-PG as an alternative to Managed Databases (we have different RPO requirements).
For what it's worth, here are our rudimentary pgbench results on CloudNative-PG using the above local file container vs. a managed database:
DB | Init | Select | RW |
---|---|---|---|
CloudNative-PG | `done in 356.09 s (drop tables 0.00 s, create tables 0.02 s, client-side generate 212.51 s, vacuum 10.92 s, primary keys 132.64 s).` | `pgbench (16.3 (Debian 16.3-1.pgdg110+1))`<br>`starting vacuum...end.`<br>`transaction type:` | `pgbench (16.3 (Debian 16.3-1.pgdg110+1))`<br>`starting vacuum...end.`<br>`transaction type:` |
Managed (1x s-4gb-2vcpu) | `done in 295.82 s (drop tables 0.00 s, create tables 0.00 s, client-side generate 187.63 s, vacuum 1.59 s, primary keys 106.59 s).` | `pgbench (16.3 (Debian 16.3-1.pgdg110+1))`<br>`starting vacuum...end.`<br>`transaction type:` | `pgbench (16.3 (Debian 16.3-1.pgdg110+1))`<br>`starting vacuum...end.`<br>`transaction type:` |
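To compare the two `pgbench -i` runs phase by phase, the "done in ..." summary line can be split into named durations. The `parse_init` helper below is a hypothetical sketch that assumes the exact format shown in the table. Applied to the two Init cells, it shows most of the roughly 60 s gap coming from client-side generation (+24.88 s) and primary key creation (+26.05 s), with vacuum at 10.92 s vs 1.59 s.

```python
import re


def parse_init(line: str) -> tuple[float, dict[str, float]]:
    """Split a `pgbench -i` "done in ..." line into (total_s, {phase: seconds})."""
    m = re.search(r"done in (\d+(?:\.\d+)?) s", line)
    if m is None:
        raise ValueError("not a pgbench init summary line")
    total = float(m.group(1))
    # Text inside the parentheses, e.g. "drop tables 0.00 s, vacuum 1.59 s"
    inner = line[line.index("(") + 1 : line.rindex(")")]
    phases = {}
    for part in inner.split(","):
        # "client-side generate 212.51 s" -> ["client-side generate", "212.51", "s"]
        name, secs, _unit = part.strip().rsplit(" ", 2)
        phases[name] = float(secs)
    return total, phases


if __name__ == "__main__":
    cnpg = ("done in 356.09 s (drop tables 0.00 s, create tables 0.02 s, "
            "client-side generate 212.51 s, vacuum 10.92 s, primary keys 132.64 s).")
    managed = ("done in 295.82 s (drop tables 0.00 s, create tables 0.00 s, "
               "client-side generate 187.63 s, vacuum 1.59 s, primary keys 106.59 s).")
    t1, p1 = parse_init(cnpg)
    t2, p2 = parse_init(managed)
    for phase in p1:
        print(f"{phase:22} {p1[phase]:8.2f} {p2[phase]:8.2f} {p1[phase] - p2[phase]:+8.2f}")
    print(f"{'total':22} {t1:8.2f} {t2:8.2f} {t1 - t2:+8.2f}")
```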
Hi!

We're looking for an automated way to provision `PersistentVolumeClaim`s against locally mounted NVMe drives on DigitalOcean (https://www.digitalocean.com/blog/introducing-storage-optimized-droplets-with-nvme-ssds/).

We've tried the `local` StorageClass (https://kubernetes.io/docs/concepts/storage/storage-classes/#local). It does work, however it is not automated at all, unlike DO Block Storage in k8s. With `PersistentVolume`s:

- each `PersistentVolume` has to be constrained to a particular node with `nodeAffinity`;
- each `PersistentVolume` has to have its capacity defined manually, however it does not act as a limit, since `NVMe` storage is mounted as the root `/` filesystem on Premium and Storage Optimized Droplets with NVMe;
- each `PersistentVolume` must have only one associated `PersistentVolumeClaim`, otherwise Pods using it will not be scheduled.

We're looking into CSI implementations like https://github.com/minio/direct-csi, however the major blocker there is that it only works with additional (non-root `/`) disks, while DigitalOcean Premium Droplets use the NVMe drive as root `/`.

The question is: can you consider adding support for DigitalOcean NVMe drives to csi-digitalocean, please? :)

Thanks!
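To make the manual steps of the `local` StorageClass approach concrete, here is a Python sketch that emits one hand-written local `PersistentVolume` per node as a `v1` `List` (which `kubectl apply -f` accepts). The node names, path, capacity, and the StorageClass name `local-storage` are all hypothetical placeholders, and the StorageClass is assumed to exist already.

```python
import json


def local_pv(name: str, node: str, path: str, capacity: str,
             storage_class: str = "local-storage") -> dict:
    """A static local PersistentVolume pinned to a single node."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            # Declared by hand; not enforced for a directory that lives on
            # the node's root filesystem.
            "capacity": {"storage": capacity},
            "accessModes": ["ReadWriteOnce"],
            # Keep the data on the node if the claim is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "local": {"path": path},
            # Local PVs must be tied to the node that holds the path.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "kubernetes.io/hostname",
                            "operator": "In",
                            "values": [node],
                        }]
                    }]
                }
            },
        },
    }


if __name__ == "__main__":
    nodes = ["pool-nvme-a", "pool-nvme-b"]  # hypothetical node names
    pvs = [local_pv(f"nvme-{n}", n, "/mnt/nvme", "100Gi") for n in nodes]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": pvs}, indent=2))
```

Every new node (for example, after an upgrade replaces the pool) needs its own PV, and each PV can bind to only one claim, which is exactly the bookkeeping a dynamic provisioner would take over.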