dandi / dandi-hub

Infrastructure and code for the dandihub
https://hub.dandiarchive.org

Check clean up of EBS volumes #128

Open satra opened 8 months ago

satra commented 8 months ago

the hub does not seem to clean up EBS volumes associated with pods. verify that when a pod is deleted, the corresponding EBS volume is removed.

asmacdo commented 8 months ago

Not sure if I understand correctly-- does this mean we do not need persistent volumes for the users?

satra commented 8 months ago

in the current hub the user persistent volumes come from EFS not EBS. i believe EBS is more expensive than EFS.

CodyCBakerPhD commented 8 months ago

> i believe EBS is more expensive than EFS.

EFS costs more per GB-month (assuming standard billing at $0.30/GB-month), but you are billed only for the amount of data actually stored over time.

EBS is nominally cheaper (let's just use $0.08/GB-month as a baseline) and has a lot more options for optimizing I/O speeds. However, it has to be pre-allocated at a fixed size when the instance spawns or the volume is mounted (I don't know how the Hub handles that), and you get billed for the full allocation regardless of how much disk space the user actually uses on the volume over that time.

So which is more expensive depends on (i) Hub configuration settings and (ii) user behavior

The way Luiz and I usually answer the question of which is more expensive for a non-theoretical application is to look at the billing information after usage, which delivers total summary amounts for each approach - is that available in this case?
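The trade-off in this comment can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the two illustrative rates quoted above ($0.30/GB-month for EFS, $0.08/GB-month for EBS); the function names are hypothetical, and real AWS pricing varies by region, storage class, and volume type.

```python
# Illustrative rates taken from the comment above, not authoritative pricing.
EFS_RATE = 0.30  # USD per GB-month, charged on data actually stored
EBS_RATE = 0.08  # USD per GB-month, charged on the provisioned volume size


def efs_monthly_cost(used_gb: float, rate: float = EFS_RATE) -> float:
    """EFS is billed on usage, so cost scales with stored data."""
    return used_gb * rate


def ebs_monthly_cost(provisioned_gb: float, rate: float = EBS_RATE) -> float:
    """EBS is billed on the allocation, however full the volume is."""
    return provisioned_gb * rate


def break_even_utilization(efs_rate: float = EFS_RATE,
                           ebs_rate: float = EBS_RATE) -> float:
    """Fraction of an EBS volume that must be in use before EBS is cheaper."""
    return ebs_rate / efs_rate


if __name__ == "__main__":
    provisioned_gb = 100  # hypothetical per-user EBS allocation
    print(f"break-even utilization: {break_even_utilization():.1%}")
    for used_gb in (10, 25, 50):
        efs = efs_monthly_cost(used_gb)
        ebs = ebs_monthly_cost(provisioned_gb)
        cheaper = "EFS" if efs < ebs else "EBS"
        print(f"{used_gb:>3} GB used: EFS ${efs:.2f} vs EBS ${ebs:.2f} -> {cheaper}")
```

Under these assumed rates, EBS only comes out ahead once users fill a bit more than a quarter of their allocation, which is why the answer depends on both Hub configuration and user behavior.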

satra commented 7 months ago

@CodyCBakerPhD - just to make sure we are comparing the costs as it stands, in dandi hub right now, we use single zone, infrequent-access for the EFS volume, which ends up being lower than EBS.

satra commented 7 months ago

however EBS volumes can have quotas, since K8s can provision them on the fly and attach them to a pod, while EFS currently cannot have quotas, at least in the traditional sense. one could technically run an NFS service that maps EFS to the pods with quotas.

CodyCBakerPhD commented 7 months ago

> just to make sure we are comparing the costs as it stands, in dandi hub right now, we use single zone, infrequent-access for the EFS volume, which ends up being lower than EBS.

Ahh OK thanks for clarifying. Single zone definitely makes sense

The biggest difference between the types as advertised seems to be latency and throughput. I haven't run read/write benchmarks comparing standard and infrequent-access EFS, but I hope to get answers on that sometime this year as part of the NWB Benchmark projects.

Is the /shared mount the same type of single-zone infrequent-access EFS?

satra commented 7 months ago

> Is the /shared mount the same type of single-zone infrequent-access EFS?

yes

asmacdo commented 7 months ago

> Is the /shared mount the same type of single-zone infrequent-access EFS?

yes

https://github.com/dandi/dandi-hub/blob/do-eks/helm/jupyterhub/dandihub.yaml#L183-L199

asmacdo commented 7 months ago

I deleted old staging-volumes that are no longer attached to resources. Next time we bring the whole thing down and back up, we need to check whether this is still happening, so I am leaving this open.
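One way to repeat this check after the next teardown is to list EBS volumes that are in the `available` state, meaning they are not attached to any instance. This is only a sketch, not the procedure used here: the helper names and the sample data are hypothetical, it assumes `boto3` is installed and AWS credentials are configured, and it only reports volumes, it does not delete anything.

```python
from typing import Iterable


def unattached_volumes(volumes: Iterable[dict]) -> list[dict]:
    """Keep volumes that are 'available' and have no attachments.

    Expects dicts shaped like the entries of EC2 DescribeVolumes["Volumes"].
    """
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def fetch_volumes(region_name: str) -> list[dict]:
    """Fetch all EBS volumes in a region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the filter above works without AWS access

    ec2 = boto3.client("ec2", region_name=region_name)
    volumes: list[dict] = []
    for page in ec2.get_paginator("describe_volumes").paginate():
        volumes.extend(page["Volumes"])
    return volumes


if __name__ == "__main__":
    # Hypothetical sample data; replace with fetch_volumes("<region>") on a real account.
    sample = [
        {"VolumeId": "vol-aaa", "State": "in-use", "Size": 20,
         "Attachments": [{"InstanceId": "i-123"}]},
        {"VolumeId": "vol-bbb", "State": "available", "Size": 100,
         "Attachments": []},
    ]
    for vol in unattached_volumes(sample):
        print(vol["VolumeId"], f'{vol["Size"]} GiB')
```

Keeping the filter separate from the AWS call makes it easy to review the candidate list before anyone decides which volumes are safe to remove.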

kabilar commented 6 months ago

Notes from today's discussion to investigate:

  1. What data is stored in the EBS volumes?
  2. Attach to previous EBS volume if bringing the cluster up.
  3. Clean up old DoEKS EBS volumes.

asmacdo commented 5 months ago

So far so good on the BICAN side (I am not seeing any EBS costs).

image

kabilar commented 5 months ago

Thanks @asmacdo. Just a heads up that EC2-Other actually includes EBS volumes. There is a way of understanding the EBS portion of the EC2-Other category by filtering by Usage Type. See AWS blog.
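The usage-type breakdown mentioned here can also be pulled programmatically. The sketch below is an assumption-laden illustration rather than the method from the AWS blog: it assumes Cost Explorer reports the category under the service name `EC2 - Other`, that EBS usage types contain the substring `EBS:`, and that `boto3` and credentials are available. The function names and the sample response are hypothetical.

```python
def ebs_cost_by_usage_type(response: dict) -> dict[str, float]:
    """Sum UnblendedCost per usage type, keeping only EBS-looking usage types.

    Expects a Cost Explorer GetCostAndUsage response grouped by USAGE_TYPE.
    """
    totals: dict[str, float] = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            usage_type = group["Keys"][0]
            if "EBS:" not in usage_type:  # assumption: EBS usage types contain "EBS:"
                continue
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[usage_type] = totals.get(usage_type, 0.0) + amount
    return totals


def fetch_ec2_other_costs(start: str, end: str) -> dict:
    """Query Cost Explorer for 'EC2 - Other', grouped by usage type.

    start/end are YYYY-MM-DD strings; needs boto3 and credentials.
    """
    import boto3  # imported lazily so the parser above works without AWS access

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter={"Dimensions": {"Key": "SERVICE", "Values": ["EC2 - Other"]}},
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )


if __name__ == "__main__":
    # Hypothetical response fragment, for illustration only.
    sample = {"ResultsByTime": [{"Groups": [
        {"Keys": ["USE2-EBS:VolumeUsage.gp3"],
         "Metrics": {"UnblendedCost": {"Amount": "1.50", "Unit": "USD"}}},
        {"Keys": ["USE2-NatGateway-Hours"],
         "Metrics": {"UnblendedCost": {"Amount": "9.00", "Unit": "USD"}}},
    ]}]}
    print(ebs_cost_by_usage_type(sample))
```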

kabilar commented 5 months ago

We are working to understand the source of these volumes:

image

kabilar commented 4 months ago

The above volumes are created for the JupyterHub databases. Austin has cleaned up all sandbox volumes. We still need to document the process for cleaning up volumes after creating sandboxes, so keeping this ticket open.