berkeley-dsep-infra / datahub

JupyterHubs for use by Berkeley enrolled students
https://docs.datahub.berkeley.edu
BSD 3-Clause "New" or "Revised" License

Define Policy/Process for Hub Storage Quota #4447

Open balajialg opened 1 year ago

balajialg commented 1 year ago

Summary

@felder, @ryanlovett, and I had a good discussion about issue #4414, where I observed that a few users in the biology hub stored large amounts of data, forcing us to provision a higher file store tier for the hub. Our tentative estimate is that around 10 of the 800+ users stored around 3 TB of data in the biology hub (overall storage consumed in the biology file store tier was around 4.1 TB). An investigation into the types of files stored by some of these users showed that they had large .fastq files, which are data files used in biological sequencing.
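For anyone who wants to reproduce this kind of estimate, here is a minimal sketch of a per-user usage scan. It assumes each user's home directory sits directly under one root directory; the function names and the layout are hypothetical and not taken from the datahub repo. It sums apparent file sizes, which can differ from the usage the file store reports.

```python
import os
from collections import defaultdict


def usage_by_user(root):
    """Return {directory_name: total_bytes} for each directory directly under root.

    Assumption: one home directory per user directly under `root`.
    Sizes are apparent file sizes (st_size), not blocks used on disk.
    """
    totals = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        # File vanished or is unreadable; skip it.
                        continue
            totals[entry.name] = total
    return totals


def top_consumers(totals, n=10):
    """Return the n largest (user, bytes) pairs, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def bytes_by_extension(user_dir):
    """Return {lowercased_extension: total_bytes} for one user's directory."""
    by_ext = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(user_dir):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            try:
                by_ext[ext] += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue
    return dict(by_ext)
```

Calling `top_consumers(usage_by_user(root))` would list the ten heaviest directories, and `bytes_by_extension` on one of them would show whether file types such as `.fastq` dominate.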

We are paying close to $1,700 per month for the biology file store tier, which amounts to around $20,400 per year.
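The figures above can be checked with a few lines of arithmetic. This is only a sketch using the approximate numbers quoted in this issue; the function names are made up for illustration, and 800 is used as a lower bound for the "800+" user count.

```python
def annual_cost(monthly_cost_usd):
    """Yearly cost for a fixed monthly charge."""
    return monthly_cost_usd * 12


def share(part, whole):
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


# Approximate figures quoted in this issue.
MONTHLY_COST_USD = 1700   # biology file store tier
TOTAL_TB = 4.1            # overall storage consumed
TOP_USERS_TB = 3.0        # stored by the heaviest users
TOP_USERS = 10
TOTAL_USERS = 800         # "800+" in the issue, so this is a lower bound

print(f"Annual cost: ${annual_cost(MONTHLY_COST_USD):,}")
print(f"Share of storage held by top users: {share(TOP_USERS_TB, TOTAL_TB):.1%}")
print(f"Share of users that represents: {share(TOP_USERS, TOTAL_USERS):.2%}")
```

With these inputs, roughly 1% of users account for about 73% of the stored data. The sketch does not estimate savings, since that depends on how the file store tiers are priced.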

We can likely save on the order of a few thousand dollars by adopting an effective storage policy and a process for handling exceptions. This is prompting us to decide on a policy and process for handling such exceptions now and for avoiding this scenario in the future.

Some of the options floated were,

User Stories

Important information

Tasks to complete

felder commented 1 year ago

See https://jira-secure.berkeley.edu/browse/DH-107