Open rabernat opened 2 years ago
Definitely agreed - this is a missing piece of our "estimating costs, and considerations for costs" docs. We started with some improvements on that in https://github.com/2i2c-org/docs/pull/132 but that PR lacks detail on user data (and on data cost considerations in general).
@yuvipanda does the DataHub document this anywhere? If so maybe we could start with that as a base and modify as needed for the 2i2c docs? Then we can continue to improve it over time.
Or, if somebody wants to throw down a few quick bullet points, I am happy to turn them into docs.
Context
Users coming from laptops are used to knowing how much space is on their hard disk. Users coming from HPC environments are used to having quotas on their home directory size. Users on cloud JupyterHubs are confused about how much storage space they have in their home directories. This issue is not addressed by the 2i2c docs on data management.
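As a rough illustration of what a user can find out on their own today, here is a minimal Python sketch that could be run in a notebook on a hub. It assumes nothing about how 2i2c provisions storage: `human`, `dir_size`, and `report` are hypothetical helper names made up for this example, and only standard-library calls are used. One caveat: if the home directory sits on a shared volume, the "filesystem" numbers from `shutil.disk_usage` would likely describe the whole shared volume rather than a per-user allowance, which is part of the confusion described above.

```python
import os
import shutil
from pathlib import Path


def human(n_bytes: float) -> str:
    """Format a byte count with binary units (hypothetical helper)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n_bytes < 1024 or unit == "TiB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1024


def dir_size(root: Path) -> int:
    """Sum the apparent size of every file under root, skipping unreadable ones."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat so that symlinks are not followed
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is not readable; ignore it
    return total


def report(home: Path) -> str:
    """Describe how much the directory holds and what its filesystem reports."""
    usage = shutil.disk_usage(home)  # totals for the filesystem that holds `home`
    return (
        f"{home} contains {human(dir_size(home))}\n"
        f"filesystem: {human(usage.used)} used of {human(usage.total)}, "
        f"{human(usage.free)} free"
    )
```

Calling `print(report(Path.home()))` prints two lines: the first is what the user's own files add up to, the second is whatever the underlying filesystem reports. Walking a large home directory over a network filesystem can be slow, so this is a diagnostic sketch rather than something to run routinely.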
In a similar vein, hub owners may not understand the technical implementation and economic implications of data storage in home directories. For example, the LEAP executive committee is very concerned about data stored in user home directories becoming a cost liability. I have been trying to argue that this is not a major concern, but I don't have much evidence or documentation to back up this claim. My understanding, likely incorrect / incomplete, is that 2i2c home directories are stored on an NFS shared volume which is backed by persistent disk. I do not know the following:
Proposal
We should augment the documentation at https://docs.2i2c.org/en/latest/admin/howto/data.html to address these issues. We should seek to provide answers to the following questions:
Updates and actions
No response