tanmaykm opened 8 years ago
The `loopback` disk plugin with object store backup still has a practical size limitation. Using GlusterFS with the JuliaBox `hostdisk` plugin could be a good solution for large amounts of reliable and fairly fast data storage. I found it responsive when I tried it on a small test setup. GlusterFS also supports folder-level quotas and user-serviceable snapshots.
The AWS equivalent, EFS, is easier to provision and manage. However, it does not have snapshots or folder-level quotas, and it can only be accessed from within an AWS VPC.
It would be great to hear about experiences from anyone who has used GlusterFS or EFS.
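For the GlusterFS route, per-user folder quotas could be set with the `gluster volume quota` CLI. Below is a minimal Python sketch that only builds the commands (it does not run them). It assumes a hypothetical volume named `jboxvol` with one directory per user at the volume root; how the `hostdisk` plugin would actually lay out user folders on a shared volume is not decided in this issue.

```python
import shlex
from typing import Dict, List


def gluster_quota_commands(volume: str, limits: Dict[str, str]) -> List[List[str]]:
    """Build `gluster volume quota` argv lists for per-user folder quotas.

    Assumes (hypothetically) one directory per user at the volume root,
    e.g. /alice, and limits in Gluster's size syntax such as "2GB".
    """
    # Quota accounting has to be enabled on the volume once before limits apply;
    # running "enable" again on a volume with quotas already on may report an error.
    commands = [["gluster", "volume", "quota", volume, "enable"]]
    for user, limit in sorted(limits.items()):
        # Paths given to limit-usage are relative to the volume root.
        commands.append(
            ["gluster", "volume", "quota", volume, "limit-usage", f"/{user}", limit]
        )
    return commands


if __name__ == "__main__":
    # "jboxvol" is a hypothetical volume name.
    for argv in gluster_quota_commands("jboxvol", {"alice": "2GB", "bob": "5GB"}):
        print(shlex.join(argv))
```

Returning argv lists rather than shell strings keeps them safe to pass straight to `subprocess.run` once the layout is settled.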
Summarizing the possible storage types:
| disk type | plugin | attach speed | I/O speed | cost | size |
|---|---|---|---|---|---|
| local disk | `hostdisk` | fast | fastest | low | large |
| object store (S3, GCS) | `loopback` | slow | fastest | low | small |
| block store (EBS) | `vol_ebs` | slow | fast | high | very large |
| network disk (GlusterFS, EFS) | `hostdisk` | fast | fast | high | unlimited |
GlusterFS and EFS volumes can be mounted on multiple instances/containers, which is useful for sharing data or running distributed applications.
cc @aviks
There's a subtle bug in JuliaBox that can cause data loss when the user's disk usage is near the allotted quota.
JuliaBox currently needs to create and/or update certain files in the user's home folder:

- `.bashrc`, to set up certain paths and aliases for Julia
- `.gitconfig`, auto-generated for the user
- `.ssh`, auto-generated for the user
- `.ipython`, to set up IPython kernels and certain options that make it work correctly on JuliaBox. It can also contain IPython log files.
- `.juliabox`, to set up JuliaBox configuration files. It can also contain JuliaBox log files.
- `tutorial` link and associated notebooks

To restore the user's data:
… `.bashrc` …

Steps 2 and 3 above can fail if the storage used in step 1 is more than it was when the user last backed up their data. That can happen between JuliaBox releases. When the incompletely restored data is then backed up, it overwrites the last good backup.
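To make the failure mode concrete: assuming the JuliaBox-managed files and the restored user data are written into the same quota, a backup that fit when it was taken can stop fitting once a new release needs more space for JuliaBox's own files. A small Python sketch of that arithmetic; the `QuotaCheck` class and its field names are hypothetical, not part of JuliaBox.

```python
from dataclasses import dataclass


@dataclass
class QuotaCheck:
    """Hypothetical pre-restore check; all sizes are in bytes."""

    quota: int       # per-user disk quota
    user_data: int   # size of the user's last backup
    jbox_files: int  # space JuliaBox needs for its own files in this release

    def headroom(self) -> int:
        # Space left after both the JuliaBox files and the user data are written.
        return self.quota - self.jbox_files - self.user_data

    def restore_fits(self) -> bool:
        return self.headroom() >= 0


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Same backup and quota; only the JuliaBox files grow between releases.
    before = QuotaCheck(quota=100 * MiB, user_data=95 * MiB, jbox_files=4 * MiB)
    after = QuotaCheck(quota=100 * MiB, user_data=95 * MiB, jbox_files=6 * MiB)
    print(before.restore_fits(), after.restore_fits())  # True False
```

Running such a check before restoring would let the restore be flagged as incomplete up front instead of silently producing a partial home folder.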
So I think the primary reason for this issue is that JuliaBox has to share disk space with user data. Keeping more than one backup would help restore data, but that's an added safety feature. Below are a few thoughts on addressing this:
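One way to keep more than one backup, and to avoid overwriting a good backup with a partially restored one, is a simple rotation. The Python sketch below writes to a local directory as a stand-in for the object store backup used with the `loopback` plugin; the `backup-NNNNNN.tar.gz` naming, `rotate_backups`, and the `restore_complete` flag are all hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import List

PREFIX, SUFFIX = "backup-", ".tar.gz"  # hypothetical naming scheme


def _index(path: Path) -> int:
    # "backup-000007.tar.gz" -> 7
    return int(path.name[len(PREFIX):-len(SUFFIX)])


def rotate_backups(backup_dir: Path, archive: bytes, restore_complete: bool,
                   keep: int = 3) -> List[Path]:
    """Save `archive` as the newest backup and keep only the newest `keep` copies.

    Returns the backups that remain, oldest first.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    existing = sorted(backup_dir.glob(f"{PREFIX}*{SUFFIX}"), key=_index)
    if not restore_complete:
        # A partial restore must never replace a known-good backup.
        return existing[-keep:]
    next_index = _index(existing[-1]) + 1 if existing else 0
    target = backup_dir / f"{PREFIX}{next_index:06d}{SUFFIX}"
    # Write to a temporary file first so a crash never leaves a truncated file
    # under a backup name; os.replace then moves it into place.
    fd, tmp_name = tempfile.mkstemp(dir=backup_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(archive)
    os.replace(tmp_name, target)
    backups = existing + [target]
    for old in backups[:-keep]:
        old.unlink()  # prune the oldest copies beyond `keep`
    return backups[-keep:]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(5):
            kept = rotate_backups(Path(d), f"archive {i}".encode(), restore_complete=True)
        rotate_backups(Path(d), b"partial", restore_complete=False)  # ignored
        print([p.name for p in kept])
```

The temporary file is created in the backup directory itself so that `os.replace` stays on one filesystem, where it is atomic on POSIX systems.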
- … `/data`, smaller volume at `/home/juser`)
- … (`.bashrc`, `.ssh`, ...)
- … `/data` can be restored asynchronously
- … `/data`, Docker container filesystem for `/home/juser`)
- … `/home/juser` is not a separate mount
- … `uid` on the host system for each Docker container, and map it to `juser` in the corresponding container
- … `uid` can become different on each login, which makes file permissions difficult to manage.

Approach 1 or 2 looks the best to me. Any other ideas are welcome.
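A rough Python sketch of the split-volume idea (`/data` plus a smaller volume at `/home/juser`), assuming a single backup directory holds both JuliaBox files and user data. JuliaBox's own files are restored into the home volume immediately, and everything else is copied to the data volume on a background thread. `restore_split`, `JBOX_FILES`, and the directory layout are hypothetical; the file list mirrors the one earlier in this issue.

```python
import shutil
import tempfile
import threading
from pathlib import Path

# Files JuliaBox manages itself, taken from the list in this issue
# (the `tutorial` link is left out of this sketch).
JBOX_FILES = {".bashrc", ".gitconfig", ".ssh", ".ipython", ".juliabox"}


def _copy(src: Path, dst: Path) -> None:
    if src.is_dir():
        shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+
    else:
        shutil.copy2(src, dst)


def restore_split(backup: Path, home: Path, data: Path) -> threading.Thread:
    """Restore JuliaBox files into `home` right away, and copy user data into
    `data` on a background thread so the session can start before it finishes."""
    home.mkdir(parents=True, exist_ok=True)
    data.mkdir(parents=True, exist_ok=True)
    for name in sorted(JBOX_FILES):
        if (backup / name).exists():
            _copy(backup / name, home / name)

    def restore_user_data() -> None:
        for entry in backup.iterdir():
            if entry.name not in JBOX_FILES:
                _copy(entry, data / entry.name)

    worker = threading.Thread(target=restore_user_data)
    worker.start()
    return worker  # caller should join() before backing `data` up again


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        backup = root / "backup"
        (backup / ".ssh").mkdir(parents=True)
        (backup / ".bashrc").write_text("# JuliaBox settings\n")
        (backup / "analysis.ipynb").write_text("{}")
        worker = restore_split(backup, root / "home" / "juser", root / "data")
        worker.join()
        print(sorted(p.name for p in (root / "data").iterdir()))  # ['analysis.ipynb']
```

Because JuliaBox's files live on their own small volume here, growth in those files between releases can no longer push the user's data over its quota.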