JuliaCloud / JuliaBox

JuliaBox continues to run, but this codebase is no longer current.
http://www.juliabox.org/

RFC: Prevent data loss due to disk quotas #390

Open tanmaykm opened 8 years ago

tanmaykm commented 8 years ago

There's a subtle bug in JuliaBox that can cause loss of data when the user's disk storage is near the allotted quota.

JuliaBox today needs to create and/or update certain files in the user's home folder.

To restore the user's data:

  1. a blank disk is first primed with the above files
  2. data from user's backup are applied on it
  3. some files are updated (I think only .bashrc)

Steps 2 and 3 above can fail if the storage used in step 1 is more than it was when the user last backed up their data, which can happen between JuliaBox releases. And when the incompletely restored data is backed up, it overwrites the last good backup.
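One way to guard against this failure mode is to check that the backup will actually fit before applying it, and to abort the restore (leaving the last good backup untouched) otherwise. A minimal sketch, assuming backups are plain tar archives and a hypothetical `restore_user_data` entry point:

```python
import shutil
import tarfile


def restore_user_data(backup_tar: str, home: str) -> bool:
    """Hypothetical sketch: apply a user's backup onto a freshly primed
    disk only if it is guaranteed to fit, so a partial restore can never
    be backed up over the last good copy."""
    # Size the backup up front instead of discovering mid-extract that
    # the primed disk is too full.
    with tarfile.open(backup_tar) as tar:
        needed = sum(m.size for m in tar.getmembers())
    free = shutil.disk_usage(home).free
    if needed > free:
        # Abort instead of restoring partially; the previous backup
        # stays intact and can be retried after freeing space.
        return False
    with tarfile.open(backup_tar) as tar:
        tar.extractall(home)
    return True
```

The key design point is that the failure is detected *before* any files are written, so the "incomplete restore gets backed up" scenario cannot occur.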

So, I think the primary reason for this issue is that JuliaBox has to share disk space with user data. Keeping more than one backup would help restore data, but that's an added safety feature. Below are a few thoughts on addressing this:

  1. Separate mounts for user data and user home (larger data volume at /data, smaller volume at /home/juser)
    • user home is not backed up
    • data disk gets backed up
    • link some essentials from user home to mounted volume (e.g. .bashrc, .ssh, ...)
    • :+1: Faster boot time, as /data can be restored asynchronously
    • :+1: User (with appropriate privilege) can keep multiple disks and choose one to mount at run time
    • :+1: Going by this analogy, EBS volumes can be mounted at /data/disk-1 and such
    • :-1: Inconvenience of having a small home folder
    • :-1: Users can leave files in the home folder and forget that it is ephemeral
  2. Separate mounts for user data and user home (larger data volume at /data, docker container filesystem for /home/juser)
    • similar to above, except that /home/juser is not a separate mount
    • :+1: simpler in operation as one less mount point
    • :-1: can't enforce a limit on home folder size
    • :-1: container filesystem is slower, so writing to home folder will be slower
  3. Allocate a single disk of size (quota + reserved space for JuliaBox use), and enforce the quota in some way
    • reserve large enough additional space for JuliaBox use
    • enforce by alerting user after periodic checks and during backup
    • :-1: no easy way to have the OS enforce limits
    • :-1: not clear how often JuliaBox should monitor space usage
    • :-1: not clear what to do if the user ignores quota messages or the session gets disconnected
  4. Use linux quota with docker user namespaces
    • have a virtual uid on host system for each docker container, and map it to juser in the corresponding container
    • set up a quota on the host machine for the virtual uid to control allocation
    • :+1: simpler, no need to use loopback volumes
    • :-1: the real uid can be different on each login, making file permissions difficult to manage
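For approach 3, the "alerting user after periodic checks" part amounts to a du-style walk over the home folder compared against the quota. A rough sketch, with the `warn_at` threshold being an assumption for illustration:

```python
import os


def usage_bytes(path: str) -> int:
    """Sum file sizes under `path` (a du-style walk), skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def check_quota(path: str, quota_bytes: int, warn_at: float = 0.9) -> str:
    """Hypothetical periodic check run by JuliaBox: returns 'ok',
    'warn' (near quota, alert the user), or 'over' (quota exceeded)."""
    used = usage_bytes(path)
    if used > quota_bytes:
        return "over"
    if used > warn_at * quota_bytes:
        return "warn"
    return "ok"
```

This also illustrates the listed downside: the check only observes usage after the fact, so nothing stops a user from blowing past the quota between two checks.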

Approach 1 or 2 looks the best to me. Any other ideas are welcome.
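The "link some essentials from user home to mounted volume" step in approaches 1 and 2 could be sketched as follows. This is a minimal illustration, not JuliaBox code; the file list is an assumption (the real set would include .ssh and others):

```python
import os

# Illustrative list; the actual set of essentials would be larger.
ESSENTIALS = [".bashrc"]


def link_essentials(data_vol: str, home: str) -> None:
    """Keep dotfiles on the backed-up data volume and expose them in
    the (ephemeral) home folder via symlinks."""
    for name in ESSENTIALS:
        target = os.path.join(data_vol, name)
        link = os.path.join(home, name)
        if not os.path.exists(target):
            # Ensure the real file lives on the data volume, so it is
            # included in backups.
            open(target, "a").close()
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(target, link)
```

With this layout the home folder can stay small (or live on the container filesystem, as in approach 2) while everything worth backing up resides under the data mount.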

tanmaykm commented 8 years ago

#421 fixes this to a good extent, but the loopback disk plugin with object store backup still has a practical size limitation.

Using GlusterFS with the JuliaBox hostdisk plugin could be a good solution for large amounts of reliable, reasonably fast data storage. It was responsive when I tried it on a small test setup. GlusterFS also supports folder-level quotas and user-serviceable snapshots.

The AWS equivalent, EFS, is easier to provision and manage, but it does not have snapshots or folder-level quotas, and access is restricted to within an AWS VPC.

It would be great to hear experiences from anyone who has used GlusterFS/EFS.

tanmaykm commented 8 years ago

Summarizing the possible storage types:

| disk type | plugin | attach | IO | cost | size |
| --- | --- | --- | --- | --- | --- |
| local disk | hostdisk | fast | fastest | low | large |
| object store (S3, GCS) | loopback | slow | fastest | low | small |
| block store (EBS) | vol_ebs | slow | fast | high | very large |
| network disk (GlusterFS, EFS) | hostdisk | fast | fast | high | unlimited |

GlusterFS/EFS can be mounted on multiple instances/containers (useful in sharing or running distributed applications).

ViralBShah commented 8 years ago

cc @aviks