ElderResearch / gpu_launch_app

GPU container launch application

Create a data "dropbox" to easily copy data to GPU box #5

Open jdimeo opened 5 years ago

jdimeo commented 5 years ago

Ozan wanted to upload a large data file to the box but didn't want to mess with SCP. What if we set up a place that's easy to access, like an S3 bucket, where any file you drop in gets replicated to the /data folder that is already mounted as part of every GPU session?

One possibility is to mount the S3 bucket on the root box, then have a util like rsync watch that dir and copy new files to the /data dir.
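As a rough illustration of the "copy new files" step, here is a minimal Python sketch. It assumes the bucket is already mounted as an ordinary directory, and that the function would be run periodically (for example from cron) rather than truly watching the directory. The function name `copy_new_files` and the example paths are hypothetical and not part of the app.

```python
import shutil
from pathlib import Path


def copy_new_files(src_dir, dst_dir):
    """Copy files from src_dir that are missing or outdated in dst_dir.

    Returns the destination-relative paths of the files that were copied.
    """
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Skip files that already look identical (same size, not newer).
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(str(dst.relative_to(dst_root)))
    return copied


if __name__ == "__main__":
    # Hypothetical paths: the mounted bucket and the shared data folder.
    print(copy_new_files("/mnt/s3-dropbox", "/data"))
```

This only handles additions and updates; files deleted from the bucket would stay in /data unless that case is handled separately.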

Extra credit for setting permissions correctly so this folder is read-only and it's clear to users that they shouldn't (and can't) write stuff there, since it's a replica of the S3 folder.
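One way the read-only part might look, as a sketch only: strip the group and other write bits from everything under the replicated folder, leaving the owner's write bit so whichever account does the replication can keep updating it. This assumes the replicated files are owned by that account and that container users access them as a different user. The name `make_read_only_for_others` is hypothetical.

```python
import os
import stat
from pathlib import Path


def make_read_only_for_others(root):
    """Remove group/other write permission from root and everything below it."""
    write_bits = stat.S_IWGRP | stat.S_IWOTH
    root = Path(root)
    # Materialize the list first so permission changes cannot affect traversal.
    paths = [root] + list(root.rglob("*"))
    for path in paths:
        if path.is_symlink():
            continue  # chmod would follow the link, so leave symlinks alone
        mode = stat.S_IMODE(path.stat().st_mode)
        os.chmod(path, mode & ~write_bits)
```

Mounting /data into the containers as a read-only volume would be another way to get the same effect, independent of file modes.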

Extra credit if there was a way to upload a file to this S3 bucket from the GPU dashboard/app. So I could upload my data file and start my session/container, and in a few moments, there it would be.

semperstew commented 5 years ago

This is tricky. From what I've seen, rsync does not work with cloud storage providers. There is a similar application called rclone, but it only works in the local --> cloud direction. Without setting up an on-premises file storage gateway, the only other viable alternative for syncing files to and from S3 is s3fs-fuse. That has its own issues, though, chief among them that files are owned by root (0755), with other users allowed read/write operations. As discussed in the lengthy Slack thread, we would want local caching of files to reduce latency. This can be accomplished with s3fs-fuse, but the cache size is unbounded, so we would have to manage it with either a size quota or a periodic (read: cron) purge.
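For the periodic purge option, a minimal sketch of what a cron-driven script could do: delete the least recently accessed cache files until the directory fits under a quota. The cache location and the quota value are assumptions, as is the function name `purge_cache`. Whether it is safe to delete cache files while the bucket is mounted would need to be checked against the s3fs-fuse documentation.

```python
from pathlib import Path


def purge_cache(cache_dir, max_bytes):
    """Delete least recently accessed files until the total size <= max_bytes.

    Returns the names of the files that were removed, oldest first.
    """
    files = [(p, p.stat()) for p in Path(cache_dir).rglob("*") if p.is_file()]
    total = sum(st.st_size for _, st in files)
    removed = []
    # Oldest access time first, so recently used files survive the longest.
    for path, st in sorted(files, key=lambda item: item[1].st_atime):
        if total <= max_bytes:
            break
        path.unlink()
        total -= st.st_size
        removed.append(path.name)
    return removed


if __name__ == "__main__":
    # Hypothetical cache directory and a 50 GiB quota.
    purge_cache("/var/cache/s3fs", 50 * 1024**3)
```

Access times are only meaningful if the filesystem holding the cache is not mounted with `noatime`; sorting by modification time would be a fallback.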

semperstew commented 5 years ago

Moving this to ElderResearch/gpu_launch_app. It seems the implementation would go through the app.

enmyj commented 5 years ago

Was the file @oersoy1 wanted to move to the GPU box too big to use scp? Or was it annoying to use scp from the command line?

If the latter, I feel like the solution is for people to use an SFTP GUI like WinSCP, Cyberduck, or Fugu to move files between their local machines and the /data directory on the box.

semperstew commented 5 years ago

Not sure what was wrong with scp. I actually kinda like the idea of uploading data through the launchapp. That would let us more easily manage and enforce the metadata requirements for the data-browser.