Open jdimeo opened 5 years ago
This is tricky. From what I've seen, `rsync` does not work with cloud storage providers. There is a similar application called `rclone`, but it only works in the local --> cloud direction. Without setting up an on-premises file storage gateway, the only other viable alternative for syncing files to and from S3 is s3fs-fuse. That has its own issues, though; chief among them is that files are owned by root (0755), with other users allowed read/write operations.
As discussed in the lengthy Slack thread, we would want local caching of files to reduce latency. This can be accomplished with s3fs-fuse. However, the cache size is unbounded, so we would have to manage it with either a size quota or a periodic (read: cron) purge.
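The periodic purge mentioned above could look roughly like the following. This is a minimal sketch, not something already in the repo: it assumes the s3fs cache lives in an ordinary local directory (for example, whatever path is passed to the `use_cache` option), and the function name `purge_cache` and the size limit are made up for illustration. It deletes the least recently accessed files first and does not check whether a file is currently in use.

```python
from pathlib import Path


def purge_cache(cache_dir, max_bytes):
    """Delete least-recently-accessed files until the cache fits in max_bytes.

    Hypothetical helper: cache_dir is assumed to be a plain local directory
    used as the s3fs cache. Returns the list of removed paths.
    """
    files = []
    for path in Path(cache_dir).rglob("*"):
        if path.is_file():
            st = path.stat()
            files.append((st.st_atime, st.st_size, path))

    total = sum(size for _, size, _ in files)
    removed = []
    # Oldest access time first, so recently used files survive the purge.
    for _, size, path in sorted(files, key=lambda entry: entry[0]):
        if total <= max_bytes:
            break
        path.unlink()
        total -= size
        removed.append(path)
    return removed
```

A cron job could call something like `purge_cache("/path/to/cache", 50 * 1024**3)` every few minutes; both the path and the 50 GiB quota are placeholders.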
Moving this to ElderResearch/gpu_launch_app. It seems that implementation would be through the app.
Was the file @oersoy1 wanted to move to the GPU box too big to use `scp`? Or was it annoying to use `scp` from the command line?
If the latter, I feel like the solution is for people to use an SFTP GUI like WinSCP, Cyberduck, or Fugu to move files between their local machines and the `/data` directory on the box.
Not sure what was wrong with `scp`. I actually kinda like the idea of uploading data through the launchapp. That would let us more easily manage and enforce the metadata requirements for the data-browser.
Ozan wanted to upload a large data file to the box but didn't want to mess with SCP. What if we set up a place that's easy to access, like an S3 bucket, where if you drop a file in there, it gets replicated to the `/data` folder that is already mounted as part of every GPU session?

One possibility is to mount the S3 bucket to the root box, then have a util like `rsync` watch that dir and copy new files to the `/data` dir.

Extra credit to set permissions correctly so this folder is read-only and it's clear to users that you shouldn't/can't write stuff there, since it's a replication of the S3 folder.
Extra credit if there was a way to upload a file to this S3 bucket from the GPU dashboard/app. So I could upload my data file and start my session/container, and in a few moments, there it would be.
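To make the replication idea above a bit more concrete, here is a rough sketch of the "copy new files and mark them read-only" step. It assumes the bucket is already mounted as a local directory; the function name `replicate_new_files` is hypothetical, and it swaps in a plain Python copy loop for `rsync` just to show the logic. It treats a file as already replicated when the destination has the same size, which is a simplification, and it does not guard against files that are still being uploaded.

```python
import shutil
from pathlib import Path


def replicate_new_files(src_dir, dst_dir):
    """Copy files missing from (or changed in size since) dst_dir, read-only.

    Hypothetical helper: src_dir is assumed to be the mounted bucket and
    dst_dir the shared data folder. Returns the destination paths written.
    """
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        if dst.exists() and dst.stat().st_size == src.stat().st_size:
            continue  # assume an unchanged file; skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            dst.unlink()  # old copy is read-only, so replace rather than overwrite
        shutil.copy2(src, dst)
        dst.chmod(0o444)  # read-only for everyone: signals "don't write here"
        copied.append(dst)
    return copied
```

Something like this could run from cron or a small loop on the box, e.g. `replicate_new_files("/mnt/bucket", "/data")`, where `/mnt/bucket` is a placeholder mount point.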