Closed zonca closed 4 years ago
Implemented in #15, we will keep this open for now to report any feedback / issues.
@pibion who is testing data access can post feedback here or open a dedicated issue.
It's currently @ziqinghong and @blackholesun137. I typically relay comments on our collaboration Slack to this thread, but you're both also welcome to post here directly!
@zonca I just successfully copied a bunch of text data over, and that seems to be working well. The next step is to read it from JupyterLab, which failed: with the default umask the files ended up with mode 2710 and user:group=root:root. What's the best way to get around this?
@zonca @ziqinghong at SLAC we use ACL permissions to make data sets readable by everyone in a "dark matter" group.
But you should probably only have write privileges in your own directory - where are you trying to write your data to?
@ziqinghong unless you're doing something that creates data that should be stored in /cvmfs/data, which now that I think about it is entirely possible. Is that the case?
@ziqinghong try again now, copying just 1 file to the root of /cvmfs/data/, and check its permissions.
@pibion I was scping stuff from SLAC to jetstream, where the username was root. @zonca I tried again, it still got written as 2740 with root:root. It's the 20200318_1554_time.png file in /cvmfs/data that I last tried.
@ziqinghong are you using scp?
Yes. scp 20200318_1554_time.png XSEDE:/cvmfs/data/. where XSEDE points to the js node in .ssh/config
I just copied with scp
and my file has correct permissions:
-rw-r--r--. 1 root root 87 Apr 7 02:29 README.md
Here's what's in .ssh/config:
    Host XSEDE
        HostName js-156-119.jetstream-cloud.org
        User root
        Port 30022
        IdentityFile ~/jupyterhub-deploy-kubernetes-jetstream-secrets/ssh/cdms_nfs_ssh_key
what are the permissions of the source file?
Its permissions are 2770. Maybe that's the difference? It needs to be unreadable on the SLAC cluster since it is internal data...
then ssh into the CVMFS pod after you have copied the files and recursively fix the permissions.
once you have done that, please add this note to the documentation at:
https://github.com/pibion/jupyterhub-deploy-kubernetes-jetstream-secrets#copying-data
so other users can easily get around this issue.
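For reference, the recursive permission fix could look roughly like the sketch below. The function name `fix_perms` and the target modes (755 for directories, 644 for files, matching the `-rw-r--r--` of the README.md listed above) are assumptions, not something settled in this thread. The demo runs on a scratch directory; on the pod the argument would be the copied subtree under /cvmfs/data.

```shell
#!/bin/sh
# Hypothetical helper: make a copied data tree readable by all users.
# Assumed modes: 755 for directories (traversable), 644 for regular files.
fix_perms() {
    target="$1"
    find "$target" -type d -exec chmod 755 {} +
    find "$target" -type f -exec chmod 644 {} +
}

# Demo on a scratch directory instead of the real /cvmfs/data.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/run1"
touch "$demo_dir/run1/example.dat"
chmod 640 "$demo_dir/run1/example.dat"   # restrictive mode, as after scp
chmod 750 "$demo_dir/run1"

fix_perms "$demo_dir"
ls -lR "$demo_dir"
```

Directories and files are handled separately because a blanket `chmod -R 644` would remove the execute bit that directories need in order to be traversed.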
Will do.
@zonca Any chance we can have rsync? That's helpful if we want to incrementally sync data, plus it has a flag to set the permissions. Thanks!
@ziqinghong yes, installed rsync, added example usage at https://github.com/pibion/jupyterhub-deploy-kubernetes-jetstream-secrets/blob/master/README.md feel free to augment it.
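As an illustration of the permissions flag, here is a minimal sketch using rsync's `--chmod` option. The modes `D755,F644`, the file name `example.dat`, and the `./data/` path in the comment are assumptions; the README linked above has the actual example usage. The demo copies between two scratch directories so it can run without access to the pod.

```shell
#!/bin/sh
# Two scratch directories stand in for the SLAC source and /cvmfs/data.
src="$(mktemp -d)"
dest="$(mktemp -d)"
echo "example" > "$src/example.dat"
chmod 600 "$src/example.dat"   # restrictive permissions on the source side

# -a preserves permissions and times; --chmod then overrides the mode
# applied to transferred items: D755 for directories, F644 for files.
rsync -a --chmod=D755,F644 "$src/" "$dest/"

ls -l "$dest"

# Against the data pod, the same flags would go with the ssh alias, e.g.:
#   rsync -av --chmod=D755,F644 ./data/ XSEDE:/cvmfs/data/
```

This way the source files can stay unreadable at SLAC while the copies arrive with readable permissions, so no separate fix-up step is needed after each transfer.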
Awesome! Now it's much easier to set up a crontab for automated data transfer. (We won't do that yet, but soon...)
You are root on that pod, so feel free to install anything else you need. Just also add it to the Dockerfile (make a pull request) at https://github.com/zonca/docker-cvmfs-client/blob/master/cvmfs-client-nfs/Dockerfile so that it is automatically included when we redeploy.
Hmmm.... I somehow forgot I was root there.......... Stockholm syndrome for not having root on servers I don't manage...
it is just a pod, so whatever happens I'll just redeploy it!
This seems to be working fine. I'll close this issue. Please open another issue if anything about the data store stops working.
Conclusion of #8 is that object store is not suitable.
The other 2 options are:
1) Use Manila on Jetstream, which provides an NFS service handled by OpenStack, so we don't have to manage it. This provides a standard read/write filesystem we can mount on all pods.
2) Deploy our own NFS server. We can probably use the NFS server we use for CVMFS to also serve this 50GB volume read/write.
I have never used Manila before, so I would rather use our own NFS server; we can do some benchmarks later.
So the plan is to have 1 pod which mounts a large volume read-write and exposes the SSH port with certificate-only access, so we can copy the data there with rsync. Down the road we could deploy a Globus endpoint. This pod also runs an NFS server which shares the data read-only with the Jupyter Notebook pods.
I haven't decided yet if this should be a standalone pod or the same pod as CVMFS. I'll track progress in this issue.