Open tom-butler opened 4 years ago
It should be possible to configure the default path that is displayed in the file browser the first time a user logs in, by generating a default config for the JupyterLab workspace:
But I would not re-write it afterwards.
I would like to see the repo in a subdirectory rather than the root of the user's home directory.
Is there any reason not to?
Motivation
We often have new starters and visiting collaborators, and these new sandbox users have a huge learning curve: simultaneously learning git, unix and python. I think it is bad that the first thing we instruct them to do is create a repo inside another repo, since this adds further inception-esque confusion, and is a widely discouraged git practice (i.e. git already interprets repo subdirs in a special way; it is asking for trouble, and complicates learning how git will behave). Already their beginner mistakes routinely lead to attempts at fairly serious git kung fu (e.g. resetting to past states, rewriting objects out of histories, etc) to un-break everything and recover their work.
I think we should instead encourage a best-practice workflow, that is as close as possible to what we want them to follow on other linux platforms like NCI (avoiding features specific to DEA-sandbox such as touch .nosync -- plenty of new users can't even discriminate between bash, git, python, and DEA-specific syntaxes).
Preference
I think we should have the repo pre-populated in ~/examples/. If it can be, fast forward it on log-in. If the user has dirtied it, leave it for the user to manage. (This is literally the first skill everyone learns with git anyway.)
Instruct all new users to create a different subdirectory, and there establish their own branch repo (but don't do it for them).
I don't even think it is necessary to have jupyter pre-navigate to the examples directory; I see no harm in expecting first-time users to intuitively click "examples". (In fact I think this is better than needing to explain ../, and it also avoids inconvenience on subsequent logons.)
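The "fast forward it on log-in, but leave it alone if the user has dirtied it" rule could be sketched roughly as below. This is only an illustration, not how the sandbox does it today: `is_clean` and `refresh_examples` are hypothetical names, the repo path is whatever the caller passes in, and untracked files are treated as "dirty" here, which is an assumption about what we would want.

```python
import subprocess
from pathlib import Path


def is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed nothing.

    Note: porcelain output also lists untracked files, so a repo with
    new, uncommitted notebooks counts as dirty in this sketch.
    """
    return porcelain_output.strip() == ""


def refresh_examples(repo: Path) -> str:
    """Fast-forward `repo` if it is clean; otherwise leave it for the user."""
    if not (repo / ".git").is_dir():
        return "skipped: not a git repository"

    status = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    if not is_clean(status.stdout):
        return "skipped: user has local changes"

    # --ff-only refuses to create a merge commit, so diverged history
    # is also left untouched rather than merged behind the user's back.
    pull = subprocess.run(
        ["git", "-C", str(repo), "pull", "--ff-only"],
        capture_output=True, text=True,
    )
    return "updated" if pull.returncode == 0 else "skipped: cannot fast-forward"
```

A log-in hook would call something like `refresh_examples(Path.home() / "examples")` and could log the returned string; nothing is ever reset or overwritten.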
I'm in favour of having the notebooks loaded into an ~/examples folder.
I think that having a "first start" README in the root/home folder is a good idea. And we already have some logic for a don't sync flag too (undocumented...).
I think that we must force overwrite the folder and shouldn't copy the .git. The examples folder is for new users, and it should always be clean and work. For folks doing dev, they should self-manage their own space <somewhere else>, for example, I just have a ~/dev/whavever-project folder on the sandboxes where I do actual dev.
I wasn't involved in the decision to pull out the examples into folders in the root of the project, but I'm aware there was a decision there.
I'd also be happy with a ~/Examples folder, but if we went down that path, we would need to make sure the user is presented with a really nice, simple and easy to follow splash/readme page (preferably including some screenshots) that loads in the JupyterLab window as the first thing they see when their server starts up (I saw a demo of this functionality during the recent ODC hackathon so it should be possible).
This readme would need to walk the user through in baby steps, even to the level of:
"... Examples folder in the file browser"
"... Examples/Beginners_guide folder and double click on 01_Jupyter_notebooks.ipynb to launch your first notebook"
(The readme could also be the place to include the warning that files in Examples will be overwritten automatically)
As long as this was clearly explained and shown to the user at start up, I think it could serve as a nice way to familiarise the user with using the file browsing interface and launching notebooks for the first time.
I'd be happy to work on the readme if we settled on this as an approach.
I agree with a lot of the points made here, and particularly agree with Robbi's point that the user needs to have some clear guidance around how to use the sandbox and the ~/Examples folder. This could potentially even include a description of how the ~/Examples folder works, and a recommendation that they save copies of example notebooks back to somewhere in their home directory if they want to work on them.
As an additional thought, I've been using Amazon SageMaker recently, and they use a Jupyter Lab extension to manage their example notebooks. I think there are some upsides and downsides to this approach, but would be happy to discuss further with anyone that's interested. You can see a bit of a preview of how it works here: https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-nbexamples.html. One of the main benefits is that the examples are read-only, with a pop-up asking the user if they want to copy the file to their directory.
I think a README is an excellent idea. I don't have a strong feeling as to whether we move the sandbox examples to the examples folder or leave them as the top dir - but either way I think we need a README so that users have a good idea of what's what.
We might be able to create read-only notebooks without using SageMaker - https://coding-stream-of-consciousness.com/2018/11/12/read-only-protected-jupyter-notebooks/ - though it looks like it'll take a bit more effort. SageMaker looks interesting - do we know how the pricing compares to notebooks as they are on the sandbox?
Hey @BexDunn -- sorry, my SageMaker link might have been a bit misleading. There's no need to use SageMaker specifically, it's just an example implementation of a Jupyter Lab extension that can handle a collection of example notebooks. The actual extension is here: https://github.com/danielballan/nbexamples
Would another option be simply to symlink ~/examples to some shared directory (external to /home), where users do not even have sufficient privileges to dirty or mismanage that copy of the repo?
That would also be a place to administer README content that is specific to that sandbox infrastructure, without committing it to the general-purpose notebooks repo. It could remove any need for ongoing management/syncing of user home directories (and associated potential mess/confusion).
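For what it's worth, the symlink idea might look something like the sketch below. It assumes a shared, admin-owned checkout already exists somewhere outside /home; the function name `link_examples` and any concrete path passed to it are made up for illustration. Read-only access would come from the filesystem permissions on the shared directory, not from this code.

```python
from pathlib import Path


def link_examples(home: Path, shared: Path, name: str = "examples") -> Path:
    """Point `home/name` at the shared examples directory.

    Safe to run on every server start: an existing symlink is replaced,
    but a real file or directory the user created is never touched.
    """
    link = home / name
    if link.is_symlink():
        link.unlink()  # drop the stale link so it can be re-pointed
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(shared, target_is_directory=True)
    return link
```

Raising instead of overwriting when a real `examples` folder is already there is a deliberate choice in this sketch, so that nothing a user saved gets removed silently.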
I don't think we want to stop people from being able to write to it, @benjimin. It makes the notebooks do weird things if they're read only, I think.
I like @caitlinadams' suggestion of using the nbexamples process. Either that or just sticking with the current process, but doing the sync into an examples folder.
Also, a potential security motivation is that any files which should not be committed to any git history still get stored somewhere inside the working directory of a repo (i.e. training users to invite mistakes in sensitive data management).
Current Process:
On each server startup we clone the git repo to a temporary folder, remove some files (including .git) and copy the contents to the home directory of the notebook user with rsync.
This way of working ensures that there are no git merge conflicts causing issues with the user files.
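As a rough illustration of the process just described (not the actual startup script), the clone, strip and copy steps could be written like this. Note the swap: `shutil.copytree` stands in for rsync here so the sketch stays self-contained, and the function names, the `repo_url` argument and the exclusion list are assumptions; only `.git` is named in the description above.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumption: only .git is known to be excluded; the real list is longer.
EXCLUDED = (".git",)


def copy_without_git(checkout: Path, home: Path) -> None:
    """Copy the checkout into the home directory, skipping excluded names.

    Files with the same name are overwritten; files that exist only in
    the home directory are left in place (similar to rsync without --delete).
    """
    shutil.copytree(
        checkout, home,
        ignore=shutil.ignore_patterns(*EXCLUDED),
        dirs_exist_ok=True,  # requires Python 3.8+
    )


def sync_on_startup(repo_url: str, home: Path) -> None:
    """Clone into a temporary folder, then copy the contents into `home`."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "checkout"
        subprocess.run(
            ["git", "clone", "--depth", "1", repo_url, str(checkout)],
            check=True,
        )
        copy_without_git(checkout, home)
```

Because no `.git` directory ever reaches the home directory, there is nothing for git to conflict with; the trade-off is that a user's edits to a synced notebook are overwritten on the next start.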
However it has some downsides:
Alternatives