danielballan / jupyter-rees-service

Jupyter service for composing and building Reproducible Execution Environment Specifications

Review README #1

Open danielballan opened 5 years ago

danielballan commented 5 years ago

Hi @RobertRosca @TeddyRendahl. Here's my first attempt at writing down the plan for this work. Any thoughts on how to expand on it or clarify it?

I think that Components 2, 3, and 4 could be implemented as one Tornado app that runs as an authenticated JupyterHub service, just as one can do with nbviewer.

For the UI, my first thought is Bootstrap since that's what JupyterHub itself already uses.

teddyrendahl commented 5 years ago

Nicely summarized. Do you think (1) is worth building out? Maybe it's a chance for me to actually learn how to write an extension! Beyond that, though, it seems simple enough that we might want to focus on getting (2) right first.

I think we might want to add something in the README explaining the choice of containers. If I were to naively stumble into this repo, I'd probably not want to mess with containers and would opt for the simpler conda environment approach. However, as discussed at the workshop, you'll quickly run into issues with that.

Secondly, looking at the REES, I don't see a standard way to include data files? I understand you don't want to load the container with a ton of files, but if you have, say, a notebook you want to share this way that references some reduced HDF5 files, it might be nice to support those as well.

danielballan commented 5 years ago

I agree that (1) should be a straightforward lab extension, and it can wait until we have something working (2).

I think we might want to add something in the README explaining the choice of containers.

Agreed. I started drafting something about "Why containers?" and couldn't quickly figure out how to approach it. You might bring a useful perspective here. Incidentally, I think this service could also work with conda environments or even venv environments as long as the user restricted their REES to environment.yml or requirements.txt respectively (no apt.txt or Dockerfile, etc.). Making that possible might help bring in people who aren't yet bought into containers. The important thing is that the environment or container encompasses the notebook server and front-end code; it's not just a kernel.
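As a rough illustration of the "restrict the REES" idea, here is a sketch of how the service might decide whether a repository could be realized as a plain conda environment or venv rather than a container. The function names are hypothetical, and the file lists are a partial selection of the configuration files repo2docker recognizes, not its complete set. It also assumes repo2docker's convention of reading configuration from a `binder/` or `.binder/` directory when one exists.

```python
from pathlib import Path
from typing import Optional

# Partial, illustrative lists of repo2docker configuration files (not exhaustive).
PLAIN_ENV_FILES = {"environment.yml", "requirements.txt"}
CONTAINER_ONLY_FILES = {"apt.txt", "Dockerfile", "postBuild", "start", "install.R", "default.nix"}


def config_dir(repo: Path) -> Path:
    """Return the directory to read REES files from (hypothetical helper).

    Prefers a top-level binder/ or .binder/ directory, else the repo root.
    """
    for name in ("binder", ".binder"):
        candidate = repo / name
        if candidate.is_dir():
            return candidate
    return repo


def plain_environment_kind(repo: Path) -> Optional[str]:
    """Return "conda", "venv", or None if the REES needs a full container."""
    found = {p.name for p in config_dir(repo).iterdir() if p.is_file()}
    if found & CONTAINER_ONLY_FILES:
        # Something like apt.txt or a Dockerfile cannot be expressed as an env.
        return None
    if "environment.yml" in found:
        return "conda"
    if "requirements.txt" in found:
        return "venv"
    return None
```

Returning the kind of environment, rather than a plain yes/no, would let the service pick between conda and venv tooling while still falling back to a container build for anything else.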

Secondly, looking at the REES, I don't see a standard way to include data files?

Any files located in the repo (notebooks, HDF5 files, ...) will be bundled into the container. Distributing data files that way is the most direct approach. But it does require rebuilding the container whenever the files change, which can feel like overkill, so it is also worth considering pulling the files at runtime using nbgitpuller. It's not immediately clear to me how that fits in, but one could imagine a relatively small number of images, each containing the common requirements for a given community, with nbgitpuller pulling in specific content.
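For the runtime-pull option, nbgitpuller is driven by a link to the hub's `git-pull` endpoint carrying `repo`, `branch`, and `urlpath` query parameters. The sketch below builds such a link. The helper name and hub URL are hypothetical; it assumes the checkout directory is the last component of the repository URL (so pass a URL without a trailing `.git`) and that users land in JupyterLab (`lab/tree/...`).

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo: str, branch: str, notebook: str) -> str:
    """Build a link that pulls `repo` into the user's server and opens `notebook`.

    Hypothetical helper: assumes the repo is checked out into a directory
    named after the last path component of `repo`.
    """
    repo_dir = repo.rstrip("/").rsplit("/", 1)[-1]
    query = urlencode({
        "repo": repo,
        "branch": branch,
        # Path to open after the pull, relative to the user's home directory.
        "urlpath": f"lab/tree/{repo_dir}/{notebook}",
    })
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"
```

Under this scheme the image carries the community's shared requirements, and the link carries the specific content, so updating a data file means pushing to the repo rather than rebuilding the image.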

danielballan commented 5 years ago

Thinking more about exactly where containers fit into this, I pushed some of the thoughts above into the README in 61a1a73d0c2617447b8827ff2655c200397cae82. More revisions welcome! :-D

teddyrendahl commented 5 years ago

I think this service could also work with conda environments or even venv environments as long as the user restricted their REES to environment.yml or requirements.txt

I thought so too, but that breaks down once you want to switch architectures.

Also, as far as the data goes, I'm starting to appreciate the complexity. If the main motivation is to have repeatable scientific notebooks, the same environment won't do much without repeatable access to the data, but the solution might be too much to include in this tool.
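Repeatable data access is left open here. As one minimal illustration (not something proposed in this thread), a repository could record SHA-256 digests of its data files so that a notebook at least detects when the data it depends on has changed or gone missing. All names below are hypothetical, and the manifest is assumed to live outside the data directory.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large HDF5 files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record a digest for every file under data_dir (paths stored relative)."""
    digests = {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(digests, indent=2, sort_keys=True))


def changed_files(data_dir: Path, manifest: Path) -> List[str]:
    """Return manifest entries that are now missing or whose contents differ."""
    expected = json.loads(manifest.read_text())
    changed = []
    for rel, digest in sorted(expected.items()):
        p = data_dir / rel
        if not p.is_file() or sha256_of(p) != digest:
            changed.append(rel)
    return changed
```

This only detects drift; it does not fetch or version the data, which is the harder part of the problem raised above.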