Closed by betolink 3 years ago
Looks like this works in user space, so no sudo required:

```shell
conda activate {environment}
ipython kernel install --name "{environment}"
```
However, this doesn't seem to be a persistent operation: when my instance was restarted, the kernel was gone.
I think that by adding this section to the Dockerfile, conda will default to creating environments in the home folder, which is persistent:
```dockerfile
# Configure conda/mamba to create new environments within the home folder by
# default. This allows the environments to remain in between restarts of the
# container if only the home folder is persisted.
RUN conda config --system --prepend envs_dirs '~/.conda/envs'
```
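If that setting is in place, new environments should land under the persistent home folder. A quick way to check (a sketch, assuming conda/mamba are available on the PATH):

```shell
# Confirm ~/.conda/envs is now the first entry searched for environments
conda config --show envs_dirs

# A newly created environment should then appear under the home folder
mamba create -y -n scratch python
ls ~/.conda/envs
```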
This example is from hub.jupytearth.org's Dockerfile for the user environment image.
@2i2c-org/2i2c-team is this perhaps sensible to add for the pilot hubs default Dockerfile?
> @2i2c-org/2i2c-team is this perhaps sensible to add for the pilot hubs default Dockerfile?
IMHO, that should be something users decide upon. There are several reproducibility/replicability workflows where starting from a fresh environment that is "codified" in some image/Dockerfile helps others do the same as you did... In fact, I would be surprised to see environments persisted by default after restarting my pod :wink:
I think some Pangeo deployments let you pick the user base image? (with Openscapes we only pick the EC2 instance type). Maybe something like that would be useful. A hub administrator could add different repos for different environments.
At the moment OpenScapes is mainly working on https://github.com/NASA-Openscapes/earthdata-cloud-cookbook which requires some initial prototyping. Environment persistence between restarts would be handy to have until we are in "production mode"
Your hub administrator should be able to set up a customized environment: https://pilot.2i2c.org/en/latest/admin/howto/environment.html. If environment persistence is useful/needed in your use case, a custom Dockerfile adding the lines @consideRatio suggested should be enough to support it, IMHO. Btw, we are in fact testing some new tooling to allow admins to self-serve the creation of the environment they put in front of users, so they do not need to build it themselves, just configure it.
I guess I need to find out who our hub admin is and see if we can get the Dockerfile approach plus persistence. One thing I noticed from the documentation is that you discourage the use of `quay.io/my-user/my-image:latest`, and for prototyping I was thinking about having precisely something like that (so that if I modify the environment, a hub admin doesn't have to update the build tag).
> One thing I noticed from the documentation is that you discourage the use of `quay.io/my-user/my-image:latest`
Yes, having specific references (tags) is important so you really know which environment you are working with.
> and for prototyping I was precisely thinking about having something like that (so if I modify the environment a hub admin doesn't have to update the build tag).
As I said before, we are currently testing some new tooling to prototype/test and eventually self-serve the environment customization. Currently, the process looks like this: https://github.com/2i2c-org/peddie-image
Would you be interested in having something like this for Openscapes?
I just read about this tooling, and it looks like step 4 is what I wanted to avoid, since it requires a hub admin:
> Open the Configurator for the peddie hub (you need to be logged in as an admin).
The important part would be to have an agile way of altering the environment while we are prototyping. I think just persisting my home directory as @consideRatio suggested would be enough for now.
@betolink, FYI, we are discussing the pros vs cons of shipping this by default. In the meantime, I encourage you to ping your hub admin so they can customize the image with the snippet @consideRatio shared above. In that way, we decouple the current technical discussion about this change from the customization you may need (that could be done by your hub admin without us being a blocker for your use case).
@damianavila, I just got admin credentials this morning and went to the "configurator" page. I see a box to enter a Docker image name for the users and the default interface (RStudio, Lab, or classic notebooks), but I don't see what image the users are running now. I don't want to disrupt other users by just entering my customized image. Is there a way to find out what image users are running now, so that I can at least clone those dependencies and add the edit to persist the environment?
Hi @betolink - you can see the image reference in these lines of the config file
Thanks @sgibson91! Is `783616723547.dkr.ecr.us-west-2.amazonaws.com/user-image` coming from https://github.com/2i2c-org/openscapes-image/? Oh, I have so many questions and I don't want to spam you all. I guess I could open an issue on the openscapes image repository to add what @consideRatio suggested. I assume there is a good reason why the image is being pushed to AWS ECR instead of Docker Hub.
> Thanks @sgibson91! Is `783616723547.dkr.ecr.us-west-2.amazonaws.com/user-image` coming from https://github.com/2i2c-org/openscapes-image/? Oh, I have so many questions and I don't want to spam you all.
Yes, it does look like that repository is the source of the image.
> I assume there is a good reason why the image is being pushed to AWS ECR instead of Docker Hub.
I am not sure, actually. Our default image repository is quay.io, as it doesn't have the rate-limiting issues that Docker Hub has.
One last thing (perhaps): I noticed a substantial performance hit when I installed a conda environment in my home directory. My guess is that this may be related to the home directory being mounted on EFS?
How to reproduce:

```shell
mamba env create -f environment.yml
```

vs.

```shell
mamba env create -f environment.yml -p /home/jovyan/{environment}
```
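To put numbers on the difference, one option is to wrap each invocation in `time` (a sketch; `environment.yml` is assumed to exist in the working directory, and `test-env` is a placeholder name):

```shell
# Environment created on local container storage (fast, ephemeral)
time mamba env create -f environment.yml -p /tmp/test-env

# Same environment created on the EFS-backed home directory (persistent, slower)
time mamba env create -f environment.yml -p /home/jovyan/test-env
```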
Hey all - just wanted to boost this comment as well, which might be an interesting option for managing different conda environments from within Jupyter: https://github.com/2i2c-org/pilot-hubs/issues/562#issuecomment-891740990
`nb_conda_kernels` sounds like a good option. We would still need some form of persistence, right? Otherwise we'll have to install an environment every time we start our instance. I wonder, is there a way for JupyterHub to configure base images per user and not hub-wide? A bit like Binder plus user-space persistence?
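For reference, a minimal sketch of the `nb_conda_kernels` approach (assuming a conda-based image where you can install into the base environment):

```shell
# In the base (image) environment: let Jupyter discover conda environments
mamba install -y -n base nb_conda_kernels

# Any environment containing ipykernel is then listed as a kernel automatically,
# including ones created under the persisted home directory
mamba create -y -p ~/.conda/envs/analysis python ipykernel
```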
> My guess is that this may be related to the home directory being mounted on EFS?
Most likely that is the case; EFS is slow for this kind of conda operation. So you have persistence at the cost of performance...
> I wonder, is there a way for JupyterHub to configure base images per user and not hub-wide?
Not a per-user option, but maybe using different profiles pointing to different images that you, as a specialized user, can customize?
I imagine your use case as-is:

- One X profile in addition to the base one.
- That X profile loads a Docker image that creates the environments you may need in the Dockerfile (and maybe installs `nb_conda_kernels` to manage them). In addition, that Dockerfile could contain all the customizations that Erik proposed, so your conda envs are saved in `/home` (and persisted).
- The user who wants that experience would select that X profile and have all the environments predefined in the Dockerfile, plus all the new ones that are created "live" by the user and persisted at `/home`.
- If the user modifies one of the environments "coming" from the Dockerfile, they can "promote" the customization by just modifying the Dockerfile, pushing it, and using the Configurator to update the reference (you could even think about using a `latest` reference so the Configurator step would not be needed, although it is not recommended to use `latest` unless you have a really good reason for that 😜).
- If the user works with one of their `/home`-backed environments, that would be automatically persistent (at the EFS slowness cost), but it could be "promoted" to the Dockerfile when the user is happy enough with it...
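Putting those pieces together, the image behind such a profile could look roughly like this (a sketch only; the base image and tag are placeholders, not the actual image used on this hub):

```dockerfile
# Placeholder base image; pin a specific tag in practice, per the
# tag discussion earlier in this thread
FROM jupyter/base-notebook:latest

# Persist newly created conda envs in the home folder (survives restarts
# when only the home folder is persisted)
RUN conda config --system --prepend envs_dirs '~/.conda/envs'

# Let Jupyter discover any conda env that has ipykernel installed
RUN mamba install -y -n base nb_conda_kernels
```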
I think this is another use case where bringing the JupyterHub and BinderHub Helm charts closer together will provide a solution, as we will be able to provide workflows closer to what the persistent BinderHub Helm chart (https://github.com/gesiscss/persistent_binderhub) does, i.e. a user can create an environment on the fly from a repo using repo2docker, and these environments are persisted.
I think having something like you both described would simplify many workflows. A hub admin would be responsible for infrastructure (i.e. credentials, shared mounts, instance types). Researchers would build their environment from a GitHub repo (using repo2docker or similar) and select the instance type they want to run this environment on. I think just having the flexibility to bootstrap an environment like Binder would reduce the need to persist changes to the base image, since we could make those changes in the original repository; presumably persistence would then be used just for work in progress or sample data, not whole conda environments.
Hi 2i2c team, thanks for all the discussion here and in https://github.com/2i2c-org/pilot-hubs/issues/562. Does this sound like something 2i2c can support? @betolink and @amfriesz can start coordinating/preparing stuff on our end but we wanted to first confirm if this is something you'll be moving forward with, and if you know a rough timeline. @choldgraf I'm happy to chat about it too if you'd like
I think this issue can be closed. We ended up managing it at the custom base image level; another option for future deployments would be for the configuration to allow multiple user images (JupyterHub profiles).
Is there a way to have multiple kernels in my session if I'm not a hub administrator? I tried to install some packages from the terminal and it seems I don't have `sudo` either; is there a particular reason why?