Jupyter with Conda Testing

hgputnam commented 2 years ago

Setup (so far)

Installed latest anaconda to /software/c4/ondemand/software/jupyter_conda
Allowed conda to conda-ize the .bashrc for the test user.
Copied the conda specific stuff from .bashrc to .condarc in the above directory.
Altered script.sh to source the .condarc instead of the old venv activate: https://github.com/UCSF-CBI/c4-ondemand-interactive-apps/blob/main/jupyter_conda/template/script.sh.erb
For test user david - installed conda and a second conda environment called new-env in the $HOME.
Deployed jupyter_conda app to c4-ondemand2

hgputnam commented 2 years ago

Test

As user david, register the "kernel" from new-env: ipython kernel install --user --name=new-env
Now david logs into c4-ondemand2 and selects the Jupyter - Conda app.
Opening a new notebook, david can select the new-env kernel. As a one-time step, the user must press the 'Trusted' button so that Jupyter trusts the new-env kernel.
Now david can select between the two kernels.
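
For reference, registering a kernel essentially means writing a small kernel.json file into a per-user kernels directory. The following is a minimal Python sketch of what such a file contains, not what the install command literally does (the real command also chooses the target directory itself). The interpreter path and the helper name write_kernel_spec are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_kernel_spec(kernels_dir: Path, name: str, python_exe: str) -> Path:
    """Write a minimal kernel.json for one environment and return its path."""
    spec = {
        # Command Jupyter runs to start the kernel; Jupyter substitutes
        # {connection_file} at launch time.
        "argv": [python_exe, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": name,
        "language": "python",
    }
    spec_dir = kernels_dir / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec_path = spec_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=1))
    return spec_path


# Hypothetical interpreter path for the new-env environment (an assumption).
example_python = "/c4/home/david/miniconda3/envs/new-env/bin/python"

# A temporary directory stands in for the user's kernels directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_kernel_spec(Path(tmp), "new-env", example_python)
    loaded = json.loads(path.read_text())
    print(loaded["argv"][0])
```

The key point is that the kernel spec records the full path to the environment's own Python, which is how a notebook server installed elsewhere can start a kernel inside the user's conda environment.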

Source - https://towardsdatascience.com/get-your-conda-environment-to-show-in-jupyter-notebooks-the-easy-way-17010b76e874

HenrikBengtsson commented 2 years ago

I'm not sure what you're aiming at here, but I think we want to avoid yet another stack to maintain and document. I'm as much against conda solutions as container solutions, unless the user is savvy enough to maintain them themselves.

hgputnam commented 2 years ago

Oh, this is a science experiment. No containers involved. I was planning to pile in whatever I learn in the hope that this can be turned into a module.

hgputnam commented 2 years ago

So - it uses a conda install to a shared location (just as the previous jupyter did). It allows for users to register "kernels" from conda environments in their $HOME and use them from the OnDemand Jupyter.

HenrikBengtsson commented 2 years ago

Oh, this is a science experiment. No containers involved. I was planning to pile in whatever I learn in the hope that this can be turned into a module.

Mmokay. But remember, there's nothing specific to OnDemand, so if we start providing special solutions for Conda users, we need to do the same when they run via terminal, and we don't want to go there. We simply don't have the resources to support that too.

HenrikBengtsson commented 2 years ago

Maybe you'll convince me otherwise later on.

hgputnam commented 2 years ago

Let's discuss that at our next HPC. It is a pretty important decision. I don't disagree with anything you are saying here, but the fact is that a lot of our people do use Conda whether we like it or not. I am going to make an effort to learn enough Lua to convert things like this into a module so you do not have to do 100% of that work. It would be great to agree on a set of rules so that whatever I do will look, act, and feel like a CBI module. We can start with rule 1 - it has to work in a shell as well as OnDemand.

hgputnam commented 2 years ago

@HenrikBengtsson - so I think that what I have done above will solve the Jupyter problem for Conda users who want to change kernels. Are you saying I should not let them try this until we come up with a different approach? This is very similar to the approach I took for the first iteration of Jupyter: local software with a source command to set up paths, done from the Jupyter run script: https://github.com/UCSF-CBI/c4-ondemand-interactive-apps/blob/main/jupyter_conda/template/script.sh.erb

HenrikBengtsson commented 2 years ago

TL;DR: Don't rush it.

I can't make the decision. I think the use of Conda on HPC (and Wynton) is a big black hole that needs to be understood much better before making any calls. My concern is not how easy it'll be. My concern is how much it can mess everything up for the user when we mix and match system libraries on the LD_LIBRARY_PATH. For example, I imagine a user does:

module load CBI r
conda activate
install R package 'foo' that compiles some C code, which links toward a library in the conda stack instead of the system one
Happily using 'foo' in R

Now, if the user does not activate their conda stack:

module load CBI r
Using 'foo' in R fails with weird and hard to troubleshoot errors.
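
One way to spot this situation is to look at where a compiled library's dependencies resolve. The Python sketch below parses text in the format printed by ldd and flags libraries that resolve under a given prefix. The sample output, the conda prefix, and the helper name libs_under_prefix are made up for illustration; they are not taken from a real system.

```python
from typing import Dict


def libs_under_prefix(ldd_output: str, prefix: str) -> Dict[str, str]:
    """Map library name -> resolved path for libraries found under prefix."""
    wanted = prefix.rstrip("/") + "/"
    hits = {}
    for line in ldd_output.splitlines():
        # Typical ldd line: "libz.so.1 => /some/dir/libz.so.1 (0x...)"
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        fields = rest.split()
        if fields and fields[0].startswith(wanted):
            hits[name.strip()] = fields[0]
    return hits


# Illustrative ldd-style output (hypothetical paths and addresses).
sample = """\
\tlinux-vdso.so.1 (0x00007ffc0b5f2000)
\tlibz.so.1 => /c4/home/david/miniconda3/lib/libz.so.1 (0x00007f2a1c000000)
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f2a1bc00000)
"""

flagged = libs_under_prefix(sample, "/c4/home/david/miniconda3")
print(flagged)
```

In this made-up sample, libz resolves inside the user's conda stack, so the compiled code would only load correctly while that stack is active and unchanged.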

Also, since the conda stack is a personalized stack under constant change, if the user's conda stack is updated, the 'foo' package in R might all of a sudden break, but the user might not notice until months later.

In those cases, you'll end up with a support email and having to spend lots of time troubleshooting a stack that is unique to one user. Fixing the problem for that user will not help the next who might run into a similar problem.

Note that, in the above case, it's not enough to uninstall/remove Conda from ~/.bashrc. The damage has already been done to the 'foo' R package and likely many others too. So, telling people to wipe their conda installation might make things even worse. This is how you end up with users doing lots of trial-and-error installations and uninstallations until they've got something that looks like it works. At least for the moment.

I'm pretty sure the above is the reason why RedHat is so conservative and lagging behind. They cannot just upgrade system libraries when they want to. If they're not careful, things will break in unknown, untested places. This is why I try to be as conservative as possible with the CBI stack too. I try to minimize any quick fixes, because they will come back and haunt you when you have hundreds of users.

So, even if users do use Conda, which is often because online instructions are so simple to follow, it doesn't mean they should if they end up shooting themselves in the foot. And the bullet might hit them months later, so it's hard to trace back what triggered it. By providing an official "solution" for Conda users, you're in the business of supporting Conda, but to do that, you need to understand it very well and provide best-practice documentation for how to use it without making a mess. I don't have the skills to do that.

hgputnam commented 2 years ago

Should I pull Jupyter altogether?

HenrikBengtsson commented 2 years ago

Should I pull Jupyter altogether?

The current one is installed from pip, correct? If so, that's the official way, which means it compiles toward a (fixed) pip stack and mostly (only?) system libraries. AFAIU, there should be nothing in flux.

While at it, never upgrade that installation, at least not after going live. Instead, add a version to it and keep it as-is. If there's a new version of Jupyter, that should be installed the same way in a different folder.
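
The "one folder per version, never upgrade in place" rule could be enforced with a small guard like the Python sketch below. The naming scheme name-version, the version numbers, and the helper versioned_install_dir are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def versioned_install_dir(root: Path, name: str, version: str) -> Path:
    """Create a fresh <name>-<version> folder; refuse to reuse an existing one."""
    target = root / f"{name}-{version}"
    if target.exists():
        # An existing folder means that version is already live: leave it alone.
        raise FileExistsError(f"{target} already exists; do not upgrade in place")
    target.mkdir(parents=True)
    return target


# A temporary directory stands in for the shared software root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = versioned_install_dir(root, "jupyter", "1.0.0")   # hypothetical version
    second = versioned_install_dir(root, "jupyter", "1.1.0")  # hypothetical version
    try:
        versioned_install_dir(root, "jupyter", "1.0.0")
        reused = True
    except FileExistsError:
        reused = False
    print(first.name, second.name, reused)
```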

hgputnam commented 2 years ago

The current one is installed from pip ... in a virtual environment. That is a directory under /software, and it contains a script called 'activate'. The jupyter script sources that activate file, which sets a non-system path and the user prompt. To see what it changes, I saved my environment in my $HOME with env > env.out, sourced the activate file, saved the environment again with env > env_after_activate.out, and then did a diff:

(jupyter_py38) [hputnam@c4-dev3 ~]$ diff env_after_activate.out env.out
13d12
< OLDPWD=/software/c4/ondemand/software/jupyter_py38/bin
24d22
< VIRTUAL_ENV=/software/c4/ondemand/software/jupyter_py38
27c25
< PATH=/software/c4/ondemand/software/jupyter_py38/bin:/software:/software/bin:/software/ruby-2.6.0/bin:/software/RSEM-1.3.1/bin:/software/python/bin:/software/gatk:/software/bowtie2:/software/Bismark:/opt/sge/bin/lx-amd64/:/home/ahechmer/ht-pipes/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/c4/home/hputnam/.local/bin:/c4/home/hputnam/bin
---
> PATH=/software:/software/bin:/software/ruby-2.6.0/bin:/software/RSEM-1.3.1/bin:/software/python/bin:/software/gatk:/software/bowtie2:/software/Bismark:/opt/sge/bin/lx-amd64/:/home/ahechmer/ht-pipes/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/c4/home/hputnam/.local/bin:/c4/home/hputnam/bin
32d29
< PS1=(jupyter_py38) [\u@\h \W]\$ 
52a50
> OLDPWD=/software/c4/ondemand/software/jupyter_py38/bin

From what I can see, PATH is the only system variable affected: activate puts the venv bin at the front of my $PATH. It also sets the non-system variable VIRTUAL_ENV and changes the prompt. Seems pretty safe so far.
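
The same before/after comparison can be done key by key, which is easier to read than a line diff when PATH is long. This is a minimal Python sketch: it parses saved env output into dictionaries and reports added, removed, and changed variables. The helper names are hypothetical, the PATH values are shortened for illustration, and values containing newlines are not handled.

```python
def parse_env_output(text):
    """Parse KEY=VALUE lines, as printed by env, into a dict."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


def diff_env(before, after):
    """Return (added, removed, changed) between two environment dicts."""
    added = {k: after[k] for k in after if k not in before}
    removed = {k: before[k] for k in before if k not in after}
    changed = {
        k: (before[k], after[k])
        for k in before
        if k in after and before[k] != after[k]
    }
    return added, removed, changed


# Shortened, illustrative snapshots (not the full output shown above).
before = parse_env_output(
    "HOME=/c4/home/hputnam\n"
    "PATH=/usr/local/bin:/usr/bin\n"
)
after = parse_env_output(
    "HOME=/c4/home/hputnam\n"
    "PATH=/software/c4/ondemand/software/jupyter_py38/bin:/usr/local/bin:/usr/bin\n"
    "VIRTUAL_ENV=/software/c4/ondemand/software/jupyter_py38\n"
)

added, removed, changed = diff_env(before, after)
print(sorted(added), sorted(removed), sorted(changed))
```

Running the same comparison against the conda-based activate script would show directly how many more variables it touches than the venv one.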

Now the new Jupyter was done almost the same way. Instead of a virtual env, I did a conda install against the /software folder. I invented my own activate script using what conda had stuck in my .bashrc (I then quickly removed those lines). Now that I am doing more thorough testing, this does seem to be quite different. So, the question is: do we stay with the original version for now?

hgputnam commented 2 years ago

After all this - we don't need the conda version, at least for the problem I was trying to solve. The notebook launched from the prior version can see those conda kernels so long as the user has 'registered' them and has a required package in the conda env.
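
To check which kernels a user has registered, one can read the kernel.json files in their per-user kernels directory (on Linux, typically ~/.local/share/jupyter/kernels for kernels installed with --user). The Python sketch below lists each kernel with the interpreter it would launch. The helper name list_kernels and the paths are hypothetical. The required package is presumably ipykernel, which is an assumption, since the thread does not name it.

```python
import json
import tempfile
from pathlib import Path


def list_kernels(kernels_dir: Path) -> dict:
    """Map kernel name -> interpreter path for each kernel.json found."""
    found = {}
    for spec in sorted(kernels_dir.glob("*/kernel.json")):
        data = json.loads(spec.read_text())
        # The first argv entry is the program that starts the kernel.
        found[spec.parent.name] = data["argv"][0]
    return found


# A temporary directory stands in for the user's kernels directory,
# with one made-up kernel spec inside it.
with tempfile.TemporaryDirectory() as tmp:
    spec_dir = Path(tmp) / "new-env"
    spec_dir.mkdir()
    fake_spec = {
        "argv": ["/c4/home/david/miniconda3/envs/new-env/bin/python",  # hypothetical
                 "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": "new-env",
        "language": "python",
    }
    (spec_dir / "kernel.json").write_text(json.dumps(fake_spec))
    kernels = list_kernels(Path(tmp))
    print(kernels)
```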

UCSF-CBI / c4-ondemand-interactive-apps

Jupyter with Conda Testing #14
