det-lab / jupyterhub-deploy-kubernetes-jetstream

CDMS JupyterHub deployment on XSEDE Jetstream

CDMS Python environment #12

Closed zonca closed 4 years ago

zonca commented 4 years ago

A word of warning - it seems that the very first imports that pull in CDMS python packages are failing. This is probably because the CVMFS environment doesn't install those. @bloer is the authority on this, though.

Originally posted by @pibion in https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/issues/8#issuecomment-606782144

I would like some details about the Python packages for CDMS analysis.

pibion commented 4 years ago
zonca commented 4 years ago

possibly having a conda environment?

zonca commented 4 years ago

Ideally, if we have a full conda environment that also includes the jupyterhub and jupyterlab packages, we wouldn't even need a Python environment in the Docker container for Jetstream; we could run completely off CVMFS.

pibion commented 4 years ago

That has some appeal, especially if it could make jupyterhub deployment easier for sites that already have CVMFS installed.

In terms of the analysis tools, @bloer is working on putting the "python-only" part of them into CVMFS. I'll let him comment on whether or not he's using a conda environment for that.

I have to admit I've never made a conda package before, but I'm a huge fan, it's the only thing I use for python.

bloer commented 4 years ago

The Python environment will be available in the next CVMFS release. Lots of Python packages are already distributed by the central LCG CVMFS repo, including most of the scientific packages that conda provides, so I think conda is an unnecessary step if we continue to base the image on CVMFS.

For things not provided centrally, or for users who want the latest packages sooner than new CVMFS releases are published, I think we can provide instructions for using pip with the --user flag.

zonca commented 4 years ago

ok, thanks @bloer so once we get the new release we will resume testing.

if it includes jupyterhub and jupyterlab, better, otherwise I will find a workaround.

Please update this issue when the new release is available.

ziqinghong commented 4 years ago

Not sure if this is the right place for this... I'm using pandas, and get errors like

ImportError: Missing optional dependency 'tables'. Use pip or conda to install tables.

with pandas.read_hdf... @zonca How do we get an enhanced version of pandas?
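The error above means that pandas itself imported, but the optional PyTables package (import name `tables`) was not found on the current path. A minimal sketch for checking which modules are missing before running an analysis; the helper name `missing_modules` is made up for illustration:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "tables" is the import name of PyTables, which pandas.read_hdf relies on.
    for name in missing_modules(["pandas", "tables"]):
        print(f"missing: {name} (try: python -m pip install --user {name})")
```

This only looks the modules up without importing them, so it is cheap to run at the top of a notebook.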

bloer commented 4 years ago

@zonca to give some context, there is now a CVMFS release V02-01-04 that contains some of the python modules. @ziqinghong is one of the lucky folks testing to see how broken it is.

@ziqinghong in the short term, the easiest thing to do is install a package with user mode: python -m pip install --user tables

Medium-term, since this isn't an XSEDE-specific issue, please add issues encountered while testing this release to the JIRA bug tracker https://jira.slac.stanford.edu/browse/CDMSGREL-32

Long-term I worry about the different paces of release cycles and analysis code development. We'll probably have to provide some tools to make it easy for everyone to work with bleeding edge CDMS packages or 3rd party packages that aren't provided with the release

ziqinghong commented 4 years ago

Thanks Ben. Will report missing package in jira.

The SLAC jupyterlab environment has been stable and plenty good for more than 6 months. There might have been package updates that were transparent to me, but Amy and her team beat it into shape pretty rapidly, and once that happened, we only very rarely requested an upgrade.

zonca commented 4 years ago

@bloer how do we access the python modules on CVMFS?

zonca commented 4 years ago

@ziqinghong what python environment are you using?

ziqinghong commented 4 years ago

Errr. I just spawned a jupyter instance and assumed you guys set up the environment well.... How do I check?
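One way to answer "how do I check?" from inside a notebook or terminal is to print where the running interpreter lives and which paths it consults. A small sketch using only the standard library; the function name `describe_environment` is an assumption for illustration:

```python
import os
import site
import sys


def describe_environment():
    """Collect the facts that identify which Python is currently running."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        "user_site": site.getusersitepackages(),
        "pythonpath": os.environ.get("PYTHONPATH", ""),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the executable and prefix point under /cvmfs, the session is running the CVMFS Python; otherwise it is the Python shipped in the container image.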

zonca commented 4 years ago

The Jupyter environment is not currently using the Python environment from CVMFS, so please do not report the issue on JIRA.

zonca commented 4 years ago

Once I get pointers from @bloer on how to set it up, I'll modify the Jupyter environment to use that, so we can get all the Python modules from CVMFS on Jupyter immediately after each release.

bloer commented 4 years ago

@zonca Hmm OK maybe I'm confusing multiple issues. I thought you were already loading the CVMFS environment. Just do source /cvmfs/cdms.opensciencegrid.org/setup_cdms.sh V02-01-04

zonca commented 4 years ago

I need details about the Python environment so I can make it compatible with the rest. Do you have any documentation? What version of Python is it using? Is it a full environment or just a modules folder?

bloer commented 4 years ago

The only documentation I can find is this extremely useless page: http://lcginfo.cern.ch/pkg/Python/ Right now it's using python3.6. I hope to move to 3.7 in a near-future release.
The environment's pretty messy. There is a full installation (site.getsitepackages() yields /cvmfs/sft.cern.ch/lcg/releases/Python/3.6.5-f74f0/x86_64-slc6-gcc8-opt/lib/python3.6/site-packages), and also a handful of paths added to PYTHONPATH by CVMFS. I don't know if the CVMFS environment would function if PYTHONPATH were cleared.

The CDMS-specific packages are provided via PYTHONPATH as well, but of course we have more control over that and could change it if necessary (maybe use .pth files instead?)
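To see how much of the import path comes from PYTHONPATH and how much from the interpreter's own installation, the two can be separated. A hedged sketch; `split_sys_path` is a hypothetical helper, not part of the CVMFS setup:

```python
import os
import sys


def split_sys_path(path_entries, pythonpath):
    """Separate entries contributed by PYTHONPATH from the rest of sys.path."""
    from_env = {os.path.normpath(p) for p in pythonpath.split(os.pathsep) if p}
    env_entries = [p for p in path_entries if p and os.path.normpath(p) in from_env]
    other_entries = [p for p in path_entries if p not in env_entries]
    return env_entries, other_entries


if __name__ == "__main__":
    env_entries, other_entries = split_sys_path(
        sys.path, os.environ.get("PYTHONPATH", "")
    )
    print("from PYTHONPATH:", *env_entries, sep="\n  ")
    print("everything else:", *other_entries, sep="\n  ")
```

Running this after sourcing the setup script shows which directories would disappear if PYTHONPATH were cleared.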

zonca commented 4 years ago

ok, @bloer, I am testing it, it looks like there is an issue with scdmsPyTools.BatTools.IO:

IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import sys                                                                                         

In [2]: sys.path                                                                                           
Out[2]: 
['/cvmfs/sft.cern.ch/lcg/releases/ipython/7.5.0-c6a48/x86_64-centos7-gcc8-opt/bin',
 '',
 '/cvmfs/cdms.opensciencegrid.org/releases/centos7/V02-01-04/lib/python3.6/site-packages',
 '/cvmfs/sft.cern.ch/lcg/releases/ROOT/6.18.00-885ca/x86_64-centos7-gcc8-opt/lib',
 '/cvmfs/sft.cern.ch/lcg/views/LCG_96python3/x86_64-centos7-gcc8-opt/lib',
 '/cvmfs/sft.cern.ch/lcg/views/LCG_96python3/x86_64-centos7-gcc8-opt/lib/python3.6/site-packages',
 '/cvmfs/sft.cern.ch/lcg/releases/Python/3.6.5-f74f0/x86_64-centos7-gcc8-opt/lib/python36.zip',
 '/cvmfs/sft.cern.ch/lcg/releases/Python/3.6.5-f74f0/x86_64-centos7-gcc8-opt/lib/python3.6',
 '/cvmfs/sft.cern.ch/lcg/releases/Python/3.6.5-f74f0/x86_64-centos7-gcc8-opt/lib/python3.6/lib-dynload',
 '/cvmfs/sft.cern.ch/lcg/releases/Python/3.6.5-f74f0/x86_64-centos7-gcc8-opt/lib/python3.6/site-packages',
 '/cvmfs/sft.cern.ch/lcg/views/LCG_96python3/x86_64-centos7-gcc8-opt/lib/python3.6/site-packages/IPython/extensions',
 '/home/jovyan/.ipython']

In [3]: from scdmsPyTools.BatTools.IO import * 
   ...:                                                                                                    
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-39d806e33aaa> in <module>
----> 1 from scdmsPyTools.BatTools.IO import *

/cvmfs/cdms.opensciencegrid.org/releases/centos7/V02-01-04/lib/python3.6/site-packages/scdmsPyTools/BatTools/__init__.py in <module>
      1 import os,sys
      2 sys.path.append(os.path.dirname(os.path.realpath(__file__)))
----> 3 from rawdata_reader import *
      4 from rawdata_writer import *

ModuleNotFoundError: No module named 'rawdata_reader'

bloer commented 4 years ago

yeah, I should have some release notes. BatTools isn't included because there's a problem with missing boost libraries that I haven't figured out how to solve yet. BatTools is also being deprecated in the near future to separate out that functionality from the rest of scdmsPyTools

ziqinghong commented 4 years ago

from scdmsPyTools.TES.Templates import * is a good test for what we're using in that package.
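A test like this can be turned into a small import smoke test that reports every failure instead of stopping at the first one. A sketch under the assumption that the module names below, taken from this thread, are the ones worth checking; `smoke_test_imports` is a made-up name:

```python
import importlib


def smoke_test_imports(module_names):
    """Try importing each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # compiled extensions can fail with more than ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Module names taken from this thread; adjust to the packages your analysis uses.
    report = smoke_test_imports(
        ["scdmsPyTools.TES.Templates", "scdmsPyTools.BatTools.IO"]
    )
    for name, error in report.items():
        print(f"{name}: {'ok' if error is None else error}")
```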

zonca commented 4 years ago

ok, when I try to run Jupyterlab or the QT console with this environment the kernel keeps dying. I am debugging this issue.

@ziqinghong in the meantime, I think you can test opening a terminal in the JupyterHub environment, then loading the CVMFS environment:

source /cvmfs/cdms.opensciencegrid.org/setup_cdms.sh V02-01-04

and use the console version of ipython.

bash-4.2$ which ipython
/cvmfs/sft.cern.ch/lcg/views/LCG_96python3/x86_64-centos7-gcc8-opt/bin/ipython
bash-4.2$ ipython

In [1]: from scdmsPyTools.TES.Templates import *

works fine there.

zonca commented 4 years ago

Currently I can install a kernel from CVMFS doing (to be automated for all users later on):

source /cvmfs/cdms.opensciencegrid.org/setup_cdms.sh V02-01-04
python -m ipykernel install --user --name cdms_V02-01-04 --display-name "CDMS V02-01-04"

So I have a kernel for CDMS available:

image

However, this kernel doesn't work.

@bloer I got the error out of JupyterHub:

{"log":"/cvmfs/sft.cern.ch/lcg/views/LCG_96python3/x86_64-centos7-gcc8-opt/bin/python: No module named ipykernel_launcher\n","stream"

it looks like ipykernel is broken.
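A kernel is described by a kernel.json file whose `argv` list starts with the interpreter to run, so one way to narrow down an error like this is to ask that same interpreter whether it can import the launcher module. A rough sketch; `kernel_can_launch` is a hypothetical helper, and the result depends on the environment variables (such as PYTHONPATH) of the process that runs the check:

```python
import json
import subprocess


def kernel_can_launch(kernel_json_path, module="ipykernel_launcher"):
    """Check that the interpreter named in a kernel.json can import the launcher module."""
    with open(kernel_json_path) as handle:
        spec = json.load(handle)
    interpreter = spec["argv"][0]  # first argv entry is the Python executable
    result = subprocess.run(
        [interpreter, "-c", f"import {module}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```

Calling it with the path to the kernel.json written by `python -m ipykernel install --user` returns False when the import fails in the calling environment.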

ziqinghong commented 4 years ago

I was able to use the same environment to launch a jupyter-lab at centos7.slac.stanford.edu, and get a notebook running.

[Screenshot: Screen Shot 2020-04-07 at 8 52 05 PM]

zonca commented 4 years ago

@ziqinghong can you connect to a notebook and execute code?

ziqinghong commented 4 years ago

Yup, at least I could import scdmsPyTools...

bloer commented 4 years ago

I had to give jupyter lab the "--core-mode" switch. I'll add ipykernel to the list to fix for the next release. Is there any way to install that so we can see what else is broken? Pushing out new releases is not fast.

zonca commented 4 years ago

@bloer I started installing ipykernel in user space, then it requested a lot of its requirements:

ipython, ipython_genutils, traitlets, jupyter, ptyprocess, prompt_toolkit, jupyter_client, jupyter_core, pyzmq

not sure why I needed to reinstall all those packages even if some were available in the base environment. Anyway, after all this it was working fine.

It would be useful to also install virtualenv in the base system. So we can use that instead of messing with PYTHONPATH.

bloer commented 4 years ago

Since Python 3.5(?) the recommended virtual environment tool is the built-in venv (python -m venv).

zonca commented 4 years ago

Thanks I'll try that, I generally only use conda

bloer commented 4 years ago

Also beware: I at least only learned recently that virtual environments don't play nicely with PYTHONPATH. When PYTHONPATH is set it gets added to sys.path before the virtual environment, so you have to go to extra lengths to install newer packages in the venv. PYTHONPATH also preempts user site paths.
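The ordering described here can be observed directly by starting a fresh interpreter with PYTHONPATH set and comparing positions in its sys.path. A minimal sketch; the temporary directory stands in for a real PYTHONPATH entry and `child_sys_path` is a name invented for this example:

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(pythonpath_dir):
    """Start a fresh interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=pythonpath_dir)
    output = subprocess.check_output(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
    )
    return json.loads(output.decode())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as extra_dir:
        path = child_sys_path(extra_dir)
        real = [os.path.realpath(p) for p in path]
        extra_index = real.index(os.path.realpath(extra_dir))
        site_indexes = [i for i, p in enumerate(path) if p.endswith("site-packages")]
        print("PYTHONPATH entry at index", extra_index)
        print("site-packages entries at indexes", site_indexes)
```

Run from inside a venv, the same check shows the PYTHONPATH entry ahead of the venv's site-packages, which is why an older package on PYTHONPATH shadows a newer one installed in the venv.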

zonca commented 4 years ago

@bloer same with venv, I needed to install:

pip install --upgrade --ignore-installed ipykernel ipython traitlets jupyter_client jupyter six ipython_genutils ptyprocess pyzmq prompt_toolkit

to get the kernel working, and after that there were still a lot of issues with PYTHONPATH; it was easier to just use pip install --user.

Anyway, I think for the next release it would be useful if you can test that you can register a kernel:

source /cvmfs/cdms.opensciencegrid.org/setup_cdms.sh V02-01-04
python -m ipykernel install --user --name cdms_V02-01-04 --display-name "CDMS V02-01-04"

and then use it in Jupyterlab

pibion commented 4 years ago

@zonca, I believe @bloer has registered the kernel in the new release. The new command (which I have not yet tested on XSEDE) is

/cvmfs/cdms.opensciencegrid.org/setup_cdms.sh -K V02-03-01 --user

We can test success with

import cdms
cdms.get_global_version()

zonca commented 4 years ago

ok, I tested and it works. Can @pibion or @bloer please explain what -K and --user do?

Next I'll try to deploy this on the notebook environment.

bloer commented 4 years ago

There's a bit of description with /cvmfs/cdms.opensciencegrid.org/setup_cdms.sh -h. The -K switch tells the script to install a kernel. Any arguments following the version number are passed to jupyter kernelspec install, so you could install to some other central location.
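After installing a kernel, it can help to confirm what actually landed on disk. Each kernel is a directory containing a kernel.json, so a listing needs only the standard library. This is a sketch: the per-user path below is the usual Linux default and is an assumption here (it changes if JUPYTER_DATA_DIR or XDG_DATA_HOME is set), and `list_kernels` is a hypothetical helper:

```python
import json
import os


def list_kernels(kernels_dir):
    """Map each kernel directory name to the display name stored in its kernel.json."""
    found = {}
    if not os.path.isdir(kernels_dir):
        return found
    for name in sorted(os.listdir(kernels_dir)):
        spec_path = os.path.join(kernels_dir, name, "kernel.json")
        if os.path.isfile(spec_path):
            with open(spec_path) as handle:
                found[name] = json.load(handle).get("display_name", name)
    return found


if __name__ == "__main__":
    # Assumed default per-user location on Linux.
    user_dir = os.path.expanduser("~/.local/share/jupyter/kernels")
    for name, display_name in list_kernels(user_dir).items():
        print(f"{name}: {display_name}")
```

Pointing `list_kernels` at a different directory covers the case where the kernel was installed to a central location instead of with --user.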

zonca commented 4 years ago

great job @bloer! The new release works great.

Exactly as @pibion suggested, open a terminal on the JupyterHub deployment, type:

/cvmfs/cdms.opensciencegrid.org/setup_cdms.sh -K V02-03-01 --user

then JupyterHub works with the kernel out of CVMFS (run change kernel from the menu), see:

image

@pibion @ziqinghong I think you can start to play with the environment and check more deeply if anything is broken.

In the meantime, I'll think about the best way to automate this so that new users do not have to install the kernels themselves.

zonca commented 4 years ago

See https://github.com/zonca/docker-jupyter-cdms-light/pull/1. I wasn't able to run it automatically, but I created a script that installs all the kernels, so users can just open a terminal and run install_cdms_kernels.
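The general idea of installing several kernels in one go can be sketched as building one `setup_cdms.sh -K` invocation per release. This is not the actual install_cdms_kernels script from the pull request; the function name and the way release tags are supplied are assumptions, and the sketch only builds the commands without running them:

```python
SETUP_SCRIPT = "/cvmfs/cdms.opensciencegrid.org/setup_cdms.sh"


def kernel_install_commands(versions, extra_args=("--user",)):
    """Build one setup_cdms.sh -K invocation per release tag, without running anything."""
    return [[SETUP_SCRIPT, "-K", version, *extra_args] for version in versions]


if __name__ == "__main__":
    # Release tag mentioned in this thread; a real script would discover the tags.
    for command in kernel_install_commands(["V02-03-01"]):
        print(" ".join(command))
```

Each resulting list could be passed to `subprocess.run` on a machine where CVMFS is mounted.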

zonca commented 4 years ago

ok added docs about this: https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/commit/4ac37b148fdf231e5bebf1b3b072f2ce716f55c4

zonca commented 4 years ago

ok, the Python environment works fine. Please open a new issue if anything stops working.