LSSTDESC / desc-help

DESC Computing Requests
BSD 3-Clause "New" or "Revised" License

[NERSC] Generic way to DESC environment with additional packages #7

Closed slosar closed 4 years ago

slosar commented 4 years ago

Description: A student, Sam Goldstein, is trying to use nbodykit with the DESC environment. He can either use nbodykit, following https://nbodykit.readthedocs.io/en/latest/getting-started/install.html#nbodykit-on-nersc, or use the DESC environment with GCRCatalogs, etc., but not both.

What is the generic way to solve this? I assume some version of cloning the DESC environment and adding nbodykit? I do not think this package is of sufficient general interest to warrant inclusion in the DESC env.

johannct commented 4 years ago

What is the problem with just using pip install --user for the software?

heather999 commented 4 years ago

Just want to be sure I understand what environment is being used - I'm assuming it is desc-python? I'd like to discourage over-use of pip install --user because it then becomes almost impossible to fix future problems, since any incompatibilities with that user installed package will be hidden in their $HOME area.
I'm in the process of migrating desc-python to utilize a docker/shifter image - which will be invisible to users - but would provide some alternatives to adding additional packages.
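To see whether a per-user install could be shadowing the shared environment, one can ask Python where its user site-packages directory is and whether it is active. This is a minimal sketch using only the standard library `site` module; the helper name `user_site_report` is made up for illustration.

```python
import site
import sys


def user_site_report():
    """Return the per-user site-packages directory and whether it is in use."""
    user_dir = site.getusersitepackages()
    return {
        "user_site": user_dir,                      # where pip install --user writes
        "enabled": bool(site.ENABLE_USER_SITE),     # False if disabled for this interpreter
        "on_sys_path": user_dir in sys.path,        # True only if the directory exists and is active
    }


if __name__ == "__main__":
    for key, value in user_site_report().items():
        print(f"{key}: {value}")
```

Starting Python with the `PYTHONNOUSERSITE` environment variable set (or with `python -s`) disables the user site directory, which is a quick way to test whether a problem comes from packages hidden in $HOME.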

But for today, I would generally recommend installing local user packages in your own area. If using pip, that would look like:

pip install --prefix <PathToMyDirectory> <PackageName>

and then when using this environment and you want access to your user installed packages:

export PYTHONPATH=<PathToMyDirectory>:$PYTHONPATH
export DESCPYTHONPATH=<PathToMyDirectory>
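After changing PYTHONPATH it is worth confirming which copy of a package Python actually resolves. With `pip install --prefix`, packages typically land under `<prefix>/lib/pythonX.Y/site-packages`, so the directory that needs to be on the path may be deeper than the prefix itself. The sketch below uses the standard library only; `module_origin` is a hypothetical helper name, and `nbodykit` is just the example from this thread.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found.

    For a top-level name, find_spec does not import the module itself.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Example names only; replace with the package you installed.
    for pkg in ("json", "nbodykit"):
        print(pkg, "->", module_origin(pkg))
```

If the printed path is not inside the directory you installed into, the path variables are not pointing where you expect.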

However, looking more deeply at this particular package, nbodykit, the documentation notes that some special care is required to use it on the NERSC compute nodes. For now, I'm going to assume we're only interested in the NERSC login nodes and Jupyter. mpi4py is one of the dependencies, and that requires a different installation procedure: https://docs.nersc.gov/programming/high-level-environments/python/mpi4py/#mpi4py-in-your-custom-conda-environment Though the nbodykit docs seem to indicate this may not be necessary if we're not submitting batch jobs. So in this particular case, I would clone the environment and add nbodykit and its dependencies to that new user-owned environment, but so far those attempts have ended in errors.

I will try to play with this some more - but may not get back to it today. Likely I'll push ahead with this docker based desc-python and see if that allows me to work around the problems more easily.

yymao commented 4 years ago

@slosar Is GCRCatalogs the only package that you need from DESC environment?

If so, I'd suggest a hack for now: just use the nbodykit environment and then add a clone of LSSTDESC/gcr-catalogs to sys.path at runtime (instructions here).

If you need other DESC packages then this hack is not particularly useful.
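The sys.path hack can be sketched roughly as follows. The clone location in the commented usage lines is a hypothetical path, and `prepend_to_sys_path` is an illustrative helper, not part of GCRCatalogs.

```python
import sys
from pathlib import Path


def prepend_to_sys_path(clone_dir):
    """Put a source checkout at the front of sys.path so its packages are found first."""
    path = str(Path(clone_dir).expanduser().resolve())
    if path not in sys.path:          # avoid piling up duplicates when a notebook cell is re-run
        sys.path.insert(0, path)
    return path


# Hypothetical usage, assuming the repository was cloned to ~/desc/gcr-catalogs:
# prepend_to_sys_path("~/desc/gcr-catalogs")
# import GCRCatalogs
```

This only affects the running interpreter, so it has to be repeated in every script or notebook, and it does not install any of the package's dependencies.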

cwwalter commented 4 years ago

I think moving towards a setup where people make a custom environment and install special packages into it (conda comes with a pip that will do this if a package isn't on conda-forge) is probably a good approach to start thinking about. Then you could supply a spec file that gives you the base DESC environment to start with, like this:

conda install --name myenv --file spec-file.txt

as described in

https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#building-identical-conda-environments

cwwalter commented 4 years ago

BTW: I think this may actually be smart enough to use links for the shared packages in the base if it is set up correctly. That certainly works on my computer but I have never tried to do it in a multiuser setup. But, I think in principle it works.

https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/admin-multi-user-install.html

cwwalter commented 4 years ago

So when you make new envs they go in .conda. I use this with the cvmfs distribution on my computers.

wasabi:~ % conda env list
# conda environments:
#
3x2pt                    /Users/walter/.conda/envs/3x2pt
CCL                      /Users/walter/.conda/envs/CCL
CCL2                     /Users/walter/.conda/envs/CCL2
CERN                     /Users/walter/.conda/envs/CERN
R                        /Users/walter/.conda/envs/R
awkward                  /Users/walter/.conda/envs/awkward
pymc3                    /Users/walter/.conda/envs/pymc3
roundtrip                /Users/walter/.conda/envs/roundtrip
skymap                   /Users/walter/.conda/envs/skymap
utils                    /Users/walter/.conda/envs/utils
base                     /cvmfs/sw.lsst.eu/darwin-x86_64/lsst_distrib/w_2019_23/python/miniconda3-4.5.12
lsst-scipipe-1172c30  *  /cvmfs/sw.lsst.eu/darwin-x86_64/lsst_distrib/w_2019_23/python/miniconda3-4.5.12/envs/lsst-scipipe-1172c30

yymao commented 4 years ago

If you make a new user environment from a base environment, and a package in the base environment later gets updated, does the user environment inherit that update automatically?

cwwalter commented 4 years ago

No, reading from that file is just a way to get the same packages installed.

There is something called "stacking" but I think that only works for executables via the path (I haven't tried it yet).

yymao commented 4 years ago

Thanks. That's consistent with my understanding too. I think the reason we have not taken this approach is that our "base environment" is not fully stable and still receives updates fairly frequently. So if users want to receive these updates, they will have to update their user environment every time the base environment is updated.
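One way to see how far a user environment has drifted from the base is to compare two explicit spec files of the kind produced by `conda list --explicit`. The following is a rough sketch that assumes each package line is a URL ending in `name-version-build.tar.bz2` or `name-version-build.conda`; the function names are invented for this example.

```python
def parse_explicit_spec(text):
    """Map package name -> (version, build) from the text of an explicit spec file."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and the @EXPLICIT marker.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        filename = line.rsplit("/", 1)[-1].split("#", 1)[0]  # drop an optional #md5 suffix
        for suffix in (".tar.bz2", ".conda"):
            if filename.endswith(suffix):
                filename = filename[: -len(suffix)]
                break
        name, version, build = filename.rsplit("-", 2)
        packages[name] = (version, build)
    return packages


def changed_packages(base_text, user_text):
    """Return {name: (user_entry, base_entry)} for base packages that differ in the user env."""
    base = parse_explicit_spec(base_text)
    user = parse_explicit_spec(user_text)
    return {
        name: (user.get(name), base[name])
        for name in base
        if user.get(name) != base[name]
    }
```

Packages missing from the user environment show up with `None` as the user entry. Packages that exist only in the user environment (such as an added nbodykit) are deliberately ignored.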

cwwalter commented 4 years ago

Right, but I was thinking more of the case here, where someone can't use the base environment because they need some non-standard package.

Installing into a separate conda environment, which they can throw away or update, works pretty well. Later, if it is really something useful, it can be requested to go into the base. I think dumping everything else you need into --user can cause problems later.

slosar commented 4 years ago

Thanks everyone, this is pretty useful. Let's try a few things and then see how it goes...

samgolds commented 4 years ago

Thank you for all of your help, I have been trying to use these suggestions and have had some success. I am unable to use nbodykit with desc-python kernel, but I was able to load GCRCatalogs into the nbodykit environment.

pip installing nbodykit

Installing local user packages using pip has resulted in issues with mpi4py:

ModuleNotFoundError: No module named 'mpi4py'

When I try to install mpi4py following the NERSC suggestions using the following code

wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-3.0.3.tar.gz
tar zxvf mpi4py-3.0.3.tar.gz
cd mpi4py-3.0.3
module swap PrgEnv-intel PrgEnv-gnu
module unload craype-hugepages2M
python setup.py build --mpicc="$(which cc) -shared"
python setup.py install

I get an error at the module unload step:

craype-hugepages2M(42):ERROR:102: Tcl command execution failed: set CRAYPE_DIR $env(CRAYPE_DIR)

so I'm definitely doing something wrong in trying to install mpi4py.

Loading GCRCatalogs into nbodykit environment

The approach suggested by @yymao to just load GCRCatalogs worked, and I have been able to run Python scripts in this new environment with GCRCatalogs and nbodykit. I am having trouble loading pyccl in this environment. I was able to pip install pyccl in this environment, but when I actually import pyccl I get:

ModuleNotFoundError: No module named '_ccllib'

Here is the full error message:

  File "/global/homes/s/samgolds/.conda/envs/nbodykit-env/lib/python3.7/site-packages/pyccl/ccllib.py", line 20, in swig_import_helper
    return importlib.import_module(mname)
  File "/global/homes/s/samgolds/.conda/envs/nbodykit-env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 670, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 583, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1043, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /global/homes/s/samgolds/.conda/envs/nbodykit-env/lib/python3.7/site-packages/pyccl/_ccllib.so: invalid ELF header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/global/homes/s/samgolds/.conda/envs/nbodykit-env/lib/python3.7/site-packages/pyccl/__init__.py", line 21, in <module>
    from . import ccllib as lib
  File "/global/homes/s/samgolds/.conda/envs/nbodykit-env/lib/python3.7/site-packages/pyccl/ccllib.py", line 23, in <module>
    _ccllib = swig_import_helper()
  File "/global/homes/s/samgolds/.conda/envs/nbodykit-env/lib/python3.7/site-packages/pyccl/ccllib.py", line 22, in swig_import_helper
    return importlib.import_module('_ccllib')
  File "/global/homes/s/samgolds/.conda/envs/nbodykit-env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_ccllib'

I think I can work around using pyccl so this isn't too big of a deal, but if there is some quick fix for this it might make things easier.
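The "invalid ELF header" line in the traceback means the dynamic loader did not recognise `_ccllib.so` as a Linux shared library. A quick check is to look at the first four bytes of the file, which for any ELF binary are `0x7f` followed by `ELF`. This is a small sketch with made-up helper names; it does not say why the file is wrong, only whether the header looks like ELF.

```python
ELF_MAGIC = b"\x7fELF"


def is_elf_header(first_bytes):
    """Return True if the given leading bytes start with the ELF magic number."""
    return first_bytes[:4] == ELF_MAGIC


def read_magic(path, count=4):
    """Read the first few bytes of a file for inspection."""
    with open(path, "rb") as handle:
        return handle.read(count)


# Hypothetical usage with the path from the traceback:
# so_file = ".../site-packages/pyccl/_ccllib.so"
# print(read_magic(so_file), is_elf_header(read_magic(so_file)))
```

If the check returns False, the file was most likely built for a different platform or was truncated during installation, and reinstalling from a channel that ships Linux binaries is the usual remedy.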

yymao commented 4 years ago

@samgolds From the log, it seems to me that you are using your own conda environment that contains nbodykit? If so, can you try just running conda install pyccl in your own conda environment? In other words,

conda activate nbodykit-env  # if not yet activated
conda install -c conda-forge pyccl

BTW you can also pip install GCRCatalogs in your conda environment so that you don't need to add sys.path.insert in your notebooks:

conda activate nbodykit-env  # if not yet activated
pip install https://github.com/LSSTDESC/gcr-catalogs/archive/v0.14.5.zip

samgolds commented 4 years ago

@yymao thanks, that resolved the issue!

slosar commented 4 years ago

Ok, here is another possibility, using a cloned environment:

source /global/common/software/lsst/common/miniconda/setup_current_python.sh
conda create --name anzestack --clone stack
conda activate anzestack
conda install -c bccp nbodykit

heather999 commented 4 years ago

I didn't forget about this @slosar @samgolds :) We have a new desc-python installation now available at NERSC, using py3.7 and preferring conda-forge channels where possible.
I've been playing with user environments and setting things up against the shared desc-python env. There are instructions in the desc-python Wiki. This is not the full story for nbodykit, however. A simple conda install -c bccp nbodykit fails, seemingly due to dependency conflicts with gsl, so I needed to fall back to first installing cython and then doing a pip install nbodykit. This works, and I have it set up to work in jupyter.nersc.gov.

source /global/common/software/lsst/common/miniconda/setup_current_python.sh
conda create --clone desc -p /global/common/software/lsst/users/heatherk/mydesc
conda activate /global/common/software/lsst/users/heatherk/mydesc
conda install -c conda-forge cython
pip install nbodykit

To use this in Jupyter, I add a new environment variable in my $HOME/.bashrc on Cori:

export DESCUSERENV=/global/common/software/lsst/users/heatherk/mydesc

Now I can run nbodykit examples in my notebooks.
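From inside a notebook, a short check like the one below can confirm that the kernel actually sees the user environment. How the desc-python kernel consumes DESCUSERENV is not described in this thread, so this sketch only reports the interpreter in use and any `sys.path` entries that live under that directory; `kernel_env_report` is a hypothetical helper.

```python
import os
import sys


def kernel_env_report(environ=None):
    """Summarise which interpreter is running and what DESCUSERENV contributes to sys.path."""
    environ = os.environ if environ is None else environ
    user_env = environ.get("DESCUSERENV")
    under_env = []
    if user_env:
        root = os.path.realpath(user_env)
        # Collect import path entries located inside the user environment directory.
        under_env = [
            entry for entry in sys.path
            if entry and os.path.realpath(entry).startswith(root + os.sep)
        ]
    return {
        "sys.executable": sys.executable,
        "sys.prefix": sys.prefix,
        "DESCUSERENV": user_env,
        "paths_under_user_env": under_env,
    }


if __name__ == "__main__":
    for key, value in kernel_env_report().items():
        print(f"{key}: {value}")
```

If DESCUSERENV is reported as None, the variable was not exported before the Jupyter session started, and the notebook server may need to be restarted after editing .bashrc.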

heather999 commented 4 years ago

No activity on this issue in a while, and I think the issues have been addressed. Closing. Feel free to open a new issue if additional problems come up.