FredHutch / easybuild-life-sciences

Howto and implementation documentation
https://fredhutch.github.io/easybuild-life-sciences/

Set environment variable RETICULATE_PYTHON in python modules #502

Closed atombaby closed 3 years ago

atombaby commented 3 years ago

As per our last discussion, we should set RETICULATE_PYTHON when we load a Python module.

The reticulate R library (part of the R module) won't select the Python from the base Python module, because that module has no numpy installed. For example, if you load:

ml R/4.1.0-foss-2020b Python/3.8.6-GCCcore-10.2.0

And then start R and run:

library('reticulate')
py_discover_config()

The Python that reticulate selects is the system one, not the one from the loaded module:

python:         /usr/bin/python3
libpython:      /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6.so
pythonhome:     //usr://usr
version:        3.6.9 (default, Jan 26 2021, 15:33:00)  [GCC 8.4.0]
numpy:          /usr/lib/python3/dist-packages/numpy
numpy_version:  1.13.3

python versions found: 
 /app/software/Python/3.8.6-GCCcore-10.2.0/bin/python3
 /usr/bin/python3
 /usr/bin/python

If we force the issue with the environment variable:

rhino03[~/Tickets/5213-reticulate]: export RETICULATE_PYTHON=$(which python)
rhino03[~/Tickets/5213-reticulate]: echo $RETICULATE_PYTHON 
/app/software/Python/3.8.6-GCCcore-10.2.0/bin/python

The py_discover_config() function returns:

python:         /app/software/Python/3.8.6-GCCcore-10.2.0/bin/python
libpython:      /app/software/Python/3.8.6-GCCcore-10.2.0/lib/libpython3.8.so
pythonhome:     /app/software/Python/3.8.6-GCCcore-10.2.0:/app/software/Python/3.8.6-GCCcore-10.2.0
version:        3.8.6 (default, Dec 16 2020, 13:45:25)  [GCC 10.2.0]
numpy:           [NOT FOUND]

NOTE: Python version was forced by RETICULATE_PYTHON

The reticulate docs indicate that:

in all cases will attempt to locate a version which includes the first Python package imported via the import() function

In this case, py_discover_config attempts to load numpy but fails with the bare-bones Python module.

Setting RETICULATE_PYTHON forces reticulate to use the Python from the loaded module. The variable has no other effects, so it should be safe to set even if reticulate isn't used.
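As a rough sketch of the manual workaround above, the export can be wrapped in a small shell function. The name set_reticulate_python is hypothetical and only for illustration; it is not part of any module or of reticulate, and it assumes the loaded Python module has already put its python first on PATH.

```shell
# Point reticulate at whichever python is first on PATH (for example, the
# one provided by a loaded Python module).
# set_reticulate_python is a hypothetical helper name for this sketch.
set_reticulate_python() {
    local py
    # command -v prints the resolved path, or fails if python is not on PATH
    py=$(command -v python) || {
        echo "python not found on PATH" >&2
        return 1
    }
    export RETICULATE_PYTHON="$py"
}
```

Calling set_reticulate_python after loading the modules and before starting R should have the same effect as the export shown earlier.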

fizwit commented 3 years ago

If numpy is a requirement, the SciPy-Bundle will have to be loaded. SciPy-Bundle loads a base Python package. Also note that I am only building one version of Python per toolchain. When loading Python and R, the toolchains need to match.

fizwit commented 3 years ago

The last two core versions of Python modules have been updated to define RETICULATE_PYTHON. These two packages overlap with the last 13 versions of R that are available. Users should load SciPy-Bundle to ensure that NumPy is available.

Python/3.8.2-GCCcore-9.3.0
Python/3.8.6-GCCcore-10.2.0
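To see whether the interpreter that reticulate will be forced to use can actually import NumPy, something like the following could be run before starting R. This is a minimal sketch: has_numpy is a hypothetical helper name, and falling back to the python on PATH when RETICULATE_PYTHON is unset is an assumption made for this example.

```shell
# Succeed only if the interpreter named by RETICULATE_PYTHON can import numpy.
# has_numpy is a hypothetical helper name used only for this sketch.
has_numpy() {
    # Assumption: fall back to the first python on PATH if the variable is unset
    local py="${RETICULATE_PYTHON:-python}"
    # The exit status of the import becomes the function's exit status
    "$py" -c 'import numpy' 2>/dev/null
}
```

If has_numpy fails with only the base Python module loaded, loading SciPy-Bundle for the matching toolchain, as suggested above, would be the next thing to try.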

atombaby commented 3 years ago

Cool, thanks. Just as an aside, there isn't an explicit requirement for numpy in reticulate. It's something inside py_discover_config that attempts to load numpy.

If the user's code doesn't require numpy, reticulate shouldn't.