NSLS-II / lightsource2-recipes

[ARCHIVED] Use https://github.com/conda-forge instead.
https://anaconda.org/lightsource2-tag
BSD 3-Clause "New" or "Revised" License

Initiating a discussion on supporting more package variants #712

Open leofang opened 5 years ago

leofang commented 5 years ago

To make https://github.com/NSLS-II/lightsource2-recipes/pull/486#discussion_r291437307 a standalone issue. The text below is revised based on that comment.

First, some packages and libraries support (NVIDIA) GPUs. Taking the MPI libraries as an example, they can be made "CUDA-aware" by passing the --with-cuda flag (or similar) to the configure script, so that the MPI library is built and linked against the CUDA driver and runtime libraries. At least Open MPI and MVAPICH support this feature.

(The purpose of doing so is to support (more or less) architecture-agnostic code. For example, one can pass a GPU pointer to the MPI API without performing explicit data movement, and under the hood MPI will resolve it and recognize that the data lives on the GPU. MPI vendors also implement low-level optimizations for such operations, such as direct inter-GPU communication that bypasses the host, and even collective number crunching on the GPUs.)
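As a rough illustration of the --with-cuda route, a recipe's build section might look something like the sketch below. This is an untested sketch, not taken from any existing feedstock; the configure arguments and the CUDA path are assumptions:

# meta.yaml fragment (hypothetical): building a CUDA-aware Open MPI variant
build:
  number: 0
  # single build command; a real recipe would carry many more configure flags
  script: ./configure --prefix=$PREFIX --with-cuda=/usr/local/cuda && make -j${CPU_COUNT} && make install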

Another example is tomopy, which recently gained MPI+GPU support, if I'm not mistaken. However, in our internal channel and on conda-forge there is only a CPU version. For some reason the recent effort to update the recipe didn't get merged (conda-forge/tomopy-feedstock#18). We should keep an eye on this.

Next, non-Python libraries (e.g. HDF5, FFTW) can be built against MPI to provide asynchronous/parallel processing. The corresponding Python wrappers (e.g. h5py, PyFFTW, and mpi4py for MPI itself) then need to be built against those specialized versions.

Taking all of this into account, at the Conda level the number of package variants inflates quickly: (build against MPI, yes or no?) * (# of available MPI libraries) * (CUDA-aware MPI, yes or no?) * (# of supported CUDA toolkit versions, if GPU support is required). I am not sure what the best strategy is to handle this. (Use the build string as a unique ID? Use different output names?) Too many degrees of freedom come into play, and so far we only fulfill the minimum requirement.
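To make the inflation concrete, a conda_build_config.yaml along the following lines (all keys and values here are hypothetical, just for illustration) already expands to 3 * 2 * 2 = 12 builds per package, several of which are invalid combinations that would have to be skipped by hand:

# conda_build_config.yaml sketch (hypothetical keys/values)
mpi:
  - nompi
  - openmpi
  - mvapich2
cuda_aware:          # only meaningful for the MPI builds
  - "no"
  - "yes"
cuda_version:        # only meaningful when cuda_aware is "yes"
  - "9.2"
  - "10.0"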

I feel that eventually a dedicated shell or Python script will be needed to help Conda resolve this issue, especially in the coming Jupyter-SDCC era, in which high-performance libraries may be favored. The meta.yaml recipe alone might not be enough. But I could be wrong.

leofang commented 5 years ago

Just did a bit of searching. For h5py + MPI, this is conda-forge's solution: https://github.com/conda-forge/h5py-feedstock/blob/master/recipe/meta.yaml. Not sure if we have room to chain more info into the build string, though.
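If I remember the gist of it correctly, the pattern there is roughly the following (paraphrased from memory, not copied verbatim from the feedstock):

# sketch of the h5py-style build string (paraphrased, may not match the recipe exactly)
{% set build = 0 %}
{% set mpi = mpi or 'nompi' %}
{% if mpi != 'nompi' %}
{% set mpi_prefix = 'mpi_' + mpi %}
{% else %}
{% set mpi_prefix = 'nompi' %}
{% endif %}

build:
  string: "{{ mpi_prefix }}_py{{ py }}h{{ PKG_HASH }}_{{ build }}"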

CJ-Wright commented 5 years ago

I would advise making use of the outputs key to handle the downstream variants. See https://github.com/conda-forge/airflow-feedstock/blob/master/recipe/meta.yaml. I would also advise not taking the path that airflow took of writing everything out by hand; at that point I think using the jinja2 approach would be cleaner and less error-prone.
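A stripped-down sketch of what that could look like here (package names are made up for illustration, and this is untested):

# meta.yaml sketch using outputs (hypothetical names, untested)
package:
  name: somepkg-split
  version: 1.0.0

outputs:
  - name: somepkg              # default CPU / no-MPI build
    requirements:
      run:
        - python
  - name: somepkg-openmpi      # MPI-enabled variant
    requirements:
      run:
        - python
        - openmpi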

Note that conda-forge doesn't build GPU versions of its code because we currently have no way to check the validity of the packages (with no GPUs to test on). We're working on a solution to this, but I don't think we have a working framework for it yet. See this issue for the conda-forge GPU discussions: https://github.com/conda-forge/conda-forge.github.io/issues/63

leofang commented 5 years ago

I would advise making use of the outputs key, to handle the downstream variants.

@CJ-Wright So you mean something like - name: {{ name }}-with-openmpi-cuda_aware-cuda91?

Note that conda-forge doesn't build GPU versions of its code because we currently have no way to check the validity of the packages (with no GPUs to test on). We're working on a solution to this, but I don't think we have a working framework for it yet.

I know that cudatoolkit is currently not suitable for downstream packages to depend on. This is partly why I opened this issue: for the time being we need a homegrown solution for GPU support. Most likely, we should install the latest CUDA toolkit in the Docker image and let nvcc build backward-compatible CUDA binaries. @mrakitin thoughts?

leofang commented 5 years ago

btw, @CJ-Wright, why is the outputs key better than the build string?

mrakitin commented 5 years ago

I don't have a strong opinion on that topic as it's pretty new to me. Do we need real GPUs to use nvcc?

leofang commented 5 years ago

No, nvcc can run without GPUs. For example, on the Institutional Cluster (part of SDCC) the submit machines do not have GPUs, but we can build CUDA programs there and then submit GPU jobs. The key is to install the CUDA toolkit in the default path (/usr/local/cuda/ on Linux).

CJ-Wright commented 5 years ago

Yes but I would do that as

- name: {{ name }}-{{ mpi_flag }}-{{ cuda_flag }}-{{ cuda_version }}

kind of thing (you'd need to work on that a little bit more but that is the basic gist).

I think this is a bit more explicit for users, since they ask for the exact thing they want in the package name, although the principle of jinja2 templating would be the same.
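For instance, a jinja2 loop over the variants could generate the outputs rather than writing each one out by hand; something along these lines (entirely hypothetical and untested):

# meta.yaml sketch: generate one output per variant from jinja2 (untested)
{% set name = "somepkg" %}
outputs:
{% for mpi_flag in ["nompi", "openmpi", "mvapich2"] %}
{% for cuda_flag in ["nocuda", "cuda92", "cuda100"] %}
  - name: {{ name }}-{{ mpi_flag }}-{{ cuda_flag }}
    requirements:
      run:
        - {{ name }}-base      # hypothetical common base package
{% endfor %}
{% endfor %}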

leofang commented 5 years ago

Yes, yes, I agree with you @CJ-Wright. I was thinking of the same approach but forgot about jinja.

leofang commented 5 years ago

After thinking about this a bit, I changed my mind and I'm now in favor of the build-string approach, because the output-name approach would be too obscure for general users who just want to install the current default:

conda install h5py-nompi-nocuda-0

which should really just be conda install h5py as it is now.

For the record, h5py supports variants through the build string; see https://github.com/conda-forge/h5py-feedstock/blob/master/recipe/meta.yaml. So, if one wants MPI support, one just does

conda install h5py=*=mpi_openmpi*

otherwise, with conda install h5py, the nompi version is preferred (via setting a higher build number; @CJ-Wright, why does this work?). This will not interfere with general needs and yet provides a means of customization for advanced users.
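For reference, the nompi-by-default behavior seems to come from bumping the build number for the nompi variant, roughly like this (paraphrased from the conda-forge pattern, not quoted from the recipe):

# sketch of the nompi-preferred trick (paraphrased, untested)
{% set build = 0 %}
{% if mpi == 'nompi' %}
{% set build = build + 100 %}    # nompi variants get a much larger build number
{% endif %}

build:
  number: {{ build }}
  string: "{{ mpi_prefix }}_h{{ PKG_HASH }}_{{ build }}"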

CJ-Wright commented 5 years ago

Higher build numbers are preferred, so conda will use the nompi version unless you ask otherwise.

leofang commented 5 years ago

A GPU version of tomopy has been added to conda-forge: conda-forge/tomopy-feedstock#25. I'd like to try that approach to resolve this issue.

leofang commented 5 years ago

Conda's support of CUDA detection: https://github.com/conda/conda/blob/0fd7941d545ef47930da10ea297b6c174050b1de/docs/source/user-guide/tasks/manage-virtual.rst
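If we rely on that, a recipe could in principle constrain on the detected driver through the __cuda virtual package; a minimal sketch, assuming the virtual-package mechanism described in that doc:

# meta.yaml sketch: require a CUDA driver at install time via conda's virtual package
requirements:
  run:
    - __cuda >=9.2    # conda exposes the detected driver's CUDA version as __cuda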

mrakitin commented 5 years ago

Yeah, I saw it yesterday and wanted to let you know, @leofang, but you were faster :).

leofang commented 5 years ago

Conda-forge now has an official policy for MPI support: https://conda-forge.org/docs/maintainer/knowledge_base.html#message-passing-interface-mpi
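Per my reading of that policy, a downstream recipe selects the matching MPI variant of a dependency roughly like this (a sketch based on the docs, not copied verbatim):

# meta.yaml sketch: downstream package requesting the MPI-enabled hdf5 build
requirements:
  host:
    - {{ mpi }}                    # e.g. mpich or openmpi, from conda_build_config.yaml
    - hdf5 * mpi_{{ mpi }}_*       # pin the matching MPI build of hdf5
  run:
    - {{ mpi }}
    - hdf5 * mpi_{{ mpi }}_*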