leofang opened this issue 5 years ago
Just did a bit of searching. For h5py + MPI, this is conda-forge's solution: https://github.com/conda-forge/h5py-feedstock/blob/master/recipe/meta.yaml. I'm not sure if we have room to chain more info into the build string, though.
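If we do try to chain more in, maybe something like the following would work (just a rough sketch; `mpi_prefix`, `cuda_version`, and `build` are hypothetical jinja variables, not what the h5py recipe actually defines):

```yaml
build:
  number: {{ build }}
  # hypothetical: chain both the MPI flavor and the CUDA version into the build string
  string: "{{ mpi_prefix }}_cuda{{ cuda_version }}_h{{ PKG_HASH }}_{{ build }}"
```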
I would advise making use of the outputs key, to handle the downstream variants. See https://github.com/conda-forge/airflow-feedstock/blob/master/recipe/meta.yaml I would also advise not taking the path that airflow took by writing out everything by hand. At that point I think using the jinja2 approach would be cleaner and less prone to errors.
Note that conda-forge doesn't build GPU versions of its code because we currently have no way to check the validity of the packages (with no GPUs to test on). We're working on a solution to this, but I don't think we have a working framework for it yet. See this issue for the conda-forge GPU discussions: https://github.com/conda-forge/conda-forge.github.io/issues/63
> I would advise making use of the outputs key, to handle the downstream variants.
@CJ-Wright So you mean something like `- name: {{ name }}-with-openmpi-cuda_aware-cuda91`?
> Note that conda-forge doesn't build GPU versions of its code because we currently have no way to check the validity of the packages (with no GPUs to test on). We're working on a solution to this, but I don't think we have a working framework for it yet.
I know that `cudatoolkit` is currently not suitable for downstream packages to depend on. This is partly why I opened this issue: for the time being we need a homegrown solution for GPU support. Most likely, we should install the latest CUDA toolkit in the Docker image and let `nvcc` build backward-compatible CUDA binaries. @mrakitin thoughts?
By the way, @CJ-Wright, why is the outputs key better than the build string?
I don't have a strong opinion on that topic as it's pretty new to me. Do we need real GPUs to use `nvcc`?
No, `nvcc` can be run without GPUs. For example, on the Institutional Cluster (part of SDCC) the submit machines do not have GPUs, but we can build CUDA programs there and then submit GPU jobs. The key is to install the CUDA toolkit in the default path (`/usr/local/cuda/` on Linux).
Yes, but I would do it as `- name: {{ name }}-{{ mpi_flag }}-{{ cuda_flag }}-{{ cuda_version }}`, that kind of thing (you'd need to work on it a little bit more, but that is the basic gist).
I think this is a bit more explicit for users, since they ask for the exact thing that they want in the package name. Although the principle of jinja2 templating would be the same.
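Very roughly, something like this (just a sketch; the variant lists and the per-output requirements are made up and would need to be fleshed out):

```yaml
{% set name = "h5py" %}
{% set mpi_variants = ["nompi", "openmpi", "mpich"] %}
{% set cuda_variants = ["nocuda", "cuda_aware-cuda92"] %}

outputs:
{% for mpi in mpi_variants %}
{% for cuda in cuda_variants %}
  - name: {{ name }}-{{ mpi }}-{{ cuda }}
    requirements:
      run:
        # placeholder: pin the base package built with the matching MPI/CUDA here
        - {{ name }}
{% endfor %}
{% endfor %}
```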
Yes, yes, I agree with you @CJ-Wright. I was thinking about the same approach but forgot about jinja2.
After thinking about this a bit, I changed my mind and I'm now in favor of the build string approach, because the output-name approach would be too obscure for general users who just want to install the current default: `conda install h5py-nompi-nocuda-0`, which should really just be `conda install h5py` as it is now.
For the record, h5py supports variants through the build string, see https://github.com/conda-forge/h5py-feedstock/blob/master/recipe/meta.yaml. So, if one wants MPI support, one just does `conda install h5py=*=mpi_openmpi*`; otherwise, with `conda install h5py` the `nompi` version is preferred (via setting a higher build number; @CJ-Wright, why does this work?). This will not interfere with general needs and yet provides a way for advanced users to customize.
Higher build numbers are preferred, so conda will use the `nompi` build unless you ask otherwise.
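From memory, the relevant bit of the recipe looks roughly like this (a paraphrased sketch, so details may differ from the actual feedstock):

```yaml
{% set build = 0 %}

# prefer the serial build: conda picks the highest build number
# when the user does not request a specific build string
{% if mpi == "nompi" %}
{% set build = build + 100 %}
{% set mpi_prefix = "nompi" %}
{% else %}
{% set mpi_prefix = "mpi_" + mpi %}
{% endif %}

build:
  number: {{ build }}
  # the variant stays selectable via the build string,
  # e.g. conda install h5py=*=mpi_openmpi*
  string: "{{ mpi_prefix }}_h{{ PKG_HASH }}_{{ build }}"
```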
A GPU version of tomopy has been added to conda-forge: conda-forge/tomopy-feedstock#25. I'd like to try that approach to resolve this issue.
Conda's support for CUDA detection: https://github.com/conda/conda/blob/0fd7941d545ef47930da10ea297b6c174050b1de/docs/source/user-guide/tasks/manage-virtual.rst
Yeah, saw it yesterday, wanted to let you know, @leofang, but you were faster :).
Conda-forge now has an official policy for MPI support: https://conda-forge.org/docs/maintainer/knowledge_base.html#message-passing-interface-mpi
Opening this to make https://github.com/NSLS-II/lightsource2-recipes/pull/486#discussion_r291437307 a standalone issue. The text below is revised based on that comment.
First, some packages and libraries support (NVIDIA) GPUs. Taking the MPI libraries as an example, they can be made "CUDA-aware" by passing the `--with-cuda` flag (or similar) to the configure script, so that the MPI library is built and linked against the CUDA driver and runtime libraries. At least Open MPI and MVAPICH support this feature. (The purpose of doing so is to support more or less architecture-agnostic code. For example, one can pass a GPU pointer to the MPI API without performing explicit data movement; under the hood MPI will resolve it and recognize that the data lives on the GPU. The MPI vendors also implement low-level optimizations for such operations, such as direct inter-GPU communication bypassing the host and even collective number crunching on GPUs.)
Another example is tomopy, which recently added MPI+GPU support, if I'm not mistaken. However, in our internal channel and on conda-forge there is only a CPU version. For some reason the recent effort to update the recipe didn't get merged (conda-forge/tomopy-feedstock#18). We should keep an eye on this.
Next, non-Python libraries (e.g. HDF5, FFTW) can be built against MPI to provide asynchronous/parallel processing. Then the corresponding Python wrappers (e.g. h5py, PyFFTW, and mpi4py for MPI itself) need to be built against those specialized versions.
Taking all of this into account, at the Conda level the number of package variants inflates quickly:

(build against MPI: yes or no) × (# of available MPI libraries) × (CUDA-aware MPI: yes or no) × (# of supported CUDA toolkit versions, if GPU support is required)

and I am not sure what the best strategy is to handle this. (Use the build string as a unique id? Use different output names?) Too many degrees of freedom come into play, and so far we only fulfill the minimum requirement. I feel that eventually a dedicated shell or Python script will be needed to help Conda resolve this, especially in the coming Jupyter-SDCC era, in which high-performance libraries may be favored. The `meta.yaml` recipe alone might not be enough. But I could be wrong.
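To make the combinatorics concrete, a hypothetical `conda_build_config.yaml` spelling out those axes could look like the following (all variable names are made up; without `zip_keys` or skips, conda-build would render the full cross product):

```yaml
# hypothetical variant axes; conda-build renders one build per combination
mpi:
  - nompi
  - openmpi
  - mpich
cuda_aware_mpi:  # only meaningful when mpi != nompi
  - "False"
  - "True"
cuda_version:    # only meaningful for GPU builds
  - "9.2"
  - "10.0"
# 3 * 2 * 2 = 12 variants for a single recipe, before any are skipped
```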