charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

Compilation with UCX backend #3227

Closed afernandezody closed 3 years ago

afernandezody commented 3 years ago

Hello. Compiling charm-v6.10.2 with UCX support is failing. I'm using the command ./build charm++ ucx-linux-x86_64 --with-production or ./build charm++ ucx-linux-x86_64 slurmpmi --with-production, and I get the error message:

...
checking "whether build on UCX"... "no"
Error: Unable to compile UCX
*** Please find detailed output in tmp/charmconfig.out ***
gmake[1]: *** No rule to make target 'conv-autoconfig.h', needed by 'xi-main.o'.  Stop.
gmake[1]: Leaving directory '/home/centos/MD3/charm-v6.10.2/ucx-linux-x86_64-slurmpmi/tmp'
gmake: *** [Makefile:331: headers] Error 2
-------------------------------------------------
Charm++ NOT BUILT. Either cd into ucx-linux-x86_64-slurmpmi/tmp and try
to resolve the problems yourself, visit
http://charm.cs.illinois.edu/
for more information. Otherwise, email the developers at charm@cs.illinois.edu

The end of the charmconfig.out file shows:

gcc -DCMK_GFORTRAN -I../include -I. -I/usr/include/ -I./proc_management/ -I./proc_management/simple_pmi/ -c test.c -o test.o -lucp
test.c:1:10: fatal error: ucp/api/ucp.h: No such file or directory
 #include <ucp/api/ucp.h>
          ^~~~~~~~~~~~~~~
compilation terminated.

The system has UCX (v1.9.0) and OpenMPI installed. First, I don't understand how to tell the charm configuration where UCX is (other than via OpenMPI), or what exactly the build is trying to do with it; the messages 'checking "whether build on UCX"... "no"' and 'Error: Unable to compile UCX' are not clear to me. Also, the documentation states: "Additionally, in order to use the other supported process management interfaces, it is required to have a non-OpenMPI based MPI implementation installed on the system (e.g. Intel MPI, MVAPICH, MPICH, etc.)." Is this a strict requirement? Do you really need to have two MPI wrappers? And if you don't have both, is that the reason the compilation is failing? Thanks.

nitbhat commented 3 years ago

Hi @afernandezody,

Yes, it looks like the charm build is not finding the system-installed UCX, because by default charm looks in /usr/include and /usr/lib64/. If you know where UCX is installed, you can pass that location to the charm build command using the '--basedir=' option. This should solve the issue.

You can also build UCX from source and pass that directory to the charm build command with the '--basedir' option.
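As a rough sketch of how '--basedir' might be used: the script below looks for the header that charmconfig.out reported as missing (ucp/api/ucp.h) under a few candidate prefixes and prints the build command with the first match. The helper name find_ucx_prefix and the candidate prefixes (/usr/local, /opt/ucx, $HOME/ucx) are assumptions for illustration, not paths from this thread; the build command itself is the one from the original report with '--basedir=' appended.

```shell
#!/bin/sh
# Hypothetical helper: print the first prefix that contains the UCX header
# reported as missing in charmconfig.out (ucp/api/ucp.h).
find_ucx_prefix() {
    for prefix in "$@"; do
        if [ -f "$prefix/include/ucp/api/ucp.h" ]; then
            printf '%s\n' "$prefix"
            return 0
        fi
    done
    return 1
}

# Candidate prefixes are assumptions; replace them with your own locations.
if ucx_prefix=$(find_ucx_prefix /usr /usr/local /opt/ucx "$HOME/ucx"); then
    # Print the command instead of running it, so it can be reviewed first.
    echo "./build charm++ ucx-linux-x86_64 --with-production --basedir=$ucx_prefix"
else
    echo "ucp/api/ucp.h not found under the candidate prefixes" >&2
fi
```

If the header turns up under a prefix, that prefix is presumably the directory to hand to '--basedir='; if nothing is found, UCX's development headers may simply not be installed.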

As for your second question: no, the compilation is not failing because of the OpenMPI dependency. It looks like the build just needs the UCX build directory path to access the headers. Separately, for launching: with the UCX build, we depend on the MPI launcher to launch the program. Since OpenMPI uses PMIx for its PMI setup, it is not compatible with Slurm PMI. For this reason, when you build the charm UCX backend with slurmpmi or simple pmi, you have to launch your job with the MPI launcher (mpirun) from another MPI implementation such as MPICH, MVAPICH, or Intel MPI.
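One way to check which implementation the mpirun on your PATH belongs to is to inspect its version banner. The sketch below assumes that Open MPI's launcher identifies itself with the string "Open MPI" in the output of mpirun --version; the function name is_open_mpi_launcher is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given version banner looks like Open MPI's.
# $1 is the text printed by `mpirun --version`.
is_open_mpi_launcher() {
    case "$1" in
        *"Open MPI"*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v mpirun >/dev/null 2>&1; then
    # Capture the banner; tolerate a non-zero exit status from the launcher.
    version_text=$(mpirun --version 2>&1 || true)
    if is_open_mpi_launcher "$version_text"; then
        echo "mpirun on PATH appears to be Open MPI's launcher"
    else
        echo "mpirun on PATH does not appear to be Open MPI's launcher"
    fi
else
    echo "no mpirun found on PATH"
fi
```

If the first message is printed, a slurmpmi or simple pmi build would need the mpirun of a different MPI implementation placed ahead of it on PATH.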

Also, while running your program, in order to access the UCX shared libraries, ensure that you have added /lib to your LD_LIBRARY_PATH.
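A minimal sketch of the LD_LIBRARY_PATH step, assuming the lib directory meant here is the one under the UCX installation prefix. UCX_PREFIX and its default value /opt/ucx are hypothetical and stand in for wherever UCX actually lives on your system.

```shell
#!/bin/sh
# UCX_PREFIX is a hypothetical install location, not a path from this thread.
UCX_PREFIX="${UCX_PREFIX:-/opt/ucx}"

# Prepend the UCX lib directory; the ${VAR:+...} form avoids leaving a
# trailing ':' when LD_LIBRARY_PATH was previously empty or unset.
export LD_LIBRARY_PATH="$UCX_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

The export only affects the current shell and its children, so it would need to go into the job script (or shell profile) used to launch the program.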

Let us know if you have any other questions or face further issues.

stwhite91 commented 3 years ago

Closing, since there has been no feedback since January indicating this was anything other than a library path issue.

afernandezody commented 3 years ago

My apologies; I completely forgot this thread was open (and I haven't worked on this installation since January because of other projects).