amkozlov / raxml-ng

RAxML Next Generation: faster, easier-to-use and more flexible
GNU Affero General Public License v3.0

libpll not being included in raxml-ng-mpi.so #91

Open tardigradus opened 4 years ago

tardigradus commented 4 years ago

I have built the shared library via

-DUSE_MPI=ON -DUSE_TERRAPHAST=OFF -DBUILD_AS_LIBRARY=ON

but when the library is used by ParGenes the following error is produced:

Error: /trinity/shared/easybuild/software/RAxML-NG/0.9.0-foss-2018b-OpenMPI-3.1.3/lib/raxml-ng-mpi.so: undefined symbol: pll_hardware

Doing ldd on the shared library shows that it is not linked to libpll. My understanding of src/CMakeList.txt was that if USE_LIBPLL_CMAKE is not set, then the prebuilt files for libpll in localdeps will be used. Is that not the case?

amkozlov commented 4 years ago

Hi @tardigradus,

Before we go into details, may I ask what your goal is here? If you want to build ParGenes, it is sufficient to recursively (!) clone the GitHub repo and run install.sh:

https://github.com/BenoitMorel/ParGenes

ParGenes comes with a bundled version of RAxML-NG, so you do not need to compile raxml-ng-mpi.so separately.

tardigradus commented 4 years ago

I am installing these programs as an administrator on an HPC cluster. Thus, we may have users who want to use RAxML on its own and others who want to run ParGenes. We manage our software using EasyBuild. The basic idea is that each piece of software can be loaded as a so-called module. Modules which depend on other modules just load them as dependencies. This way all the software can be optimized for our architecture. This idea is more or less orthogonal to the approach whereby a program bundles everything it depends on to get people up and running as quickly and painlessly as possible.

So ideally I would want to build PLL as a stand-alone module and then add it as a dependency to the RAxML and ModuleTest modules. I appreciate that this is extra work and not what most users may need. However, if people are going to be using MPI, they are probably going to be doing this on clusters, on which such bundling may be less convenient.

Do you see a good way to go forward here?

BenoitMorel commented 4 years ago

Dear Tardigradus,

Can you tell me which raxml-ng release (or branch and commit) you are using?

Benoit

tardigradus commented 4 years ago

I'm using version 0.9.0.

BenoitMorel commented 4 years ago

So ideally I would want to build PLL as a stand-alone module and then add it as a dependency to the RAxML and ModuleTest modules

I also wanted to have the same PLL "module" for building both modeltest and raxml-ng in ParGenes. The issue with this approach is that RAxML and ModelTest releases/tags do not always require the same PLL version. Although I don't like rebuilding PLL several times, it ended up being the least unsatisfying solution...

Doing ldd on the shared library shows that it is not linked to libpll

Libpll is built statically and included in raxml-ng(-mpi).so, which is why you don't see it with ldd. The issue you have (with pll_hardware) reminds me of something, but I don't remember exactly what was happening. I tried to compile raxml 0.9.0 and to call it with "--raxml-binary", and it seemed to work on my machine. How can I reproduce your setup? Can you send me the exact command lines you use for installing the different components, and the command line for running pargenes?

Although I do not recommend having PLL as a stand-alone module, using an existing raxml-ng module with pargenes is something that sounds realistic to me.
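The point about ldd can be checked from the dynamic symbol table rather than from the list of linked libraries. The following is a minimal sketch, assuming GNU binutils' `nm` is on the PATH; the helper names `symbol_status` and `check_library` are hypothetical and not part of RAxML-NG or ParGenes. A symbol listed with type `U` is one the shared object expects some other library to supply at load time, which is the situation the "undefined symbol: pll_hardware" message describes.

```python
import subprocess


def symbol_status(nm_output, symbol):
    """Classify `symbol` from the text printed by `nm -D`.

    Returns "undefined" if the library expects another library to provide
    the symbol, "defined" if the library provides it itself, or "absent"
    if the symbol is not in the dynamic symbol table at all.
    """
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined entries look like "<address> <type> <name>",
        # undefined entries look like "U <name>".
        if len(fields) < 2:
            continue
        name = fields[-1].split("@")[0]  # drop any symbol-version suffix
        if name != symbol:
            continue
        # "U" is undefined; lowercase "w"/"v" are unresolved weak symbols.
        return "undefined" if fields[-2] in ("U", "w", "v") else "defined"
    return "absent"


def check_library(path, symbol="pll_hardware"):
    """Run `nm -D` on the shared object at `path` and classify `symbol`."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return symbol_status(result.stdout, symbol)
```

Calling `check_library` with the path to raxml-ng-mpi.so would be expected to return "undefined" for a library that reproduces the error. What a correctly built library returns depends on how libpll's symbols are exported, so "defined" or "absent" should be read as "not obviously broken" rather than as proof of a complete build.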

tardigradus commented 4 years ago

BenoitMorel notifications@github.com writes:

So ideally I would want to build PLL as a stand-alone module and then add it as a dependency to the RAxML and ModuleTest modules

I also wanted to have the same PLL "module" for building both modeltest and raxml-ng in ParGenes. The issue with this approach is that RAxML and ModelTest releases/tags do not always require the same PLL version. Although I don't like rebuilding PLL several times, it ended up being the least unsatisfying solution...

The module approach is specifically designed to allow multiple versions/variants of a piece of software to be installed in parallel (see Lmod). So when we install a new version of something, the old version is kept, because some people might be in the middle of an analysis and not want to swap versions.

Doing ldd on the shared library shows that it is not linked to libpll

Libpll is built statically and included in raxml-ng(-mpi).so, which is why you don't see it with ldd. The issue you have (with pll_hardware) reminds me of something, but I don't remember exactly what was happening. I tried to compile raxml 0.9.0 and to call it with "--raxml-binary", and it seemed to work on my machine. How can I reproduce your setup? Can you send me the exact command lines you use for installing the different components,

I built RAxML-NG with:

cmake -DCMAKE_INSTALL_PREFIX=/home/loris/easybuild/software/RAxML-NG/0.9.0-foss-2018b-OpenMPI-3.1.3 -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER='gcc' -DCMAKE_Fortran_FLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno' -DCMAKE_CXX_FLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno' -DCMAKE_CXX_COMPILER='g++' -DCMAKE_Fortran_COMPILER='gfortran' -DCMAKE_C_FLAGS='-O2 -ftree-vectorize -march=native -fno-math-errno' -DCMAKE_VERBOSE_MAKEFILE=ON -DUSE_MPI=ON -DUSE_TERRAPHAST=OFF -DBUILD_AS_LIBRARY=OFF /home/loris/easybuild/build/RAxMLNG/0.9.0/foss-2018b-OpenMPI-3.1.3/

The user is using his own version of ParGenes so I am not sure how that was built. I'm currently working on providing this myself.

and the command line for running pargenes?

This is the way the user seems to be running ParGenes:

python ~/opt/ParGenes/pargenes/pargenes-hpc.py -a $SHMDIR -o $SHMDIR/out -c $SLURM_NTASKS -d nt --use-modeltest --modeltest-criteria BIC --parsimony-starting-trees 25 --random-starting-trees 25 --seed ${RANDOM_SEED} --autoMRE --bs-tree 500 --raxml-binary /trinity/shared/easybuild/software/RAxML-NG/0.9.0-foss-2018b-OpenMPI-3.1.3/lib/raxml-ng-mpi.so

The path to 'raxml-ng-mpi.so' is different because the user is using the standard published module. The 'cmake' line above is from the log of a build which I re-ran as a regular user (normally if a build is successful, the logs get thrown away).

Although I do not recommend having PLL as a stand-alone module, using an existing raxml-ng module with pargenes is something that sounds realistic to me.

What would be the problem with having PLL as a stand-alone module? Is it just the issue of needing multiple versions?

(Sorry the formatting is a bit borked :-( I tried replying via email, but it seems that's not a good idea.)

-- Tardigradus

BenoitMorel commented 4 years ago

My understanding is that raxml-ng should always be compiled with the exact PLL version it points to. As long as this is respected, I don't see any problem. But then I am not sure that there is a real interest. For this specific question maybe @amkozlov knows better than I.

I still can't reproduce the ParGenes issue.

tardigradus commented 4 years ago

One of the ideas of EasyBuild is to ensure reproducibility of builds. Thus, a given module ties a specific version of RAxML-NG to a specific version of PLL. This is already done for OpenMPI, which is also reflected in the name of the module.

In answer to the third question: Yes, the library does contain the string "pll_hardware". I'll contact the user regarding the other questions.

tardigradus commented 4 years ago

Here are the answers to the other questions:

  1. Yes, I have run ParGenes successfully (as long as I use only one node) with the raxml binary built during its own installation.
  2. I do not think so. These are the modules I used for the installation of ParGenes:
       module load GCC/7.3.0-2.30
       module load CMake/3.10.2-GCCcore-7.3.0
       module load impi/2018.3.222-iccifort-2018.3.222-GCC-7.3.0-2.30
  3. (actually question 4) The debug version using the executable rather than the library works.

I think the problem is that whereas RAxML-NG was compiled with OpenMPI, the user loaded the Intel MPI module (impi). That is probably not a good idea. I'll ask the user to rebuild with OpenMPI.

amkozlov commented 4 years ago

Hi @tardigradus,

OK, I see your point. Still, I believe that at least for the standalone RAxML-NG, linking LIBPLL statically is the best choice, as opposed to having it as a module in EasyBuild. The few benefits of the latter (small, if any, disk space/memory savings and a "cleaner" setup) simply do not justify the descent into dependency hell, even if tools like EasyBuild can manage it to some degree. Of course, the situation is different for general-purpose libs like OpenMPI. So if one day we have dozens of programs using LIBPLL, then the overhead of dependency tracking might pay off, and we can reconsider proper versioning, packaging and dynamic linking of LIBPLL.

For ParGenes, however, it sounds more reasonable to use the existing ModelTest-ng and RAxML-NG installations. Conceptually this should be possible, even though ModelTest-ng and RAxML-NG will each have their own statically linked version of LIBPLL. So I hope you can figure out the solution together with @BenoitMorel.

Best,
Alexey

tardigradus commented 4 years ago

OK, the build instructions here:

https://github.com/amkozlov/raxml-ng/wiki/installation#mpi-enabled-version

don't say anything about setting up the dependencies under localdeps. The original error seen by the user was

raxml-ng-mpi.so: undefined symbol: pll_hardware

so does that imply that the static PLL library is not being compiled into the final shared object?

Could you maybe point me to a simple test case which I can use to verify the completeness of the build?

BenoitMorel commented 4 years ago

I just added a test script https://github.com/BenoitMorel/ParGenes/blob/master/tests/test_custom_raxml_library.sh

If you want to test your own raxml-ng-mpi.so, you can do the following:

git clone --recursive https://github.com/BenoitMorel/ParGenes.git
cd ParGenes
./install_scheduler_only.sh
cd tests
./test_custom_raxml_library.sh path_to_your_raxml_library

Is this what you wanted?

tardigradus commented 4 years ago

Not exactly. Actually I wanted to know whether libpll had been compiled into raxml-ng.so properly. I have recompiled from a recursive pull of the Github repo and raxml-ng.so now contains strings such as

.../libs/pll-modules/libs/libpll/src/compress.c

(the ellipsis is mine).

Is there a simple test I can do to check that the

raxml-ng-mpi.so: undefined symbol: pll_hardware

error isn't thrown?

BenoitMorel commented 4 years ago

ParGenes is the only program that can run raxml as a library. So the simplest way to test that the error is not thrown is to run the 4 commands I gave you in my last message, replacing path_to_your_raxml_library with the path to your raxml-ng-mpi.so.
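As a lighter-weight complement to the ParGenes test script, the following is a minimal sketch of a load-only check. It assumes a POSIX system and that the MPI runtime the library was built against is already on the library search path; the function name `try_load` is hypothetical. It asks the dynamic loader to open the shared object and resolve all symbols immediately, so an unresolved symbol such as pll_hardware would surface as a loader error. It does not call any RAxML-NG code, so it cannot replace a real ParGenes run.

```python
import ctypes
import os


def try_load(library_path):
    """Ask the dynamic loader to open `library_path`.

    Returns (True, "") if the library loads, otherwise
    (False, message) with the loader's error text.
    """
    try:
        # RTLD_NOW forces every symbol to be resolved at load time,
        # so a missing symbol is reported here rather than later.
        ctypes.CDLL(library_path, mode=os.RTLD_NOW)
    except OSError as err:
        return False, str(err)
    return True, ""


# Example use (the path is a placeholder, not a real location):
#   ok, message = try_load("/path/to/raxml-ng-mpi.so")
#   print("loaded OK" if ok else "load failed: " + message)
```

If this check fails with an "undefined symbol" message, the build is incomplete. If it succeeds, the four ParGenes commands above remain the way to confirm that the library actually works when called.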