amusecode / amuse

Astrophysical Multipurpose Software Environment. This is the main repository for AMUSE
http://www.amusecode.org
Apache License 2.0

Problem building AMUSE from source #1007

Open Spijkerberg opened 10 months ago

Spijkerberg commented 10 months ago

Hello,

I am trying to build AMUSE from source on a sterrewacht computer. I am running into issues with certain packages not being built.

Here is the buildlog of the building process.

Thanks in advance, Menno

LourensVeen commented 10 months ago

Could you paste the output of module list for the shell in which you're building?

LourensVeen commented 10 months ago

Looks like Steven is onto something:

make: Entering directory '/data2/vandereerden/amuse/src/amuse/community/capreole'
make -C build_mpi amuse_interface_mpi  VPATH=../src F90FLAGS1="-g -O2 -DNOMPI -fPIC  " FC="/usr/bin/gfortran -fallow-argument-mismatch" MPIFC="/usr/bin/gfortran -fallow-argument-mismatch"
make[1]: Entering directory '/data2/vandereerden/amuse/src/amuse/community/capreole/build_mpi'
/usr/bin/gfortran -fallow-argument-mismatch -c -g -O2 -DNOMPI -fPIC   -DMPI  -I/data2/vandereerden/amuse/lib/stopcond ../src/amuse_mpi.F90
../src/amuse_mpi.F90:54:0:

   54 |   include 'mpif.h'
      | 
Fatal Error: Cannot open included file ‘mpif.h’
compilation terminated.

That's definitely not right. The question is why this is happening. I should be getting access to these machines, so then I can experiment a little.

rieder commented 10 months ago

It seems to use gfortran, whereas I would expect mpifortran to be used here instead. The argument MPIFC="/usr/bin/gfortran -fallow-argument-mismatch" is probably causing this.

rieder commented 10 months ago

MPIFC should not be set to gfortran, keep it at mpifortran or undefined. You can set FC to gfortran -fallow-argument-mismatch if this is important, but redefining an MPI compiler to a non-MPI one is probably what's causing the issue.
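One way to spot this kind of override in a build log is to pull the variable assignments out of the make command line. The following is only a sketch: the helper names `make_overrides` and `looks_like_mpi_wrapper` are hypothetical, and the rule "MPI wrapper names start with mpi" is a heuristic assumption, not something AMUSE itself checks.

```python
import os
import shlex

def make_overrides(command_line):
    """Return the VAR=value overrides given on a make command line."""
    overrides = {}
    for token in shlex.split(command_line):
        name, sep, value = token.partition("=")
        if sep and name.isidentifier():
            overrides[name] = value
    return overrides

def looks_like_mpi_wrapper(compiler_command):
    """Heuristic (assumption): MPI wrappers are usually named mpi*, e.g. mpif90."""
    parts = shlex.split(compiler_command)
    return bool(parts) and os.path.basename(parts[0]).startswith("mpi")

# The make invocation copied from the build log earlier in this thread.
line = ('make -C build_mpi amuse_interface_mpi  VPATH=../src '
        'F90FLAGS1="-g -O2 -DNOMPI -fPIC  " '
        'FC="/usr/bin/gfortran -fallow-argument-mismatch" '
        'MPIFC="/usr/bin/gfortran -fallow-argument-mismatch"')

overrides = make_overrides(line)
if not looks_like_mpi_wrapper(overrides["MPIFC"]):
    print("MPIFC is not an MPI wrapper:", overrides["MPIFC"])
```

For the logged command this reports that MPIFC points at plain gfortran, which matches the missing `mpif.h` error.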

rieder commented 10 months ago

... thinking about it further, I guess this may follow from AMUSE being configured without MPI in the first place. So a first step would be to re-run configure in the AMUSE dir and check config.mk for the definition of MPIFC there.
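Checking config.mk by hand works, but a small script makes it easy to compare two installations. This sketch assumes config.mk consists of simple `NAME=value` style assignments. The function name `read_config_mk` and the sample text are hypothetical, and the sample is not the real file. `MPI_ENABLED` is the setting that comes up later in this thread.

```python
def read_config_mk(text):
    """Parse simple make assignments (=, :=, ?=, +=); ignore comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Drop the operator prefix of ':=', '?=' or '+=' and any spaces.
        name = name.rstrip(":?+ ").strip()
        settings[name] = value.strip()
    return settings

# Hypothetical excerpt for illustration only, not the real config.mk.
sample = """\
# generated by configure
MPI_ENABLED=no
MPIFC = /usr/bin/gfortran -fallow-argument-mismatch
"""

settings = read_config_mk(sample)
print("MPI_ENABLED:", settings.get("MPI_ENABLED"))
print("MPIFC:", settings.get("MPIFC"))
```

To inspect a real installation, pass the contents of the file instead, for example `read_config_mk(open("config.mk").read())` from the AMUSE root dir.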

Spijkerberg commented 10 months ago

The output of module list is:

Currently Loaded Modules:
 1) localhosts
 2) GCCcore/12.3.0
 3) zlib/1.2.13-GCCcore-12.3.0
 4) binutils/2.40-GCCcore-12.3.0
 5) GCC/12.3.0
 6) numactl/2.0.16-GCCcore-12.3.0
 7) XZ/5.4.2-GCCcore-12.3.0
 8) libxml2/2.11.4-GCCcore-12.3.0
 9) libpciaccess/0.17-GCCcore-12.3.0
10) hwloc/2.9.1-GCCcore-12.3.0
11) OpenSSL/1.1
12) libevent/2.1.12-GCCcore-12.3.0
13) UCX/1.14.1-GCCcore-12.3.0
14) libfabric/1.18.0-GCCcore-12.3.0
15) PMIx/4.2.4-GCCcore-12.3.0
16) UCC/1.2.0-GCCcore-12.3.0
17) OpenMPI/4.1.5-GCC-12.3.0

This is the result of loading the modules 'AMUSE' (AMUSE/2023.5.1) and 'OpenMPI'. In that order.

rieder commented 10 months ago

I'd like to see the config.mk file (in the AMUSE root dir) of the sterrewacht installation of AMUSE, can you find this @Spijkerberg?

Spijkerberg commented 10 months ago

I found the file in the AMUSE root dir on the sterrewacht machine. I turned it into a txt file to share it.

rieder commented 10 months ago

Thanks. This shows AMUSE was configured without MPI support, so no wonder stuff that requires MPI breaks... This will require fixing on the module level.

Spijkerberg commented 10 months ago

Looking at the config.mk that was produced from my own installation from source, I see that MPI_ENABLED=no, while I did load the MPI module before installing. I could try to install AMUSE from source again, but I do not know if that would help.

rieder commented 10 months ago

You will also need to install mpi4py (pip install mpi4py). If this is not detected, configure will set MPI_ENABLED to no.

Spijkerberg commented 10 months ago

Looking at the packages enabled in my environment I see that mpi4py is already installed. Here is the output of pip freeze:

-e git+https://github.com/amusecode/amuse.git@72c4a3c32c21e48f3a823af9f742c7de2684138b#egg=amuse_devel
docutils==0.20.1
h5py==3.10.0
iniconfig==2.0.0
mpi4py==3.1.5
numpy==1.26.2
packaging==23.2
pluggy==1.3.0
pytest==7.4.3
setuptools-scm==8.0.4
typing_extensions==4.8.0

rieder commented 10 months ago

If you re-run configure, does that change the config.mk file?

Spijkerberg commented 10 months ago

This did change the config.mk file. I can see that MPI_ENABLED=yes is set correctly now. I will try rebuilding AMUSE to see what the result is.

Spijkerberg commented 10 months ago

It seems that some more of the community codes have been built, but there are still some errors when building. I have provided the buildlogs again for you to inspect.

Testing to see if the community codes work results in UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY.

LourensVeen commented 10 months ago

OPAL is OpenMPI's utility library. This error sounds like some kind of data format mismatch, which suggests there are different versions of MPI in use. Possibly mpi4py got compiled against a different version of MPI than AMUSE? Or the version you have active when running your script doesn't match the one that was loaded when mpi4py was installed and/or when AMUSE was built?
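A rough way to test this hypothesis is to compare what mpi4py was built with against the MPI that is active in the current shell. The sketch below uses `mpi4py.get_config()` and `MPI.Get_library_version()`. The function name `describe_mpi` and the dictionary keys are hypothetical, and note that importing `mpi4py.MPI` initializes MPI, so a badly mismatched installation may fail at that point.

```python
import shutil

def describe_mpi():
    """Collect what mpi4py was built with and which MPI is active now."""
    # Which mpiexec the current shell would pick up (None if not on PATH).
    info = {"mpiexec_on_path": shutil.which("mpiexec")}
    try:
        import mpi4py
        from mpi4py import MPI  # initializes MPI on import
    except ImportError as exc:
        info["mpi4py_error"] = str(exc)
        return info
    # Build-time settings recorded by mpi4py (e.g. the mpicc it used).
    info["mpi4py_build_config"] = mpi4py.get_config()
    # Version string of the MPI library actually loaded at run time.
    info["runtime_library"] = MPI.Get_library_version().strip()
    return info

for key, value in describe_mpi().items():
    print(f"{key}: {value}")
```

If the compiler path in the build config and the `mpiexec` on the PATH belong to different installations (for example a conda prefix versus a module prefix), that would be consistent with the mismatch described above.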

LourensVeen commented 10 months ago

I had a go at building AMUSE on a Sterrewacht machine. Progress so far:

(amuse-env) <user>@<host>:~/amuse$ python setup.py develop_build
Illegal instruction (core dumped)

Looks like some kind of numpy build issue. Time to start digging...

rieder commented 10 months ago

If you do module load AMUSE you should now get the correct prerequisites. There was an issue with mpi4py installing the wrong MPI...

rieder commented 10 months ago

> OPAL is OpenMPI's utility library. This error sounds like some kind of data format mismatch, which suggests there are different versions of MPI in use. Possibly mpi4py got compiled against a different version of MPI than AMUSE? Or the version you have active when running your script doesn't match the one that was loaded when mpi4py was installed and/or when AMUSE was built?

This was exactly the issue. mpi4py was installed incorrectly, built against the wrong (conda) OpenMPI library, which then clashed with the correct one. This also caused the AMUSE module to be configured incorrectly.

LourensVeen commented 10 months ago

What a mess. Actually, when I try to module load AMUSE I get this:

Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
    /easybuild/easybuild/el8_8/modules/all/AMUSE/2023.10.0.lua: Empty or non-existent file
    Please check the modulefile and especially if there is a line number specified in the above message  
While processing the following module(s):
   Module fullname  Module Filename
   ---------------  ---------------
   AMUSE/2023.10.0  /easybuild/easybuild/el8_8/modules/all/AMUSE/2023.10.0.lua

That Lua script exists, but it has permissions 600 (read/write for the owner only), so Lmod can't read it from my account...
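For reference, mode 600 sets the read and write bits for the file's owner and nothing for group or others. A minimal sketch of checking this from Python follows. The helper name `world_readable` is hypothetical, and the demonstration runs on a throwaway temporary file on a POSIX system, not on the real modulefile.

```python
import os
import stat
import tempfile

def world_readable(path):
    """True if the 'others' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)

# Demonstrate on a throwaway file rather than the real modulefile.
with tempfile.NamedTemporaryFile(suffix=".lua", delete=False) as handle:
    demo_path = handle.name
try:
    os.chmod(demo_path, 0o600)
    mode_600 = stat.filemode(os.stat(demo_path).st_mode)
    readable_600 = world_readable(demo_path)
    os.chmod(demo_path, 0o644)
    readable_644 = world_readable(demo_path)
finally:
    os.remove(demo_path)

print(mode_600, readable_600, readable_644)
```

A modulefile that every user should be able to load would typically need at least the 644 pattern shown in the second check.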

Seems like I should put a working EasyBuild configuration for AMUSE on my to-do list, after Conda packages and a new build system.

rieder commented 10 months ago

I think the script may be getting updated at the moment, which could account for that weirdness. But yes, an “official” easybuild module might be a good idea.