I2PC / xmipp

Xmipp is a suite of image processing programs, primarily aimed at single-particle 3D electron microscopy.
http://biocomputingunit.es/
GNU General Public License v3.0

Compiling xmipp uses a hardcoded mpirun with 4 slots #306

Closed MohamadHarastani closed 3 years ago

MohamadHarastani commented 4 years ago

Hi, while compiling xmipp on a personal laptop I hit the following error: "There are not enough slots available in the system to satisfy the 4 slots". My processor has exactly 4 MPI slots (Intel® Core™ i7-4500U CPU @ 1.80GHz × 4). The failing command only prints a sentence 4 times to check that MPI works, yet it ended up breaking the compilation. I fixed the issue by replacing '4' with '2' in lines 692 to 698 here: https://github.com/I2PC/xmipp/blob/devel/xmipp#L692 Couldn't we test whether MPI runs during the installation in some other way, or turn the error into a warning?
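
For illustration, a more forgiving check could try a small -np first and fall back to --oversubscribe, demoting a failure to a warning. A minimal sketch, assuming Open MPI's mpirun is on the PATH (this is not the installer's actual code):

    # Illustrative sketch only -- not the xmipp installer code.
    import subprocess

    def mpi_sanity_check(np=2):
        """Return True if mpirun can launch a trivial job with 'np' slots."""
        for extra in ("", "--oversubscribe"):
            cmd = "mpirun -np %d %s echo 'MPI seems to work'" % (np, extra)
            if subprocess.call(cmd, shell=True) == 0:
                return True
        print("WARNING: the mpirun check failed; MPI programs may not run.")
        return False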

Cheers Mohamad

DStrelak commented 4 years ago

Hi @MohamadHarastani, can you check whether this works for you?

MohamadHarastani commented 4 years ago

Thanks @DStrelak, I will check it out ASAP.

MohamadHarastani commented 4 years ago

can you check whether this works for you?

Hi @DStrelak, I checked it and it worked. Here is the corresponding output of the xmipp compilation:

 mpirun -np 4 --oversubscribe echo '    > This sentence should be printed 4 times if mpi runs fine (by mpirun).'
    > This sentence should be printed 4 times if mpi runs fine (by mpirun).
    > This sentence should be printed 4 times if mpi runs fine (by mpirun).
    > This sentence should be printed 4 times if mpi runs fine (by mpirun).
    > This sentence should be printed 4 times if mpi runs fine (by mpirun).

These are the lines of code that I edited (starting from this line):

    if checkProgram("mpirun",False):
        ok=(runJob("mpirun -np 4 --oversubscribe echo '%s (by mpirun).'" % echoString) or
            runJob("mpirun -np 4 --allow-run-as-root --oversubscribe echo '%s (by mpirun).'" % echoString))
    elif checkProgram("mpiexec",False):
        ok=(runJob("mpiexec -np 4 --oversubscribe echo '%s (by mpiexec).'" % echoString) or
            runJob("mpiexec -np 4 --oversubscribe --allow-run-as-root echo '%s (by mpiexec).'" % echoString))

I can't verify whether the --oversubscribe flag works with mpiexec. I checked by increasing 4 to 10 and it still worked, to make sure it is the --oversubscribe flag that solves the issue (since I have 4 MPI slots that could, by chance, have been free).

This is what I used to test:

[mohamad@localhost ~]$ which mpirun
/usr/lib64/openmpi3/bin/mpirun
[mohamad@localhost ~]$ mpirun --version
mpirun (Open MPI) 3.1.3

Regards

DStrelak commented 4 years ago

'--oversubscribe' was added in Open MPI 2.1, and e.g. Travis uses version 1.6. @dmaluenda, do we / can we detect the MPI version?

dmaluenda commented 4 years ago

We can determine the MPI version by parsing 'mpirun --version'. However, we could also just add the '--oversubscribe' variant to the 'or' chain of commands, so the check does not fail merely because an older MPI lacks that flag.

Alternatively, we can use 'mpirun -np 2 ...'; 2 should always be fine, shouldn't it?
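
For illustration, parsing 'mpirun --version' could look like this minimal sketch (it assumes Open MPI's "mpirun (Open MPI) X.Y.Z" output format and is not the installer's actual code):

    # Illustrative sketch only: detect the Open MPI version to decide
    # whether '--oversubscribe' (added in Open MPI 2.1) can be used.
    import re
    import subprocess

    def openmpi_version():
        """Return (major, minor) of Open MPI's mpirun, or None if unknown."""
        try:
            out = subprocess.check_output(["mpirun", "--version"],
                                          stderr=subprocess.STDOUT).decode()
        except (OSError, subprocess.CalledProcessError):
            return None
        match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)", out)
        return (int(match.group(1)), int(match.group(2))) if match else None

    version = openmpi_version()
    use_oversubscribe = version is not None and version >= (2, 1)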

DStrelak commented 4 years ago

Alternatively, we can use 'mpirun -np 2 ...'; 2 should always be fine, shouldn't it?

In theory, yes. I doubt that anybody would be brave enough to use xmipp with fewer than two cores. How about we link it to the number of jobs used for the build?

dmaluenda commented 4 years ago

I thought the same: "Who wants to run Xmipp with fewer than 4 cores?" But then the answer is: "What about the login node on clusters?" Damn!

I agree on linking the number of MPI jobs to the number of cores used for the compilation. Indeed, in the hypothetical case that N=1, if

mpirun -np 1 echo whatever

works, it's fine. We are checking that mpirun works, and this proves it.
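
For illustration, tying the slot count of the check to the machine (and, if known, to the number of build jobs) could look like the sketch below; the names are hypothetical and this is not the installer's code:

    # Illustrative sketch only: pick the -np for the MPI sanity check from
    # the cores available, capped by the number of build jobs if given.
    import multiprocessing

    def check_slots(build_jobs=None):
        cores = multiprocessing.cpu_count()
        np = min(build_jobs, cores) if build_jobs else cores
        return max(np, 1)   # even -np 1 proves that mpirun itself works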

DStrelak commented 4 years ago

Can we close this issue?

dmaluenda commented 4 years ago

I think so. I forced it to use only 2 cores, which is the minimum that makes sense...

If the problem persists, please don't hesitate to reopen this so we can take a more accurate approach.

MohamadHarastani commented 4 years ago

No objection. Thank you both, @dmaluenda @DStrelak

MohamadHarastani commented 4 years ago

Hello again, I have just hit another, similar issue. I am trying to compile xmipp on a supercomputer, where we normally run MPI jobs through a script submitted with sbatch. Here is what I get:

[uhj53dz@jean-zay4: xmipp]$ ./xmipp 
 'xmipp.conf' detected.
Checking configuration ------------------------------
Checking compiler configuration ...
 g++ 8 detected
 g++ -c -w -mtune=native -march=native -std=c++11 -O3 xmipp_test_main.cpp -o xmipp_test_main.o -I../ -I/gpfswork/rech/nvo/uhj53dz/miniconda3/include -I/gpfswork/rech/nvo/uhj53dz/miniconda3/include/python3.8 -I/gpfswork/rech/nvo/uhj53dz/miniconda3/lib/python3.8/site-packages/numpy/core/include
 g++  -L/gpfswork/rech/nvo/uhj53dz/miniconda3/lib xmipp_test_main.o -o xmipp_test_main -lfftw3 -lfftw3_threads -lhdf5  -lhdf5_cpp -ltiff -ljpeg -lsqlite3 -lpthread
 rm xmipp_test_main*
Checking MPI configuration ...
 mpicxx -c -w -I../ -I/gpfswork/rech/nvo/uhj53dz/miniconda3/include -mtune=native -march=native -std=c++11 -O3  xmipp_mpi_test_main.cpp -o xmipp_mpi_test_main.o
 mpicxx   -L/gpfswork/rech/nvo/uhj53dz/miniconda3/lib xmipp_mpi_test_main.o -o xmipp_mpi_test_main -lfftw3 -lfftw3_threads -lhdf5  -lhdf5_cpp -ltiff -ljpeg -lsqlite3 -lpthread
 rm xmipp_mpi_test_main*
 mpirun -np 1 echo '    > This sentence should be printed 2 times if mpi runs fine.'
This version of Spack (openmpi ~legacylaunchers schedulers=slurm)
is installed without the mpiexec/mpirun commands to prevent
unintended performance issues. See https://github.com/spack/spack/pull/10340
for more details.
If you understand the potential consequences of a misconfigured mpirun, you can
use spack to install 'openmpi+legacylaunchers' to restore the executables.
Otherwise, use srun to launch your MPI executables.
 mpirun -np 1 --allow-run-as-root echo '    > This sentence should be printed 2 times if mpi runs fine.'
This version of Spack (openmpi ~legacylaunchers schedulers=slurm)
is installed without the mpiexec/mpirun commands to prevent
unintended performance issues. See https://github.com/spack/spack/pull/10340
for more details.
If you understand the potential consequences of a misconfigured mpirun, you can
use spack to install 'openmpi+legacylaunchers' to restore the executables.
Otherwise, use srun to launch your MPI executables.
 mpirun or mpiexec have failed.
 Cannot compile with MPI or use it
 rm xmipp_mpi_test_main*
rm: cannot remove 'xmipp_mpi_test_main*': No such file or directory

I will try to work around this by commenting out this test. I will report my progress here.
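
For illustration, on Slurm-only systems like this one a check could in principle probe srun as well, as the Spack message suggests. A hypothetical sketch, not something the installer actually does:

    # Hypothetical sketch -- the xmipp installer does NOT do this.
    # Probe candidate launchers and return the first one that works.
    # Note: on a login node, 'srun' may need an allocation to succeed.
    import shutil
    import subprocess

    def find_mpi_launcher():
        candidates = ["mpirun -np 1", "mpiexec -np 1", "srun -n 1"]
        for cand in candidates:
            if shutil.which(cand.split()[0]) is None:
                continue
            if subprocess.call(cand + " true", shell=True) == 0:
                return cand
        return None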

Regards, Mohamad

DStrelak commented 4 years ago

Hi @MohamadHarastani, thanks for reporting this problem. However, I don't think there's anything we can / should do about this particular case (unless, of course, it turns out to be a widespread issue). As you surely understand, we can't prepare our script for all possible environments; the admin of your machine will be able to resolve this problem. However, I'm a bit worried about how / whether Scipion will work fine in that environment (we should have support for Slurm, but AFAIK it's not often used [read: well tested]). If in doubt, feel free to contact us and we'll gladly help! KR, David

MohamadHarastani commented 4 years ago

Thanks @DStrelak for your reply. I commented out the lines that require mpirun and the installation continued. We have been using the previous xmipp version on the same supercomputer (compatible with Scipion 2), and now I am trying to compile the new version. Of course, we don't need support for all environments or for Slurm; we prepare our Slurm scripts manually, and all we need is a successful xmipp compilation (it is just a Linux Red Hat system, compiling with conda). I have limited experience with Slurm, but it is used on the two supercomputers we have access to here. I got past this step now. Just for the record, I commented out these lines, starting from here.

    # if not (runJob("%s -np 2 echo '%s.'" % (configDict['MPI_RUN'], echoString)) or
    #         runJob("%s -np 2 --allow-run-as-root echo '%s.'" % (configDict['MPI_RUN'], echoString))):
    #     print(red("mpirun or mpiexec have failed."))
    #     return False

We can close this issue and rediscuss a solution if needed (maybe a flag to skip this mpirun test, with an error message that points to that flag).

Regards, Mohamad

DStrelak commented 4 years ago

Hi @MohamadHarastani, I'm glad that this was the only hurdle you met. A flag to skip (a specific) config test sounds good to me. What do you think, @dmaluenda?

dmaluenda commented 4 years ago

I agree on a bypass flag. I vote for an environment variable like XMIPP_NOMPICHECK=True, or something like that. That way we can document it in the https://github.com/I2PC/xmipp/wiki/Xmipp-configuration-(version-20.07) guide.
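
For illustration, such a bypass could look like the following sketch (the variable name follows the suggestion above; this is not merged installer code):

    # Illustrative sketch only: honour XMIPP_NOMPICHECK to skip the test.
    import os
    import subprocess

    def check_mpi(cmd="mpirun -np 2 echo 'MPI check'"):
        if os.environ.get("XMIPP_NOMPICHECK", "").lower() in ("1", "true", "yes"):
            print("Skipping the mpirun check (XMIPP_NOMPICHECK is set).")
            return True
        if subprocess.call(cmd, shell=True) != 0:
            print("mpirun check failed; set XMIPP_NOMPICHECK=True to skip it.")
            return False
        return True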

By the way, note that the whole config check can be skipped just by stepping through the build manually:

./xmipp config
./xmipp compileAndInstall

(note the missing ./xmipp checkConfig in between)

MohamadHarastani commented 4 years ago

By the way, note that the whole config check can be skipped just by stepping through the build manually:

./xmipp config
./xmipp compileAndInstall

(note the missing ./xmipp checkConfig in between)

Thanks a lot for this hint. I don't think a flag is necessary in this case. I will try this option soon and comment on the result.

DStrelak commented 3 years ago

Should be resolved.