Open xiangbei007 opened 9 months ago
1. When running this example, I got the following errors:
mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
make: *** [Makefile:10: run] Error 15
2. The content of "flredecomp.err-0":
The target number of processes must be equal or less than the number of processes currently running.
*** ERROR ***
Source location: (Flredecomp.F90, 120)
Error message: Running on insufficient processes.
application called MPI_Abort(MPI_COMM_WORLD, 15) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=15
: system msg for write_line failure : Bad file descriptor
3. The content of "flredecomp.log-0":
In flredecomp
fluidity.state_types imported successfully; location: /home/wangbo/fluidity/lib/python3.8/site-packages/fluidity/state_types.py
Input base name: backward_facing_step_3d
Output base name: backward_facing_step_3d_flredecomp
Input number of processes: 1
Target number of processes: 8
Job number of processes: 1
4. The source code mentioned in step 2, which is part of "Flredecomp.F90":
! Input check
if(input_nprocs < 0) then
  FLExit("Input number of processes cannot be negative!")
else if(target_nprocs < 0) then
  FLExit("Target number of processes cannot be negative!")
else if(input_nprocs > nprocs) then
  ewrite(-1, *) "The input number of processes must be equal or less than the number of processes currently running."
  FLExit("Running on insufficient processes.")
else if(target_nprocs > nprocs) then
  ewrite(-1, *) "The target number of processes must be equal or less than the number of processes currently running."
  FLExit("Running on insufficient processes.")
end if
This means target_nprocs > nprocs. But I ran make run NPROCS=8, so nprocs should be 8. Why is it now 1, causing the program to fail to run? How can I solve this problem?
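To make the failing condition concrete, here is a hypothetical Python translation of the Fortran input check above (function name and exceptions are my own; the real logic is the Fortran snippet). The key point is that nprocs is the number of MPI ranks the job was actually launched with, not the NPROCS make variable:

```python
def check_nprocs(input_nprocs: int, target_nprocs: int, nprocs: int) -> None:
    """Sketch of the Flredecomp.F90 input check.

    nprocs is the size of MPI_COMM_WORLD, i.e. how many ranks
    mpiexec actually started for this job.
    """
    if input_nprocs < 0:
        raise ValueError("Input number of processes cannot be negative!")
    if target_nprocs < 0:
        raise ValueError("Target number of processes cannot be negative!")
    if input_nprocs > nprocs:
        raise RuntimeError("Running on insufficient processes (input).")
    if target_nprocs > nprocs:
        raise RuntimeError("Running on insufficient processes (target).")

# The failing case from the log: launched on 1 rank, target 8.
# check_nprocs(1, 8, 1) raises RuntimeError
```

So the log's "Job number of processes: 1" is exactly this nprocs value: mpiexec only started one rank, regardless of what NPROCS was passed to make.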
The following is the build configuration of Fluidity.
Revision: : (debugging)
Compile date: Oct 19 2023 14:28:11
OpenMP Support no
Adaptivity support yes
2D adaptivity support yes
3D MBA support no
CGAL support no
MPI support yes
Double precision yes
NetCDF support yes
Signal handling support yes
Stream I/O support yes
PETSc support yes
Hypre support no
ARPACK support no
Python support yes
Numpy support yes
VTK support yes
Zoltan support yes
Memory diagnostics yes
FEMDEM support no
Hyperlight support no
libsupermesh support no
I suspect that your fluidity has been built against a different MPI library than the one associated with the mpiexec
in your path. Can you give the output of:
ldd /home/wangbo/01_fluidity-main/bin/flredecomp
and
which mpiexec
and if on Ubuntu (or some other Debian derivative):
update-alternatives --display mpirun
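As an illustrative aid (not part of Fluidity), a small script can scan ldd output for MPI runtime libraries; seeing libmpi entries resolved under two different prefixes is the smoking gun for this kind of mismatch:

```python
import re

def find_mpi_libs(ldd_output: str) -> list:
    """Return the resolved path of every MPI runtime library in ldd output.

    Paths under more than one installation prefix (e.g. a hand-built
    PETSc tree and /usr/lib) indicate the binary is linked against
    two different MPI implementations at once.
    """
    libs = []
    for line in ldd_output.splitlines():
        # ldd lines look like: "libmpi.so.40 => /usr/lib/.../libmpi.so.40 (0x...)"
        m = re.search(r'libmpi\S*\s+=>\s+(\S+)', line)
        if m:
            libs.append(m.group(1))
    return libs

# Usage sketch: feed it `ldd /path/to/flredecomp` output and inspect the result.
```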
Thanks for your reply. The following is the output of the above three commands.
1.
linux-vdso.so.1 (0x00007ffec9fc1000)
libhdf5_openmpi.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_openmpi.so.103 (0x00007f0291bbc000)
libpetsc.so.3.9 => /home/wangbo/petsc-3.9.0/test/lib/libpetsc.so.3.9 (0x00007f028fe92000)
libparmetis.so => /home/wangbo/parmetis/lib/libparmetis.so (0x00007f028fe0e000)
libmetis.so => /home/wangbo/petsc-3.9.0/test/lib/libmetis.so (0x00007f028fd70000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f028fc21000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f028fbfe000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f028f9e4000)
libmpifort.so.0 => /home/wangbo/petsc-3.9.0/test/lib/libmpifort.so.0 (0x00007f028f92e000)
libmpi.so.0 => /home/wangbo/petsc-3.9.0/test/lib/libmpi.so.0 (0x00007f028f417000)
libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f028f15b000)
libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f028f140000)
libnetcdf.so.15 => /usr/lib/x86_64-linux-gnu/libnetcdf.so.15 (0x00007f028f01b000)
libudunits2.so.0 => /usr/lib/x86_64-linux-gnu/libudunits2.so.0 (0x00007f028edfe000)
libpython3.8.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (0x00007f028e8a8000)
libvtkFiltersVerdict-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkFiltersVerdict-7.1.so.7.1p (0x00007f028e884000)
libmpi.so.40 => /usr/lib/x86_64-linux-gnu/libmpi.so.40 (0x00007f028e75d000)
libvtkIOParallelXML-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOParallelXML-7.1.so.7.1p (0x00007f028e723000)
libvtkParallelMPI-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkParallelMPI-7.1.so.7.1p (0x00007f028e6fa000)
libvtkIOLegacy-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOLegacy-7.1.so.7.1p (0x00007f028e630000)
libvtkIOXML-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOXML-7.1.so.7.1p (0x00007f028e525000)
libvtkIOCore-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOCore-7.1.so.7.1p (0x00007f028e4a8000)
libvtkCommonExecutionModel-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonExecutionModel-7.1.so.7.1p (0x00007f028e3dd000)
libvtkCommonDataModel-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonDataModel-7.1.so.7.1p (0x00007f028e005000)
libvtkCommonCore-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonCore-7.1.so.7.1p (0x00007f028dcb9000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f028dac7000)
libsz.so.2 => /lib/x86_64-linux-gnu/libsz.so.2 (0x00007f028da8b000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f028da6f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f028da67000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f028d92a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0293e55000)
libOpenCL.so.1 => /usr/local/cuda-11.7/lib64/libOpenCL.so.1 (0x00007f028d722000)
libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f028d568000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f028d55e000)
libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f028d512000)
libhdf5_serial_hl.so.100 => /lib/x86_64-linux-gnu/libhdf5_serial_hl.so.100 (0x00007f028d4eb000)
libhdf5_serial.so.103 => /lib/x86_64-linux-gnu/libhdf5_serial.so.103 (0x00007f028d16e000)
libcurl-gnutls.so.4 => /lib/x86_64-linux-gnu/libcurl-gnutls.so.4 (0x00007f028d0de000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f028d0b0000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f028d0ab000)
libvtkverdict-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkverdict-7.1.so.7.1p (0x00007f028d071000)
libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007f028cfb7000)
libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007f028cf09000)
libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007f028ceb8000)
libvtkParallelCore-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkParallelCore-7.1.so.7.1p (0x00007f028ce5e000)
libvtksys-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtksys-7.1.so.7.1p (0x00007f028ce10000)
libmpi_cxx.so.40 => /lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007f028cdf2000)
libvtkIOXMLParser-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkIOXMLParser-7.1.so.7.1p (0x00007f028cdd2000)
libvtkCommonSystem-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonSystem-7.1.so.7.1p (0x00007f028cdba000)
libvtkCommonMisc-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonMisc-7.1.so.7.1p (0x00007f028cd9c000)
libvtkCommonTransforms-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonTransforms-7.1.so.7.1p (0x00007f028cd66000)
libvtkCommonMath-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonMath-7.1.so.7.1p (0x00007f028cd3f000)
libaec.so.0 => /lib/x86_64-linux-gnu/libaec.so.0 (0x00007f028cd36000)
libxcb.so.1 => /lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f028cd0c000)
libicuuc.so.66 => /lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f028cb26000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f028cafb000)
libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007f028cad2000)
libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f028cab1000)
librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f028ca91000)
libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x00007f028ca23000)
libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007f028ca0e000)
libnettle.so.7 => /lib/x86_64-linux-gnu/libnettle.so.7 (0x00007f028c9d4000)
libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f028c7fe000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f028c7b1000)
libldap_r-2.4.so.2 => /lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f028c75b000)
liblber-2.4.so.2 => /lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f028c74a000)
libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007f028c73c000)
libevent-2.1.so.7 => /lib/x86_64-linux-gnu/libevent-2.1.so.7 (0x00007f028c6e4000)
libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007f028c6df000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007f028c6b2000)
libltdl.so.7 => /lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f028c6a7000)
libXau.so.6 => /lib/x86_64-linux-gnu/libXau.so.6 (0x00007f028c6a1000)
libXdmcp.so.6 => /lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f028c697000)
libicudata.so.66 => /lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f028abd6000)
libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f028aa54000)
libhogweed.so.5 => /lib/x86_64-linux-gnu/libhogweed.so.5 (0x00007f028aa1d000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f028a999000)
libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f028a6c1000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f028a58b000)
libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f028a575000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f028a498000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f028a467000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f028a460000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f028a44f000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f028a433000)
libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f028a416000)
libgssapi.so.3 => /lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007f028a3d1000)
libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007f028a3ae000)
libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f028a392000)
libffi.so.7 => /lib/x86_64-linux-gnu/libffi.so.7 (0x00007f028a386000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f028a37f000)
libheimntlm.so.0 => /lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007f028a373000)
libkrb5.so.26 => /lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007f028a2e0000)
libasn1.so.8 => /lib/x86_64-linux-gnu/libasn1.so.8 (0x00007f028a238000)
libhcrypto.so.4 => /lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007f028a200000)
libroken.so.18 => /lib/x86_64-linux-gnu/libroken.so.18 (0x00007f028a1e7000)
libwind.so.0 => /lib/x86_64-linux-gnu/libwind.so.0 (0x00007f028a1bd000)
libheimbase.so.1 => /lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007f028a1ab000)
libhx509.so.5 => /lib/x86_64-linux-gnu/libhx509.so.5 (0x00007f028a15b000)
libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f028a032000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f0289ff7000)
2.
/usr/bin/mpiexec
3.
mpirun - auto mode
link best version is /usr/bin/mpirun.openmpi
link currently points to /usr/bin/mpirun.openmpi
link mpirun is /usr/bin/mpirun
slave mpiexec is /usr/bin/mpiexec
slave mpiexec.1.gz is /usr/share/man/man1/mpiexec.1.gz
slave mpirun.1.gz is /usr/share/man/man1/mpirun.1.gz
/usr/bin/mpirun.lam - priority 30
slave mpiexec: /usr/bin/mpiexec.lam
slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.lam.1.gz
slave mpirun.1.gz: /usr/share/man/man1/mpirun.lam.1.gz
/usr/bin/mpirun.openmpi - priority 50
slave mpiexec: /usr/bin/mpiexec.openmpi
slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.openmpi.1.gz
slave mpirun.1.gz: /usr/share/man/man1/mpirun.openmpi.1.gz
So it looks like you've built your own petsc, which in turn has built its own MPI rather than using your system openmpi, and thus your fluidity and flredecomp end up being linked against both. Are you indeed using Ubuntu? In that case you might be better off just using a system (apt) installed petsc. Otherwise, either rebuild your petsc specifying that you want to use your system MPI (you may just have to remove a download-mpi configure option), or rebuild fluidity using the MPI that was built by petsc, by making sure all MPI wrappers in /home/wangbo/petsc-3.9.0/test/bin are in your PATH when configuring and building fluidity, and then also use the mpiexec from that directory.
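For the rebuild-petsc route, the change would look roughly like the sketch below. This is a hedged config fragment, not your actual configure line: the PETSC_ARCH name is taken from the paths in this thread, and the other options you originally passed must be kept.

```shell
cd /home/wangbo/petsc-3.9.0
# Drop any --download-mpich / --download-openmpi option from your original
# configure invocation and point petsc at the system MPI instead
# (...plus whatever other options your original configure line used):
./configure PETSC_ARCH=test --with-mpi-dir=/usr
make PETSC_ARCH=test all
```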
Yes, I am using Ubuntu. I have rebuilt fluidity using the MPI that was built by petsc, by adding /home/wangbo/petsc-3.9.0/test/bin to my PATH; the output of which mpiexec is now /home/wangbo/petsc-3.9.0/test/bin/mpiexec.
But when I run this example again, I got some other errors. The content displayed in flredecomp.err-0 is as follows:
Fatal error in MPI_Allreduce: Invalid datatype, error stack:
MPI_Allreduce(481): MPI_Allreduce(sbuf=0x7ffece2f5f18, rbuf=0x7ffece2f5f20, count=1, INVALID DATATYPE, op=0x46bba8a0, comm=0x84000004) failed
MPI_Allreduce(423): Invalid datatype
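An "Invalid datatype" in MPI_Allreduce is a typical symptom of a binary still loading a second MPI at run time even though it was launched with the matching mpiexec. One illustrative (hypothetical, not Fluidity-provided) sanity check is to verify that the mpiexec on PATH and each libmpi the binary actually loads share the same installation prefix:

```python
import os

def same_mpi_install(mpiexec_path: str, libmpi_path: str) -> bool:
    """Heuristic: mpiexec lives in <prefix>/bin, so the library should
    live under the same <prefix> (e.g. /home/wangbo/petsc-3.9.0/test)."""
    prefix = os.path.dirname(os.path.dirname(mpiexec_path))  # strip /bin
    return os.path.commonpath([prefix, libmpi_path]) == prefix

# The situation in this thread: mpiexec from the petsc tree, but the
# binary is also still linked against the system /usr/lib libmpi.so.40.
print(same_mpi_install("/home/wangbo/petsc-3.9.0/test/bin/mpiexec",
                       "/home/wangbo/petsc-3.9.0/test/lib/libmpi.so.0"))   # True
print(same_mpi_install("/home/wangbo/petsc-3.9.0/test/bin/mpiexec",
                       "/usr/lib/x86_64-linux-gnu/libmpi.so.40"))          # False
```

Any False here means the dynamic linker must be steered away from the system library (for example via LD_LIBRARY_PATH, or by relinking so only one MPI is pulled in).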