FluidityProject / fluidity

Fluidity
http://fluidity-project.org

Errors when running the example **backward_facing_step_3d** #382

Open xiangbei007 opened 9 months ago

xiangbei007 commented 9 months ago


1. When running this example, I got the following errors:

```
/bin/sh: 1: [: x: unexpected operator
Calling flredecomp in parallel with verbose log output enabled:
mpiexec -n 8 /home/wangbo/01_fluidity-main/examples/backward_facing_step_3d/../../bin/flredecomp -i 1 -o 8 -v -l backward_facing_step_3d backward_facing_step_3d_flredecomp
No protocol specified

Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.

mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

  Process name: [[45886,1],0]
  Exit code: 15

make: *** [Makefile:10: run] Error 15
```

2. The content of "flredecomp.err-0":

```
The target number of processes must be equal or less than the number of processes currently running.
*** ERROR ***
Source location: (Flredecomp.F90, 120)
Error message: Running on insufficient processes.
application called MPI_Abort(MPI_COMM_WORLD, 15) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=15 :
system msg for write_line failure : Bad file descriptor
```

3. The content of "flredecomp.log-0":

```
In flredecomp
/home/wangbo/fluidity/lib/python3.8/site-packages/fluidity/state_types.py
fluidity.state_types imported successfully; location:
Input base name: backward_facing_step_3d
Output base name: backward_facing_step_3d_flredecomp
Input number of processes: 1
Target number of processes: 8
Job number of processes: 1
```

4. The source code mentioned in step 2, which is the relevant part of "Flredecomp.F90":

```fortran
! Input check
if(input_nprocs < 0) then
  FLExit("Input number of processes cannot be negative!")
else if(target_nprocs < 0) then
  FLExit("Target number of processes cannot be negative!")
else if(input_nprocs > nprocs) then
  ewrite(-1, *) "The input number of processes must be equal or less than the number of processes currently running."
  FLExit("Running on insufficient processes.")
else if(target_nprocs > nprocs) then
  ewrite(-1, *) "The target number of processes must be equal or less than the number of processes currently running."
  FLExit("Running on insufficient processes.")
end if
```

This means target_nprocs > nprocs, but I ran `make run NPROCS=8`, so nprocs should be 8. Why is it now 1, causing the program to fail? How can I solve this problem?

The following is the build configuration of my Fluidity:

```
Revision: : (debugging)
Compile date: Oct 19 2023 14:28:11
OpenMP Support          no
Adaptivity support      yes
2D adaptivity support   yes
3D MBA support          no
CGAL support            no
MPI support             yes
Double precision        yes
NetCDF support          yes
Signal handling support yes
Stream I/O support      yes
PETSc support           yes
Hypre support           no
ARPACK support          no
Python support          yes
Numpy support           yes
VTK support             yes
Zoltan support          yes
Memory diagnostics      yes
FEMDEM support          no
Hyperlight support      no
libsupermesh support    no
```
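For reference, this is the flredecomp step that `make run NPROCS=8` triggers (command copied from the log above); it can also be run by hand from the example directory:

```sh
# The flredecomp step from the Makefile, as shown in the log above: run 8 MPI
# processes to re-decompose the serial input (-i 1, one input part) into
# 8 parts (-o 8), with verbose logging (-v -l).
cd /home/wangbo/01_fluidity-main/examples/backward_facing_step_3d
mpiexec -n 8 ../../bin/flredecomp -i 1 -o 8 -v -l \
    backward_facing_step_3d backward_facing_step_3d_flredecomp
```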


stephankramer commented 9 months ago

I suspect that your fluidity has been built against a different MPI library than the one associated with the mpiexec in your path. Can you give the output of:

ldd /home/wangbo/01_fluidity-main/bin/flredecomp

and

which mpiexec

and if on Ubuntu (or some other Debian derivative):

update-alternatives --display mpirun
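
For example, a quick way to compare the two directly (a sketch; the flredecomp path is taken from the log above):

```sh
# Which MPI libraries is flredecomp actually linked against?
ldd /home/wangbo/01_fluidity-main/bin/flredecomp | grep -i 'libmpi'

# Which MPI implementation does the mpiexec found on PATH belong to?
which mpiexec
mpiexec --version | head -n 2
```
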
xiangbei007 commented 9 months ago

Thanks for your reply. The following is the output of the three commands above.

1. `ldd /home/wangbo/01_fluidity-main/bin/flredecomp`:

```

linux-vdso.so.1 (0x00007ffec9fc1000)
libhdf5_openmpi.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_openmpi.so.103 (0x00007f0291bbc000)
libpetsc.so.3.9 => /home/wangbo/petsc-3.9.0/test/lib/libpetsc.so.3.9 (0x00007f028fe92000)
libparmetis.so => /home/wangbo/parmetis/lib/libparmetis.so (0x00007f028fe0e000)
libmetis.so => /home/wangbo/petsc-3.9.0/test/lib/libmetis.so (0x00007f028fd70000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f028fc21000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f028fbfe000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f028f9e4000)
libmpifort.so.0 => /home/wangbo/petsc-3.9.0/test/lib/libmpifort.so.0 (0x00007f028f92e000)
libmpi.so.0 => /home/wangbo/petsc-3.9.0/test/lib/libmpi.so.0 (0x00007f028f417000)
libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f028f15b000)
libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f028f140000)
libnetcdf.so.15 => /usr/lib/x86_64-linux-gnu/libnetcdf.so.15 (0x00007f028f01b000)
libudunits2.so.0 => /usr/lib/x86_64-linux-gnu/libudunits2.so.0 (0x00007f028edfe000)
libpython3.8.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (0x00007f028e8a8000)
libvtkFiltersVerdict-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkFiltersVerdict-7.1.so.7.1p (0x00007f028e884000)
libmpi.so.40 => /usr/lib/x86_64-linux-gnu/libmpi.so.40 (0x00007f028e75d000)
libvtkIOParallelXML-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOParallelXML-7.1.so.7.1p (0x00007f028e723000)
libvtkParallelMPI-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkParallelMPI-7.1.so.7.1p (0x00007f028e6fa000)
libvtkIOLegacy-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOLegacy-7.1.so.7.1p (0x00007f028e630000)
libvtkIOXML-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOXML-7.1.so.7.1p (0x00007f028e525000)
libvtkIOCore-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkIOCore-7.1.so.7.1p (0x00007f028e4a8000)
libvtkCommonExecutionModel-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonExecutionModel-7.1.so.7.1p (0x00007f028e3dd000)
libvtkCommonDataModel-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonDataModel-7.1.so.7.1p (0x00007f028e005000)
libvtkCommonCore-7.1.so.7.1p => /usr/lib/x86_64-linux-gnu/libvtkCommonCore-7.1.so.7.1p (0x00007f028dcb9000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f028dac7000)
libsz.so.2 => /lib/x86_64-linux-gnu/libsz.so.2 (0x00007f028da8b000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f028da6f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f028da67000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f028d92a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0293e55000)
libOpenCL.so.1 => /usr/local/cuda-11.7/lib64/libOpenCL.so.1 (0x00007f028d722000)
libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f028d568000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f028d55e000)
libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f028d512000)
libhdf5_serial_hl.so.100 => /lib/x86_64-linux-gnu/libhdf5_serial_hl.so.100 (0x00007f028d4eb000)
libhdf5_serial.so.103 => /lib/x86_64-linux-gnu/libhdf5_serial.so.103 (0x00007f028d16e000)
libcurl-gnutls.so.4 => /lib/x86_64-linux-gnu/libcurl-gnutls.so.4 (0x00007f028d0de000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f028d0b0000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f028d0ab000)
libvtkverdict-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkverdict-7.1.so.7.1p (0x00007f028d071000)
libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007f028cfb7000)
libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007f028cf09000)
libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007f028ceb8000)
libvtkParallelCore-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkParallelCore-7.1.so.7.1p (0x00007f028ce5e000)
libvtksys-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtksys-7.1.so.7.1p (0x00007f028ce10000)
libmpi_cxx.so.40 => /lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007f028cdf2000)
libvtkIOXMLParser-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkIOXMLParser-7.1.so.7.1p (0x00007f028cdd2000)
libvtkCommonSystem-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonSystem-7.1.so.7.1p (0x00007f028cdba000)
libvtkCommonMisc-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonMisc-7.1.so.7.1p (0x00007f028cd9c000)
libvtkCommonTransforms-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonTransforms-7.1.so.7.1p (0x00007f028cd66000)
libvtkCommonMath-7.1.so.7.1p => /lib/x86_64-linux-gnu/libvtkCommonMath-7.1.so.7.1p (0x00007f028cd3f000)
libaec.so.0 => /lib/x86_64-linux-gnu/libaec.so.0 (0x00007f028cd36000)
libxcb.so.1 => /lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f028cd0c000)
libicuuc.so.66 => /lib/x86_64-linux-gnu/libicuuc.so.66 (0x00007f028cb26000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f028cafb000)
libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007f028cad2000)
libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f028cab1000)
librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f028ca91000)
libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x00007f028ca23000)
libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007f028ca0e000)
libnettle.so.7 => /lib/x86_64-linux-gnu/libnettle.so.7 (0x00007f028c9d4000)
libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f028c7fe000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f028c7b1000)
libldap_r-2.4.so.2 => /lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f028c75b000)
liblber-2.4.so.2 => /lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f028c74a000)
libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007f028c73c000)
libevent-2.1.so.7 => /lib/x86_64-linux-gnu/libevent-2.1.so.7 (0x00007f028c6e4000)
libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007f028c6df000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007f028c6b2000)
libltdl.so.7 => /lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f028c6a7000)
libXau.so.6 => /lib/x86_64-linux-gnu/libXau.so.6 (0x00007f028c6a1000)
libXdmcp.so.6 => /lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f028c697000)
libicudata.so.66 => /lib/x86_64-linux-gnu/libicudata.so.66 (0x00007f028abd6000)
libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f028aa54000)
libhogweed.so.5 => /lib/x86_64-linux-gnu/libhogweed.so.5 (0x00007f028aa1d000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f028a999000)
libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f028a6c1000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f028a58b000)
libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f028a575000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f028a498000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f028a467000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f028a460000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f028a44f000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f028a433000)
libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f028a416000)
libgssapi.so.3 => /lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007f028a3d1000)
libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007f028a3ae000)
libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f028a392000)
libffi.so.7 => /lib/x86_64-linux-gnu/libffi.so.7 (0x00007f028a386000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f028a37f000)
libheimntlm.so.0 => /lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007f028a373000)
libkrb5.so.26 => /lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007f028a2e0000)
libasn1.so.8 => /lib/x86_64-linux-gnu/libasn1.so.8 (0x00007f028a238000)
libhcrypto.so.4 => /lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007f028a200000)
libroken.so.18 => /lib/x86_64-linux-gnu/libroken.so.18 (0x00007f028a1e7000)
libwind.so.0 => /lib/x86_64-linux-gnu/libwind.so.0 (0x00007f028a1bd000)
libheimbase.so.1 => /lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007f028a1ab000)
libhx509.so.5 => /lib/x86_64-linux-gnu/libhx509.so.5 (0x00007f028a15b000)
libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f028a032000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f0289ff7000)

```

2. `which mpiexec`:

```
/usr/bin/mpiexec
```

3. `update-alternatives --display mpirun`:

```
mpirun - auto mode
     link best version is /usr/bin/mpirun.openmpi
     link currently points to /usr/bin/mpirun.openmpi
     link mpirun is /usr/bin/mpirun
     slave mpiexec is /usr/bin/mpiexec
     slave mpiexec.1.gz is /usr/share/man/man1/mpiexec.1.gz
     slave mpirun.1.gz is /usr/share/man/man1/mpirun.1.gz
/usr/bin/mpirun.lam - priority 30
     slave mpiexec: /usr/bin/mpiexec.lam
     slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.lam.1.gz
     slave mpirun.1.gz: /usr/share/man/man1/mpirun.lam.1.gz
/usr/bin/mpirun.openmpi - priority 50
     slave mpiexec: /usr/bin/mpiexec.openmpi
     slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.openmpi.1.gz
     slave mpirun.1.gz: /usr/share/man/man1/mpirun.openmpi.1.gz
```

stephankramer commented 9 months ago

So it looks like you've built your own PETSc, which in turn has built its own MPI rather than using your system OpenMPI, and thus your fluidity and flredecomp end up being linked to both. Are you indeed using Ubuntu? In that case you might be better off just using a system (apt-installed) PETSc. Otherwise, either rebuild your PETSc specifying that you want to use your system MPI (you may just have to remove some download-mpi configure option), or rebuild fluidity using the MPI that was built by PETSc: make sure all the MPI wrappers in /home/wangbo/petsc-3.9.0/test/bin are in your PATH when configuring and building fluidity, and then also use the mpiexec from that directory.
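A minimal sketch of the second option (paths as given above; it assumes the usual mpicc/mpif90 wrappers live in that PETSc bin directory, and the exact configure invocation depends on your setup):

```sh
# Build and run Fluidity against the MPI that PETSc built (a sketch, not a
# definitive recipe). Put the PETSc-built MPI wrappers and launcher first on
# PATH so the Fluidity configure/build picks them up:
export PATH=/home/wangbo/petsc-3.9.0/test/bin:$PATH

# Confirm the compiler wrappers and launcher now resolve to that directory:
which mpicc mpif90 mpiexec

# Then reconfigure and rebuild Fluidity in this same environment, and launch
# the example with the matching mpiexec, e.g.:
#   make run NPROCS=8
```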

xiangbei007 commented 9 months ago

Yes, I am using Ubuntu. I have rebuilt fluidity using the MPI that was built by PETSc, by adding /home/wangbo/petsc-3.9.0/test/bin to my PATH; the output of `which mpiexec` is now /home/wangbo/petsc-3.9.0/test/bin/mpiexec. But when I run this example again, I get other errors. The content of "flredecomp.err-0" is as follows:

```
Fatal error in MPI_Allreduce: Invalid datatype, error stack:
MPI_Allreduce(481): MPI_Allreduce(sbuf=0x7ffece2f5f18, rbuf=0x7ffece2f5f20, count=1, INVALID DATATYPE, op=0x46bba8a0, comm=0x84000004) failed
MPI_Allreduce(423): Invalid datatype
```