ForestClaw / forestclaw

Quadtree/octree adaptive PDE solver based on p4est.
http://www.forestclaw.org
BSD 2-Clause "Simplified" License

VTK and OpenMPI #144

Closed donnaaboise closed 1 year ago

donnaaboise commented 5 years ago

It seems that there is a problem with the VTK output and the latest version of OpenMPI. Version 2.1 works, but version 3.1 (from Nov. 2018) does not.

cburstedde commented 5 years ago

What kind of problem is this? Any error messages/ways to reproduce?

hommedesbois commented 4 years ago

Hello, this issue seems to persist. I am using OpenMPI version 4 and I get an error message. With openmpi-4* loaded, it should be possible to reproduce the problem by running the following command in the forestclaw/applications/clawpack/advection/2d/swirl directory:

$FCLAW_BIN/swirl --user:claw-version=4 --clawpack46:vtk-out=T

Any ideas?

Thanks in advance!

donnaaboise commented 4 years ago

Thanks for following up on this.

I generally use MPICH when it is available, in part to avoid problems with OpenMPI. That said, it would be good to know what problem you are running into.

Can you post the following?

hommedesbois commented 4 years ago

I have openmpi-4.0.0 and openmpi-4.0.3 available and tried both. Please find attached the log file with the error message as well as the git log: git.log, swirl.log

hommedesbois commented 4 years ago

The version that runs on my machine is actually v3.1.3 from October 2018. Sorry for the confusion.
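
Since it was not always clear in this thread which MPI was actually loaded, it may help to confirm the launcher before reproducing. The sketch below is only an illustration: `mpi_flavor` is a hypothetical helper, and it assumes that the `mpirun --version` output of Open MPI contains "Open MPI" and that the Hydra launcher shipped with MPICH prints "HYDRA". Anything else is reported as unknown.

```shell
# Hypothetical helper: classify an MPI launcher from its `--version` text.
# Assumption: Open MPI prints "mpirun (Open MPI) X.Y.Z" and the MPICH Hydra
# launcher prints "HYDRA build details:"; other output yields "unknown".
mpi_flavor() {
    case "$1" in
        *"Open MPI"*)    echo "openmpi" ;;
        *HYDRA*|*MPICH*) echo "mpich" ;;
        *)               echo "unknown" ;;
    esac
}

# Usage: inspect whichever mpirun is first on PATH, if there is one.
if command -v mpirun >/dev/null 2>&1; then
    mpi_flavor "$(mpirun --version 2>&1)"
fi
```

This only identifies the launcher on PATH. It does not show which MPI library the swirl binary was linked against.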

donnaaboise commented 3 years ago

I was able to more or less reproduce your error using openmpi3/gcc/3.1.3. Other versions of OpenMPI also gave strange errors.

(r2) ~/.../advection/2d/swirl (develop) % mpirun -n 1 swirl --user:claw-version=4 --clawpack46:vtk-out=T
[r2:12364] mca_base_component_repository_open: unable to open mca_pml_ucx: libibcm.so.1: cannot open shared object file: No such file or directory (ignored)
[r2:12364] mca_base_component_repository_open: unable to open mca_osc_ucx: libibcm.so.1: cannot open shared object file: No such file or directory (ignored)
[libsc] This is libsc 2.1.47-eaa88
[p4est] This is p4est 2.0.94-00da
[fclaw] This is ForestClaw 0.1.4928-8180
[fclaw] CPP                      mpicc -E
[fclaw] CPPFLAGS                 
[fclaw] F77                      gfortran
[fclaw] FFLAGS                   -O2 -cpp
[fclaw] CC                       mpicc
[fclaw] CFLAGS                   -O2 -Wall -std=c99 -pedantic
[fclaw] CXX                      mpicxx
[fclaw] CXXFLAGS                 -O2 -Wall
[fclaw] LDFLAGS                  
[fclaw] FLIBS                     -L/cm/local/apps/cuda/libs/current/lib64/../lib64 -L/cm/shared/apps/slurm/17.11.12/lib64/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/cm/local/apps/cuda/libs/current/lib64 -L/cm/shared/apps/cuda10.0/toolkit/10.0.130/targets/x86_64-linux/lib -L/cm/shared/apps/slurm/17.11.12/lib64/slurm -L/cm/shared/apps/slurm/17.11.12/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../.. -lgfortran -lm -lquadmath
[fclaw] LIBS                       -lz -lm   
[fclaw] Options:
[fclaw]    help                                 false
[fclaw]    version                              false
[fclaw]    print-options                        false
[fclaw]    verbosity                            essential
[fclaw]    lib-verbosity                        essential
[fclaw]    initial_dt                           0.005
[fclaw]    max_cfl                              1
[fclaw]    desired_cfl                          0.9
[fclaw]    reduce-cfl                           true
[fclaw]    use_fixed_dt                         false
[fclaw]    outstyle                             1
[fclaw]    tfinal                               4
[fclaw]    nout                                 16
[fclaw]    nstep                                1
[fclaw]    advance-one-step                     true
[fclaw]    outstyle-uses-maxlevel               true
[fclaw]    subcycle                             true
[fclaw]    weighted_partition                   false
[fclaw]    time-sync                            false
[fclaw]    flux-correction                      true
[fclaw]    fluctuation-correction               true
[fclaw]    output                               true
[fclaw]    output-gauges                        false
[fclaw]    gauge-buffer-length                  1
[fclaw]    tikz-out                             false
[fclaw]    tikz-figsize                         4 4
[fclaw]    tikz-plot-prefix                     plot
[fclaw]    tikz-plot-suffix                     png
[fclaw]    tikz-mesh-only                       false
[fclaw]    tikz-plot-fig                        true
[fclaw]    prefix                               fort
[fclaw]    vtkspace                             0
[fclaw]    init_ghostcell                       false
[fclaw]    minlevel                             2
[fclaw]    maxlevel                             6
[fclaw]    regrid_interval                      1
[fclaw]    refratio                             2
[fclaw]    smooth-refine                        true
[fclaw]    smooth-level                         0
[fclaw]    coarsen-delay                        0
[fclaw]    refine_threshold                     0.25
[fclaw]    coarsen_threshold                    0.05
[fclaw]    run-user-diagnostics                 false
[fclaw]    compute-error                        false
[fclaw]    conservation-check                   false
[fclaw]    report-timing                        true
[fclaw]    report-timing-verbosity              summary
[fclaw]    ghost_patch_pack_area                true
[fclaw]    ghost_patch_pack_extra               false
[fclaw]    ghost_patch_pack_numextrafields      0
[fclaw]    trapfpe                              true
[fclaw]    mpi_debug                            false
[fclaw]    ax                                   0
[fclaw]    bx                                   1
[fclaw]    ay                                   0
[fclaw]    by                                   1
[fclaw]    manifold                             false
[fclaw]    mi                                   1
[fclaw]    mj                                   1
[fclaw]    periodic_x                           false
[fclaw]    periodic_y                           false
[fclaw]    scale                                1 1 1
[fclaw]    shift                                0 0 0
[fclaw]    phi                                  0
[fclaw]    theta                                0
[fclaw]    clawpatch:mx                         8
[fclaw]    clawpatch:my                         8
[fclaw]    clawpatch:maux                       3
[fclaw]    clawpatch:mbc                        2
[fclaw]    clawpatch:meqn                       1
[fclaw]    clawpatch:interp_stencil_width       3
[fclaw]    clawpatch:ghost_patch_pack_aux       true
[fclaw]    clawpack46:order                     2 2
[fclaw]    clawpack46:mcapa                     0
[fclaw]    clawpack46:src_term                  false
[fclaw]    clawpack46:use-fwaves                false
[fclaw]    clawpack46:mwaves                    1
[fclaw]    clawpack46:mthlim                    3
[fclaw]    clawpack46:mthbc                     1 1 1 1
[fclaw]    clawpack46:ascii-out                 true
[fclaw]    clawpack46:vtk-out                   true
[fclaw]    clawpack5:order                      2 2
[fclaw]    clawpack5:mcapa                      0
[fclaw]    clawpack5:src_term                   false
[fclaw]    clawpack5:use_fwaves                 false
[fclaw]    clawpack5:mwaves                     1
[fclaw]    clawpack5:mthlim                     3
[fclaw]    clawpack5:mthbc                      1 1 1 1
[fclaw]    clawpack5:ascii-out                  true
[fclaw]    clawpack5:vtk-out                    false
[fclaw]    user:period                          4
[fclaw]    user:claw-version                    4
[fclaw] Arguments: none
[p4est 0] Local minimum/maximum levels:  2  2
[p4est] Global minimum/maximum levels:  2  2
[p4est 0] Patches on level  2:        16
[fclaw] Max threads set to 0
[fclaw] Output Frame    0  at time   0.00000000e+00

Improper use of FILE Mode, Using WRONLY for Read!
[libsc 0] Abort: MPI error
[libsc 0] Abort: ../forestclaw/src/patches/clawpatch/fclaw2d_clawpatch_output_vtk.c:424
[libsc 0] Abort: Obtained 10 stack frames
[libsc 0] Stack 0: swirl() [0x46497c]
[libsc 0] Stack 1: swirl() [0x463bfa]
[libsc 0] Stack 2: swirl() [0x4640dc]
[libsc 0] Stack 3: swirl() [0x41833f]
[libsc 0] Stack 4: swirl() [0x41843c]
[libsc 0] Stack 5: swirl() [0x4354fd]
[libsc 0] Stack 6: swirl() [0x42e5d4]
[libsc 0] Stack 7: swirl() [0x403b85]
[libsc 0] Stack 8: libc.so.6(__libc_start_main+0xf5) [0x7ffff55b4505]
[libsc 0] Stack 9: swirl() [0x403bd2]
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

As a possible work-around, do you have access to MPICH? I just recompiled and ran using MPICH mpich/ge/gcc/64/3.2.1 and everything worked fine. That could of course just mean that MPICH is more forgiving of subtle errors, so it would still be good to figure out what the problem with OpenMPI is.
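
The message "Improper use of FILE Mode, Using WRONLY for Read!" reads as though a read was attempted on a file handle opened write-only. The thread does not establish whether that read comes from the ForestClaw VTK writer or from inside the MPI I/O layer. As an analogy only, using plain shell file descriptors rather than MPI I/O, the same class of failure can be sketched like this (the variable names are illustrative):

```shell
# Analogy only: ordinary shell file descriptors, not MPI I/O.
tmpfile=$(mktemp)
printf 'data\n' > "$tmpfile"

# Open descriptor 3 on the file for appending, which is a write-only mode.
exec 3>>"$tmpfile"

# Try to read through that write-only descriptor; the read is expected
# to fail even though the file exists and contains data.
if read -r line <&3 2>/dev/null; then
    result="read succeeded"
else
    result="read failed"
fi

# Close the descriptor and clean up.
exec 3>&-
rm -f "$tmpfile"
echo "$result"
```

If the MPI case is analogous, the place to look would be the access mode passed when the VTK file is opened, but that is a guess and has not been verified here.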

cburstedde commented 3 years ago

Hello,

> I was able to more or less reproduce your error using openmpi3/gcc/3.1.3. Other versions of OpenMPI also gave strange errors.

My personal experience over the years has led me to discourage use of OpenMPI altogether. YMMV. Does make check in the p4est installation work OK? It tests the MPI I/O load/save mechanism we're using.

hommedesbois commented 3 years ago

Thanks for looking into this. I compiled with MPICH and it works just fine. However, make check with OpenMPI does not indicate any problems.

donnaaboise commented 3 years ago

I doubt there is a problem with p4est. Rather, the problem is more likely with how ForestClaw is handling the VTK output. We should keep this on the list of things to look at at some point, but it may not be the highest priority.

@hommedesbois is MPICH a good work-around for you?

hommedesbois commented 3 years ago

@donnaaboise yes, definitely.

cburstedde commented 3 years ago

> I doubt there is a problem with p4est. Rather, the problem is more likely with how ForestClaw is handling the VTK output. We should keep this on the list of things to look at at some point, but it may not be the highest priority.

The forestclaw VTK output and p4est_save use very similar MPI I/O code. I'd suspect one to be an indicator for the other.

> @hommedesbois is MPICH a good work-around for you?

For me, it usually is.

donnaaboise commented 1 year ago

Solution seems to be to avoid using OpenMPI.