SeisSol / PUMGen

Mesh generation for SeisSol
BSD 3-Clause "New" or "Revised" License

Converting mesh to APF fails for large mesh #60

Closed Thomas-Ulrich closed 6 months ago

Thomas-Ulrich commented 1 year ago

I'm trying to generate a larger mesh for Texascale. PUMGen creates the mesh (750M cells), but the run then fails during the APF conversion. It might be running out of memory, but the server ("exception") has 999 GB of RAM, and I was the only one using it at the time.

I'm using the mesh64 branch, but the bug is not new to this branch; I have run into it before. PUMGen was compiled with spack using: spack install pumgen@mesh64 +with_simmetrix ^pumi@2.2.8 ^simmetrix-simmodsuite@2023.0-230923 ^easi jit=impalajit,lua (with the package modified so that it knows the mesh64 version).

Sat Oct 07 13:09:41, Info:  No filtering enabled (contiguous storage)
Sat Oct 07 13:09:41, Info:  Using SimModSuite
Sat Oct 07 13:09:41, Info:  Loading model
Sat Oct 07 13:11:58, Info:  Extracting cases
Sat Oct 07 13:11:58, Info:  surface smoothing option: surfaceSmoothingLevel surfaceSmoothingType surfaceFaceRotationLimit Snap 1   1   5   0
Sat Oct 07 13:11:58, Info:  volume smoothing option: volumeSmoothingLevel volumeSmoothingType 1   1
Sat Oct 07 13:11:58, Info:  Activating velocity aware meshing, using 2 elements per wavelength and easi file Mw_78_Turkey_rhomulambda1D_Guvercin_et_al.yaml
Sat Oct 07 13:11:58, Info:  Adding velocity aware refinement region targeting 6 Hz, centered at x = 20000 y= 50000 z= -10000 with half sizes x = 200000 y = 100000 z = 15000
Sat Oct 07 13:11:58, Info:  rotated around z axis by  45 degree(s) counterclockwise from x axis.
Sat Oct 07 13:11:58, Info:  bypass findRegion and use group = 1
Sat Oct 07 13:11:58, Info:  Adding velocity aware refinement region targeting 0.25 Hz, centered at x = 0 y= 0 z= -15000 with half sizes x = 4e+11 y = 1.5e+11 z = 1.5e+11
Sat Oct 07 13:11:58, Info:  bypass findRegion and use group = 1
Sat Oct 07 13:11:58, Info:  Setting cases
Sat Oct 07 13:11:58, Info:  faceBound[ 96 ] = 1
Sat Oct 07 13:11:58, Info:  faceBound[ 105 ] = 1
Sat Oct 07 13:11:58, Info:  faceBound[ 95 ] = 5
Sat Oct 07 13:11:58, Info:  faceBound[ 98 ] = 5
Sat Oct 07 13:11:58, Info:  faceBound[ 107 ] = 5
Sat Oct 07 13:11:58, Info:  faceBound[ 110 ] = 5
Sat Oct 07 13:11:58, Info:  faceBound[ 119 ] = 5
Sat Oct 07 13:11:58, Info:  faceBound[ 100 ] = 65
Sat Oct 07 13:11:58, Info:  faceBound[ 93 ] = 66
Sat Oct 07 13:11:58, Info:  faceBound[ 94 ] = 66
Sat Oct 07 13:11:58, Info:  faceBound[ 108 ] = 66
Sat Oct 07 13:11:58, Info:  faceBound[ 116 ] = 66
Sat Oct 07 13:11:58, Info:  faceBound[ 121 ] = 66
Sat Oct 07 13:11:58, Info:  faceBound[ 102 ] = 67
Sat Oct 07 13:11:58, Info:  faceBound[ 109 ] = 67
Sat Oct 07 13:11:58, Info:  faceBound[ 117 ] = 67
Sat Oct 07 13:11:58, Info:  faceBound[ 111 ] = 68
Sat Oct 07 13:11:58, Info:  faceBound[ 94 ] = 68
Sat Oct 07 13:11:58, Info:  faceBound[ 112 ] = 69
Sat Oct 07 13:11:58, Info:  globalMSize = 5000
Sat Oct 07 13:11:58, Info:  face id: 100 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 93 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 94 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 108 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 116 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 121 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 102 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 109 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 117 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 111 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 94 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 112 , MSize = 200
Sat Oct 07 13:11:58, Info:  face id: 105 , MSize = 900
Sat Oct 07 13:11:58, Info:  Enabling velocity aware meshing
Sat Oct 07 13:11:58, Info:  Target equivolume AspectRatio = 12
Sat Oct 07 13:11:58, Info:  Target equiarea AspectRatio = 6
Sat Oct 07 13:11:58, Info:  Starting the surface mesher
Sat Oct 07 13:11:58, Info:  Progress: Surface Meshing
Sat Oct 07 13:29:02, Info:  Progress: Adapting Mesh
Sat Oct 07 13:30:04, Info:  Progress: Surface Mesh Improver
Sat Oct 07 13:30:04, Info:  Progress: Surface Mesh Improvement , 0 / 2
Sat Oct 07 13:30:04, Info:  Progress: Surface Mesh Improvement , 1 / 2
Sat Oct 07 13:30:04, Info:  Progress: Surface Mesh Improvement , 2 / 2
Sat Oct 07 13:30:04, Info:  Progress: Surface Mesh Improvement , done
Sat Oct 07 13:30:04, Info:  Progress: Surface Smoothing , 0 / 100
Sat Oct 07 13:30:04, Info:  Progress: Surface Smoothing , 16 / 100
Sat Oct 07 13:30:04, Info:  Progress: Surface Smoothing , 32 / 100
Sat Oct 07 13:30:04, Info:  Progress: Surface Smoothing , done
Sat Oct 07 13:30:04, Info:  Progress: Fix Surface Intersections
Sat Oct 07 13:30:15, Info:  Starting the volume mesher
Sat Oct 07 13:30:15, Info:  Progress: Volume Meshing
Sat Oct 07 13:30:15, Info:  Progress: Creating volume mesh
Sat Oct 07 13:49:58, Info:  Progress: Adapting Mesh
Sat Oct 07 13:50:09, Info:  Progress: Volume Mesh Improver
Sat Oct 07 13:50:09, Info:  Progress: Volume Optimization
Sat Oct 07 13:50:30, Info:  Progress: Volume Smoothing , 0 / 100
Sat Oct 07 13:50:32, Info:  Progress: Volume Smoothing , 5 / 100
Sat Oct 07 13:50:35, Info:  Progress: Volume Smoothing , 10 / 100
Sat Oct 07 13:50:37, Info:  Progress: Volume Smoothing , 15 / 100
Sat Oct 07 13:50:39, Info:  Progress: Volume Smoothing , 20 / 100
Sat Oct 07 13:50:42, Info:  Progress: Volume Smoothing , 25 / 100
Sat Oct 07 13:50:44, Info:  Progress: Volume Smoothing , 30 / 100
Sat Oct 07 13:50:47, Info:  Progress: Volume Smoothing , 35 / 100
Sat Oct 07 13:50:49, Info:  Progress: Volume Smoothing , 40 / 100
Sat Oct 07 13:50:51, Info:  Progress: Volume Smoothing , 45 / 100
Sat Oct 07 13:50:53, Info:  Progress: Volume Smoothing , 50 / 100
Sat Oct 07 13:50:55, Info:  Progress: Volume Smoothing , 55 / 100
Sat Oct 07 13:50:57, Info:  Progress: Volume Smoothing , 60 / 100
Sat Oct 07 13:50:59, Info:  Progress: Volume Smoothing , 65 / 100
Sat Oct 07 13:51:01, Info:  Progress: Volume Smoothing , 70 / 100
Sat Oct 07 13:51:03, Info:  Progress: Volume Smoothing , 75 / 100
Sat Oct 07 13:51:05, Info:  Progress: Volume Smoothing , 80 / 100
Sat Oct 07 13:51:06, Info:  Progress: Volume Smoothing , 85 / 100
Sat Oct 07 13:51:08, Info:  Progress: Volume Smoothing , 90 / 100
Sat Oct 07 13:51:11, Info:  Progress: Volume Smoothing , 95 / 100
Sat Oct 07 13:52:09, Info:  Progress: Volume Smoothing , done
Sat Oct 07 13:52:13, Info:  AR statistics:
Sat Oct 07 13:52:22, Info:  AR max: 19.1125
Sat Oct 07 13:52:22, Info:  AR (target: < ~10):
Sat Oct 07 13:52:22, Info:    [ 0.00 , 2.00 ): 593524
Sat Oct 07 13:52:22, Info:    [ 2.00 , 4.00 ): 728855697
Sat Oct 07 13:52:22, Info:    [ 4.00 , 6.00 ): 22308449
Sat Oct 07 13:52:22, Info:    [ 6.00 , 10.00 ): 2083036
Sat Oct 07 13:52:22, Info:    [ 10.00 , 20.00 ): 30302
Sat Oct 07 13:52:22, Info:    [ 20.00 , 40.00 ): 0
Sat Oct 07 13:52:22, Info:    [ 40.00 , 100.00 ): 0
Sat Oct 07 13:52:22, Info:    [ 100.00 ,inf): 0
Sat Oct 07 13:52:24, Info:  Converting mesh to APF
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 33 with PID 0 on node exception exited on signal 9 (Killed).
--------------------------------------------------------------------------

davschneller commented 1 year ago

You could try the branch https://github.com/SeisSol/PUMGen/tree/davschneller/bypass-apf . It tries to avoid PUMI/APF altogether (which, overall, seems to be a bit of a relic from the NetCDF mesh era?). The code may not be fully optimized for memory consumption yet, but it may already work for you; it does on small meshes.

Thomas-Ulrich commented 1 year ago

Thank you. I cannot compile the branch with gcc/12.2:

     72    /import/exception-dump/ulrich/spack/lib/spack/env/gcc/g++ -DH5_BUILT_AS_DYNAMIC_LIB -DLOG_LEVEL=2 -DPARALLEL -DUSE_HDF -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE
            -D_POSIX_C_SOURCE=200809L -I/tmp/ulrich/spack-stage/spack-stage-pumgen-bypass-apf-66in2kth545lc272udok4sxexxp4laib/spack-src/src -I/tmp/ulrich/spack-stage/spack-stage-pumgen-bypass-apf-66in2kth545
           lc272udok4sxexxp4laib/spack-src/submodules -I/import/exception-dump/ulrich/myLibs/spack-packages/linux-debian11-zen2/gcc-12.2.0/pumi-2.2.8-mrdynb2pr2irnayun3j7g53itoxsdrjj/include -isystem /import/
           exception-dump/ulrich/myLibs/spack-packages/linux-debian11-zen2/gcc-12.2.0/hdf5-1.12.2-xwxzdqakdlyy5sknts5wilegc5dlrboi/include -isystem /import/exception-dump/ulrich/myLibs/spack-packages/linux-de
           bian11-zen2/gcc-12.2.0/openmpi-4.1.5-ab5rikqfa63jmfrqq54jugtlhyw3iu7d/include -O2 -std=c++17 -fopenmp -pthread -MD -MT CMakeFiles/pumgen.dir/src/aux/MPIConvenience.cpp.o -MF CMakeFiles/pumgen.dir/s
           rc/aux/MPIConvenience.cpp.o.d -o CMakeFiles/pumgen.dir/src/aux/MPIConvenience.cpp.o -c /tmp/ulrich/spack-stage/spack-stage-pumgen-bypass-apf-66in2kth545lc272udok4sxexxp4laib/spack-src/src/aux/MPICo
           nvenience.cpp
     73    In file included from /tmp/ulrich/spack-stage/spack-stage-pumgen-bypass-apf-66in2kth545lc272udok4sxexxp4laib/spack-src/src/aux/MPIConvenience.cpp:1:
  >> 74    /tmp/ulrich/spack-stage/spack-stage-pumgen-bypass-apf-66in2kth545lc272udok4sxexxp4laib/spack-src/src/aux/MPIConvenience.h:6:75: error: 'size_t' in namespace 'std' does not name a type
     75        6 | void sparseAlltoallv(const void* sendbuf, const int* sendsize, const std::size_t* senddisp,
     76          |                                                                           ^~~~~~
  >> 77    /tmp/ulrich/spack-stage/spack-stage-pumgen-bypass-apf-66in2kth545lc272udok4sxexxp4laib/spack-src/src/aux/MPIConvenience.h:8:33: error: 'size_t' in namespace 'std' does not name a type
     78        8 |                      const std::size_t* recvdisp, MPI_Datatype recvtype, MPI_Comm comm);
     79          |                                 ^~~~~~
  >> 80    make[2]: *** [CMakeFiles/pumgen.dir/build.make:205: CMakeFiles/pumgen.dir/src/aux/MPIConvenience.cpp.o] Error 1
     81    make[2]: *** Waiting for unfinished jobs....
     82    make[2]: Leaving directory '/tmp/ulrich/spack-stage/spack-stage-pumgen-bypass-apf-66in2kth545lc272udok4sxexxp4laib/spack-build-66in2kt'
  >> 83    make[1]: *** [CMakeFiles/Makefile2:88: CMakeFiles/pumgen.dir/all] Error 2
     84    make[1]: Leaving directory '/tmp/ulrich/spack-stage/spack-stage-pumgen-bypass-apf-66in2kth545lc272udok4sxexxp4laib/spack-build-66in2kt'
  >> 85    make: *** [Makefile:139: all] Error 2

davschneller commented 1 year ago

Hi, thanks; I've updated the branch so that it includes cstddef in a few more places; the error should (hopefully) no longer occur.
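
For readers hitting the same error: GCC reports "'size_t' in namespace 'std' does not name a type" when std::size_t is used before any header that declares it has been included. The sketch below is not code from the branch; it only illustrates the kind of fix described, using a hypothetical helper (exclusiveDisplacements) that mirrors the int sizes and std::size_t displacements seen in the sparseAlltoallv signature from the build log.

```cpp
#include <cstddef> // declares std::size_t; do not rely on other headers to pull it in
#include <vector>

// Hypothetical helper, not part of PUMGen: converts per-rank element counts
// (int, as in the sparseAlltoallv signature) into std::size_t displacements
// via an exclusive prefix sum, so the offsets are not limited to 32 bits.
std::vector<std::size_t> exclusiveDisplacements(const std::vector<int>& sizes) {
  std::vector<std::size_t> disp(sizes.size());
  std::size_t offset = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    disp[i] = offset; // offset of rank i = sum of all earlier counts
    offset += static_cast<std::size_t>(sizes[i]);
  }
  return disp;
}
```

With counts {2, 0, 3}, the helper returns displacements {0, 2, 2}. Removing the cstddef include may still compile on some toolchains if another header happens to declare std::size_t, which is why such errors tend to show up only after a compiler upgrade.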

Thomas-Ulrich commented 1 year ago

The previous setup worked on 30 nodes. During volume mesh generation, RAM usage was around 680 GB, and during conversion it peaked at around 715 GB (just loosely monitoring). In short, issue fixed! Thanks again!

Tue Oct 31 11:52:39, Info:  AR statistics:
Tue Oct 31 11:52:50, Info:  AR max: 19.1125
Tue Oct 31 11:52:50, Info:  AR (target: < ~10):
Tue Oct 31 11:52:50, Info:    [ 0.00 , 2.00 ): 593689
Tue Oct 31 11:52:50, Info:    [ 2.00 , 4.00 ): 728857614
Tue Oct 31 11:52:50, Info:    [ 4.00 , 6.00 ): 22297227
Tue Oct 31 11:52:50, Info:    [ 6.00 , 10.00 ): 2089021
Tue Oct 31 11:52:50, Info:    [ 10.00 , 20.00 ): 30328
Tue Oct 31 11:52:50, Info:    [ 20.00 , 40.00 ): 0
Tue Oct 31 11:52:50, Info:    [ 40.00 , 100.00 ): 0
Tue Oct 31 11:52:50, Info:    [ 100.00 ,inf): 0
Tue Oct 31 11:52:50, Info:  Iterating over mesh to get data...
Tue Oct 31 11:52:50, Info:  Counting part 0 / 1
Tue Oct 31 11:52:50, Info:  Local cells: 6352716
Tue Oct 31 11:52:50, Info:  Local vertices: 1112747
Tue Oct 31 11:52:50, Info:  Local vertices (with duplicates): 1288241
Tue Oct 31 11:52:50, Info:  Processing part 0 / 1
Tue Oct 31 11:52:50, Info:  Vertices: 0 to 1112747
Tue Oct 31 11:52:55, Info:  Local vertices (really, now): 258298
Tue Oct 31 11:52:55, Info:  Processing part 0 / 1
Tue Oct 31 11:52:55, Info:  Connectivity: 0 to 6352716
Tue Oct 31 11:52:56, Info:  Groups
Tue Oct 31 11:52:57, Info:  Boundaries
Tue Oct 31 11:52:57, Info:  Parsed mesh successfully, writing output...
Tue Oct 31 11:53:05, Info:  Total cell count: 753867879
Tue Oct 31 11:53:05, Info:  Total vertex count: 128515465
Tue Oct 31 11:53:29, Info:  Minimum insphere found: 4.75306
Tue Oct 31 11:53:29, Info:  Writing cells
Tue Oct 31 11:54:07, Info:  Writing vertices
Tue Oct 31 11:54:11, Info:  Writing group information
Tue Oct 31 11:54:19, Info:  Writing boundary condition
Tue Oct 31 11:54:22, Info:  Writing XDMF file
Tue Oct 31 11:54:26, Info:  Finished successfully
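
As a rough cross-check of the totals in the log above, the following sketch estimates how many connectivity entries the written mesh contains and whether that count still fits in a signed 32-bit integer. It assumes tetrahedral cells (4 vertices each) and 8-byte indices; both are assumptions made for illustration, and the function names are hypothetical rather than taken from PUMGen.

```cpp
#include <cstdint>
#include <limits>

// Number of connectivity entries for a mesh with a fixed vertex count per cell.
std::uint64_t connectivityEntries(std::uint64_t cells, std::uint64_t verticesPerCell) {
  return cells * verticesPerCell;
}

// True if the value can be represented by a signed 32-bit integer.
bool fitsInInt32(std::uint64_t value) {
  return value <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// Raw array size in bytes for a given entry count and entry width.
std::uint64_t rawBytes(std::uint64_t entries, std::uint64_t bytesPerEntry) {
  return entries * bytesPerEntry;
}
```

For the 753867879 cells reported in the log, connectivityEntries(753867879, 4) gives 3015471516 entries, which is above the 32-bit limit of 2147483647, and rawBytes(3015471516, 8) gives 24123772128 bytes (about 24 GB). Under these assumptions the output arrays themselves are small compared with the roughly 715 GB peak observed, so that peak reflects the mesher's in-memory representation rather than the size of the data written.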

davschneller commented 1 year ago

That is great news! Then I'll try to get the PR ready soon-ish; it should make APF completely optional for SimModeler and any serial mesh file (GMSH etc. should also work without APF, though I'll have to test that).

The only things we'd really be missing are the options described around here: https://github.com/SeisSol/PUMGen/blob/3fb1875df66c7ce14790af7d6dd3521a6ef71fcb/src/pumgen.cpp#L172-L192