SeisSol / PUMGen

Mesh generation for SeisSol
BSD 3-Clause "New" or "Revised" License

Segmentation Fault when Converting a larger .neu File with PumGen #77

Closed: draguve closed this issue 3 months ago

draguve commented 3 months ago

I've encountered another issue while using PUMGen to convert a .neu file generated with SimModeler v2024.0-240519. The mesh is larger this time (about 28 GB), and it causes a segmentation fault in both the latest version and v1.0.1 of PUMGen. I can provide the mesh if that would be helpful.

I compiled PUMGen without Simmetrix support, using the following library versions:

I can run this with gdb or a debug build if you think it would be useful.


The program crashes during the conversion process with the following error output:
Wed Jul 24 12:18:19, Info:  No filtering enabled (contiguous storage) 
Wed Jul 24 12:18:19, Info:  Using 32-bit integer boundary type conditions, or 8 bit per face (i32). 
Wed Jul 24 12:18:19, Info:  Using Gambit mesh 
Wed Jul 24 12:18:19, Warn:  Gambit format does not seem to have a fixed boundary line length. Trying with variable line length 
Wed Jul 24 12:18:19, Warn:  Gambit format does not seem to have a fixed boundary line length. Trying with variable line length 
Wed Jul 24 12:18:19, Warn:  Gambit format does not seem to have a fixed boundary line length. Trying with variable line length 
Wed Jul 24 12:18:24, Info:  Read vertex coordinates 
Wed Jul 24 12:18:24, Info:  Reading vertices part 1 of 1 
Wed Jul 24 12:20:08, Info:  Read cell vertices 
Wed Jul 24 12:20:11, Info:  Reading elements part 1 of 1 
Wed Jul 24 12:26:10, Info:  Read cell groups 
Wed Jul 24 12:26:11, Info:  Reading group information part 1 of 1 
[cos-bmadden-dt:373731] *** Process received signal ***
[cos-bmadden-dt:373731] Signal: Segmentation fault (11)
[cos-bmadden-dt:373731] Signal code: Address not mapped (1)
[cos-bmadden-dt:373731] Failing at address: 0x16ed07f840
[cos-bmadden-dt:373731] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f4f9c9df520]
[cos-bmadden-dt:373731] [ 1] /home/017552119/PUMGen/pumgen/build/pumgen(+0x16455)[0x55cd30bc8455]
[cos-bmadden-dt:373731] [ 2] /home/017552119/PUMGen/pumgen/build/pumgen(+0x1779b)[0x55cd30bc979b]
[cos-bmadden-dt:373731] [ 3] /home/017552119/PUMGen/pumgen/build/pumgen(+0x1906d)[0x55cd30bcb06d]
[cos-bmadden-dt:373731] [ 4] /home/017552119/PUMGen/pumgen/build/pumgen(+0x976f)[0x55cd30bbb76f]
[cos-bmadden-dt:373731] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f4f9c9c6d90]
[cos-bmadden-dt:373731] [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f4f9c9c6e40]
[cos-bmadden-dt:373731] [ 7] /home/017552119/PUMGen/pumgen/build/pumgen(+0xc025)[0x55cd30bbe025]
[cos-bmadden-dt:373731] *** End of error message ***
Segmentation fault (core dumped)
davschneller commented 3 months ago

Hi, a run under gdb on the latest PUMGen version would be great, if that's feasible with your memory configuration. A single backtrace should give a good hint about where the problem lies.

(v1.0.1 may simply fail because of its dependency on PUMI, which may consume too much memory.)

draguve commented 3 months ago

Hi, this is the backtrace from gdb:

Starting program: /home/017552119/PUMGen/pumgen/build/pumgen highden.neu
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 213790]
[New Thread 0x7ffff6a95640 (LWP 213793)]
[New Thread 0x7ffff6294640 (LWP 213794)]
Wed Jul 31 12:55:58, Info:  No filtering enabled (contiguous storage)
Wed Jul 31 12:55:58, Info:  Using 32-bit integer boundary type conditions, or 8 bit per face (i32).
Wed Jul 31 12:55:58, Info:  Using Gambit mesh
Wed Jul 31 12:55:58, Warn:  Gambit format does not seem to have a fixed boundary line length. Trying with variable line length
Wed Jul 31 12:55:59, Warn:  Gambit format does not seem to have a fixed boundary line length. Trying with variable line length
Wed Jul 31 12:55:59, Warn:  Gambit format does not seem to have a fixed boundary line length. Trying with variable line length
Wed Jul 31 12:56:03, Info:  Read vertex coordinates
Wed Jul 31 12:56:04, Info:  Reading vertices part 1 of 1
Wed Jul 31 12:57:52, Info:  Read cell vertices
Wed Jul 31 12:57:55, Info:  Reading elements part 1 of 1
Wed Jul 31 13:04:26, Info:  Read cell groups
Wed Jul 31 13:04:27, Info:  Reading group information part 1 of 1

Thread 1 "pumgen" received signal SIGSEGV, Segmentation fault.
0x000055555556a455 in int& std::vector<int, std::allocator<int> >::emplace_back<int>(int&&) ()
(gdb) backtrace
#0  0x000055555556a455 in int& std::vector<int, std::allocator<int> >::emplace_back<int>(int&&) ()
#1  0x000055555556b79b in puml::ParallelGambitReader::readGroups(int*) ()
#2  0x000055555556d06d in SerialMeshFile<puml::ParallelGambitReader>::open(char const*) ()
#3  0x000055555555d76f in main ()
davschneller commented 3 months ago

Thanks for the backtrace! I have an idea where the problem could lie, at least partially. Somehow the reader gets an element ID that is larger than the total element count. It then tries to send that element to a non-existent higher rank, which results in a push_back/emplace_back on a vector that does not exist. In other words, we somehow read a number higher than the number of elements.

Sorry if I'm asking a lot, but is there any way I could get access to the neutral file?

draguve commented 3 months ago

Sure, give me a few minutes; I'll upload it and send you a link.

draguve commented 3 months ago

Here you go; it should decompress to 15233 MB.

davschneller commented 3 months ago

Thank you for it! I could reproduce the bug locally.

davschneller commented 3 months ago

I debugged it a bit, but unfortunately it may be a problem with the file itself (or with the program that wrote it; it looks like FORTRAN output to me?): it seemingly only supports 7- or 8-digit numbers in its output, but we need 9 digits here.

In fact, the first error occurs here:

$ grep "\*\*\*\*\*\*\*\*\*" highden.neu -A 10 -B 10 -m 1
99999901999999029999990399999904999999059999990699999907999999089999990999999910
99999911999999129999991399999914999999159999991699999917999999189999991999999920
99999921999999229999992399999924999999259999992699999927999999289999992999999930
99999931999999329999993399999934999999359999993699999937999999389999993999999940
99999941999999429999994399999944999999459999994699999947999999489999994999999950
99999951999999529999995399999954999999559999995699999957999999589999995999999960
99999961999999629999996399999964999999659999996699999967999999689999996999999970
99999971999999729999997399999974999999759999997699999977999999789999997999999980
99999981999999829999998399999984999999859999998699999987999999889999998999999990
999999919999999299999993999999949999999599999996999999979999999899999999********
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************
********************************************************************************

The asterisks are read as zero. Since the IDs are 1-based, converting that zero to a 0-based index wraps around and causes the bug (i.e., the ID becomes 2**64 - 1).

Asterisks also appear in the element numbers, although they don't break anything there.


Just to be sure, in case you're not aware: PUMGen has a direct SimModeler integration, if that could be of use.

draguve commented 3 months ago

Oh, even in the elements section ("ELEMENTS/CELLS 1.2.1") the output seems to be limited by the number of digits: once an index in a cell line needs 8 digits, it overwrites the separating whitespace. Early lines look like this:

17134764  6  4  9989421 9989403 9989407 9989400
and later lines look like this:
21032968  6  4 12019546120177761201953212019477

I'll create a support ticket with Simmetrix, then.


I would like to use the SimModeler integration, but I had issues compiling PUMI against SimModeler v2024.0-240519: there seems to have been a breaking change in the API, so it fails to compile with both version 2.2.7 and 2.2.8.

davschneller commented 3 months ago

I see. Yes, we also had to adapt some parts for these changes (cf. #73).

However, the master version of PUMGen no longer needs PUMI, even when using SimModeler; we removed that dependency some time ago (cf. #68).

draguve commented 3 months ago

Oh, I see. I wasn't aware it was an optional dependency. I'll try compiling PUMGen without PUMI and see if that works. Thank you so much! Should I close this issue?

davschneller commented 3 months ago

Glad to help! Yes, it can be closed; I'll do that right away. The only things we could still add are proper error messages for these cases (i.e., when a field is filled with asterisks), or support for another file format, if SimModeler can export one that we could parse.