HSU-HPC / MaMiCo

The macro-micro-coupling tool for coupled molecular-continuum flow simulation
BSD 3-Clause "New" or "Revised" License

Config validation #18

Closed: rubenhorn closed this issue 1 month ago

rubenhorn commented 1 year ago

Running couette.xml.example with 3 MPI ranks causes a crash (it works with 2 or 4):

$ mpirun -n 3 ../build/couette 
Run CouetteScenario...
Run CouetteScenario...
Run CouetteScenario...
Initialization: 0ms
Finish CouetteScenario::initSolvers() 
Finish CouetteScenario::initSolvers() 
Finish CouetteScenario::initSolvers() 
double free or corruption (out)
[thinkpad:1050255] *** Process received signal ***
[thinkpad:1050255] Signal: Aborted (6)
[thinkpad:1050255] Signal code:  (-6)
[thinkpad:1050255] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fda8f042520]
[thinkpad:1050255] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fda8f0969fc]
[thinkpad:1050255] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7fda8f042476]
[thinkpad:1050255] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7fda8f0287f3]
[thinkpad:1050255] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x89676)[0x7fda8f089676]
[thinkpad:1050255] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0xa0cfc)[0x7fda8f0a0cfc]
[thinkpad:1050255] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0xa2e70)[0x7fda8f0a2e70]
[thinkpad:1050255] [ 7] /lib/x86_64-linux-gnu/libc.so.6(free+0x73)[0x7fda8f0a5453]
[thinkpad:1050255] [ 8] ../build/couette(_ZN8coupling8services31MacroscopicCellServiceMacroOnlyILj3EED0Ev+0x3e)[0x558b56981bee]
[thinkpad:1050255] [ 9] ../build/couette(_ZN15CouetteScenario8shutdownEv+0x431)[0x558b569c3ca1]
[thinkpad:1050255] [10] ../build/couette(_Z11runScenarioP8Scenario+0x9d)[0x558b5697423d]
[thinkpad:1050255] [11] ../build/couette(main+0x3d)[0x558b569622cd]
[thinkpad:1050255] [12] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fda8f029d90]
[thinkpad:1050255] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fda8f029e40]
[thinkpad:1050255] [14] ../build/couette(_start+0x25)[0x558b56963535]
[thinkpad:1050255] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node thinkpad exited on signal 6 (Aborted).
--------------------------------------------------------------------------
$ git rev-parse --short HEAD
33f31b10

LouieVoit commented 12 months ago

Could you share your config file? I just tried it, and it works for me when I set number-of-processes="3 ; 1 ; 1".

rubenhorn commented 12 months ago

Yes, I am using the unmodified couette.xml.template with number-of-processes="1 ; 1 ; 1". I think the config validation code should catch this.

Thinkpiet commented 1 month ago

This happens because couette.xml.template uses number-md-simulations="4".

I suggest simply removing the template XML configs, since they are deprecated. The config generator should be used instead; it offers much more config validation.

rubenhorn commented 1 month ago

Yes! See #68
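
For illustration only, the kind of check discussed in this thread could look roughly like the sketch below. It is not MaMiCo's actual validation code. The function name `check_rank_layout` is hypothetical, and the rule itself is an assumption inferred from the reports above: the number of MPI ranks should be a multiple of the product of number-of-processes, and number-md-simulations should divide evenly across the resulting rank groups. That assumption matches the observations here (1 ; 1 ; 1 with 4 MD simulations runs on 2 or 4 ranks but not on 3, while 3 ; 1 ; 1 runs on 3 ranks).

```python
from math import prod


def check_rank_layout(mpi_size, processes_per_dim, num_md_sims):
    """Return a list of problem descriptions; an empty list means no issue found.

    Hypothetical helper. The rule below is an assumption inferred from the
    behaviour reported in this issue, not taken from the MaMiCo sources.
    """
    problems = []

    if mpi_size < 1 or num_md_sims < 1 or any(p < 1 for p in processes_per_dim):
        problems.append("all counts must be positive integers")
        return problems

    # Ranks needed by one MD simulation, e.g. (3, 1, 1) -> 3.
    ranks_per_sim = prod(processes_per_dim)

    if mpi_size % ranks_per_sim != 0:
        problems.append(
            f"{mpi_size} MPI ranks is not a multiple of "
            f"number-of-processes product {ranks_per_sim}"
        )
        return problems

    # Number of rank groups that share the MD simulations between them.
    groups = mpi_size // ranks_per_sim

    if num_md_sims % groups != 0:
        problems.append(
            f"number-md-simulations={num_md_sims} cannot be split evenly "
            f"across {groups} rank groups"
        )

    return problems


if __name__ == "__main__":
    # The failing case from this issue: 3 ranks, "1 ; 1 ; 1", 4 MD simulations.
    for message in check_rank_layout(3, (1, 1, 1), 4):
        print("config error:", message)
```

A check like this would run before the solvers are initialised, so that a mismatched layout produces a readable error message instead of a crash during shutdown.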