danieljprice / phantom

Phantom Smoothed Particle Hydrodynamics and Magnetohydrodynamics code
https://phantomsph.github.io

Crash within header allocation in the middle of a run (SETUP=cluster) #547

Closed by Yrisch 4 months ago

Yrisch commented 4 months ago

I found a bug with the cluster setup after the new integrator implementations. My initial cloud has a radius of 10 pc and a mass of 4.7e4 Msun. Sink accretion radii are set to 4000 au (see the attached .in and .setup files for more details). The run produced 231 dumps without issues and created 25 sinks smoothly. However, it crashed at the next dump with a SIGABRT:

phantom: malloc.c:4302: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

#0  0x78d89d023970 in ???
#1  0x78d89d022ad5 in ???
#2  0x78d89cc4251f in ???
    at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x78d89cc969fc in __pthread_kill_implementation
    at ./nptl/pthread_kill.c:44
#4  0x78d89cc969fc in __pthread_kill_internal
    at ./nptl/pthread_kill.c:78
#5  0x78d89cc969fc in __GI___pthread_kill
    at ./nptl/pthread_kill.c:89
#6  0x78d89cc42475 in __GI_raise
    at ../sysdeps/posix/raise.c:26
#7  0x78d89cc287f2 in __GI_abort
    at ./stdlib/abort.c:79
#8  0x78d89cca0ec9 in __malloc_assert
    at ./malloc/malloc.c:307
#9  0x78d89cca4984 in _int_malloc
    at ./malloc/malloc.c:4302
#10 0x78d89cca5138 in __GI___libc_malloc
    at ./malloc/malloc.c:3329
#11 0x5f6f1a739d3c in __dump_utils_MOD_allocate_header
    at ../src/main/utils_dumpfiles.f90:1399
#12 0x5f6f1a87497c in __readwrite_dumps_fortran_MOD_write_smalldump_fortran
    at ../src/main/readwrite_dumps_fortran.f90:403
#13 0x5f6f1a90e3e8 in __evolve_MOD_evol
    at ../src/main/evolve.F90:536
#14 0x5f6f1a91f39f in phantom
    at ../src/main/phantom.f90:66
#15 0x5f6f1a6f76c8 in main
    at ../src/main/phantom.f90:26

Abandon (core dumped)

The backtrace points to the header allocation performed while writing a small dump. The crash did not happen before the subgroups and 4th-order scheme pull requests. I checked every change between the two versions but did not see anything obvious. I tried on two different machines and it crashes the same way (with gfortran 13.2). The crash also happens when restarting from dump 230. I did not see any memory leak. I managed to avoid the crash by printing maxphead before calling allocate_header in the write_smalldump_fortran routine, so it looks like a compiler optimisation bug.

I attached the last full dump so that the crash can be reproduced quickly.

Attached files: https://drive.google.com/file/d/1PCILpt0xnt-cW7RNfoOPlgi8QnCNF1V2/view?usp=sharing

Yrisch commented 4 months ago

I found something while compiling with the flag -fsanitize=address,undefined. Something goes wrong in the ptmass_create routine: the sanitizer reports a heap buffer overflow during the fxyz_ptmass_sinksink initialisation at ptmass.f90:1539 in this routine. If this line is commented out, the code runs smoothly.
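A heap buffer overflow of this kind would be consistent with the abort surfacing in allocate_header rather than in ptmass_create: a write past the end of one heap block can damage the allocator's bookkeeping for a neighbouring block, and the damage is only noticed when the allocator next checks it. The following is a toy model in Python, not glibc's actual allocator and not Phantom code; `ToyHeap`, `HEADER` and `demo` are hypothetical names introduced only for illustration.

```python
import struct

HEADER = 8  # bytes in the toy size header placed just before each block


class ToyHeap:
    """Toy bump allocator laid out as [header][data][header][data]..."""

    def __init__(self, nbytes):
        self.mem = bytearray(nbytes)
        self.blocks = []  # (data offset, size) as the allocator recorded them
        self.top = 0

    def malloc(self, size):
        self.check()  # consistency check runs only when allocating
        if self.top + HEADER + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        struct.pack_into("<Q", self.mem, self.top, size)  # write size header
        data = self.top + HEADER
        self.blocks.append((data, size))
        self.top = data + size
        return data

    def check(self):
        # Compare each stored header with the size recorded at allocation.
        for data, size in self.blocks:
            (stored,) = struct.unpack_from("<Q", self.mem, data - HEADER)
            if stored != size:
                raise RuntimeError(f"corrupted size header before offset {data}")


def demo():
    heap = ToyHeap(64)
    a = heap.malloc(8)   # data occupies offsets 8..15
    heap.malloc(8)       # its header occupies offsets 16..23
    # Write 9 bytes into an 8-byte block: the extra byte lands in the
    # next block's header. Nothing fails at this point.
    heap.mem[a:a + 9] = b"\xff" * 9
    try:
        heap.malloc(8)   # the corruption is detected here, far from its cause
    except RuntimeError as err:
        return str(err)
    return "no corruption detected"


if __name__ == "__main__":
    print(demo())
```

In this model the faulty write and the failure are separated by an arbitrary amount of unrelated work, which may also be why an unrelated change such as adding a print statement appeared to make the crash go away.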

Yrisch commented 4 months ago

It should be fixed with #548. I cleaned up the routine a bit and used a separate index to initialise the array. I don't fully understand why this fix works. However, reusing nptmass after changing its value in the middle of the routine seems risky to me. It could be dangerous if the compiler optimises this routine and reorders the statements.
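The idea of using a separate index can be sketched as follows. This is a minimal illustration in Python, not the code from #548: `SinkStore`, `create_sink` and `MAX_SINKS` are hypothetical names, `count` plays the role of nptmass, and `fxyz` stands in for a per-sink array such as fxyz_ptmass_sinksink. The new sink's index is fixed in a local variable, checked against the allocated size, used for the initialisation, and only then written back to the shared counter.

```python
from dataclasses import dataclass, field

MAX_SINKS = 4  # hypothetical capacity, standing in for the allocated array size


def _empty_forces():
    return [[0.0, 0.0, 0.0] for _ in range(MAX_SINKS)]


@dataclass
class SinkStore:
    """Fixed-capacity per-sink storage with a shared counter."""
    count: int = 0                                   # plays the role of nptmass
    fxyz: list = field(default_factory=_empty_forces)


def create_sink(store):
    """Add one sink and return its slot index; raise if storage is full."""
    new = store.count  # local index of the new sink, never changed below
    if new >= len(store.fxyz):
        raise IndexError(f"sink storage full (capacity {len(store.fxyz)})")
    store.fxyz[new] = [0.0, 0.0, 0.0]  # initialise only the new sink's slot
    store.count = new + 1              # update the shared counter last
    return new


if __name__ == "__main__":
    store = SinkStore()
    print(create_sink(store), create_sink(store), store.count)
```

Python checks list bounds by itself, so the explicit test here stands in for what a sanitizer build or gfortran's `-fcheck=bounds` option would report in the Fortran code.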