SoftSimu / CellSim3D

GPU Accelerated 3D Cell Simulator
GNU General Public License v2.0

Double free error #2

Open pmadhikar opened 5 years ago

pmadhikar commented 5 years ago

@Avenger-Bagnarol

The second is that at the very end of the run, after all the trajectories are written, the program on my computer throws an "Error in `./CellDiv': double free or corruption" exception. I cannot figure out whether free() is called twice for something or whether something else is going on.
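
For context on the question above: glibc prints "double free or corruption" both when the same pointer reaches free() twice and when the heap's bookkeeping has been overwritten, so the message alone does not say which of the two happened. One way to test the first hypothesis is to route allocations through a small tracker that refuses a second release. The sketch below is illustrative only; AllocationTracker and second_release_is_caught are hypothetical names, not part of CellSim3D.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Hypothetical debugging helper (not part of CellSim3D): remembers which
// pointers are currently live so that a second release of the same pointer
// is reported instead of reaching free() twice.
class AllocationTracker {
public:
    void* allocate(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p != nullptr) {
            live_.insert(p);
        }
        return p;
    }

    // Returns true if p was live and has now been freed. Returns false if p
    // was never allocated here or was already released (a would-be double free).
    bool release(void* p) {
        if (live_.erase(p) == 0) {
            return false;
        }
        std::free(p);
        return true;
    }

    std::size_t live_count() const { return live_.size(); }

private:
    std::unordered_set<void*> live_;
};

// Usage: the second release of the same pointer is caught, not executed.
bool second_release_is_caught() {
    AllocationTracker tracker;
    void* p = tracker.allocate(64);
    assert(p != nullptr);
    bool first = tracker.release(p);
    bool second = tracker.release(p);
    return first && !second;
}
```

If a tracker like this never reports a second release, heap corruption (for example a write past the end of a buffer) becomes the more likely explanation.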

pmadhikar commented 5 years ago

@Avenger-Bagnarol Can you please provide the full error message?

Avenger-Bagnarol commented 5 years ago

@pmadhikar Sure. Let me just note a few things first:

1) It happens on every run, at the very end, after the line Xdiv = 11, Ydiv = 10, Zdiv = 1.
2) It does not seem to interfere with writing the output, as I get my .xyz trajectories, the forces and the .dat counting file. I have yet to analyze the output, as I am still performing trial runs, but I strongly doubt this issue is affecting it.
3) I am very puzzled by this because I modified main in GPUbounce.cu by just adding a printf("here\n"); between lines 1571 and 1600, even just before the return 0;, and it always printed here before the error message.

It might be a problem with my computer; in that case, I apologize in advance for taking your time.

The terminal prints:

*** Error in `./CellDiv': double free or corruption (!prev): 0x0000000001423d60 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fdd424ef7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fdd424f837a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fdd424fc53c]
./CellDiv[0x41e1cc]
./CellDiv[0x41cfe3]
./CellDiv[0x41d011]
./CellDiv[0x41ba14]
./CellDiv[0x41a4a5]
./CellDiv[0x419209]
./CellDiv[0x418238]
./CellDiv[0x40f6eb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fdd42498830]
./CellDiv[0x405fd9]
======= Memory map: ========
00400000-00738000 r-xp 00000000 08:02 10102933 /home/xrdlabuser/Mirko_Bagnarol/CellSim3D-master/bin/CellDiv
00938000-0093b000 r--p 00338000 08:02 10102933 /home/xrdlabuser/Mirko_Bagnarol/CellSim3D-master/bin/CellDiv
0093b000-0094f000 rw-p 0033b000 08:02 10102933 /home/xrdlabuser/Mirko_Bagnarol/CellSim3D-master/bin/CellDiv
0094f000-009b2000 rw-p 00000000 00:00 0
00ef2000-059d1000 rw-p 00000000 00:00 0 [heap]
200000000-200100000 rw-s 00000000 00:06 543 /dev/nvidiactl
200100000-200300000 rw-s 00000000 00:06 543 /dev/nvidiactl
200300000-202b00000 rw-s 00000000 00:06 543 /dev/nvidiactl
202b00000-202c00000 rw-s 00000000 00:06 543 /dev/nvidiactl
202c00000-202d00000 rw-s 00000000 00:05 515361 /dev/zero (deleted)
202d00000-202e00000 rw-s 00000000 00:06 543 /dev/nvidiactl
202e00000-202f00000 rw-s 00000000 00:05 515362 /dev/zero (deleted)
202f00000-203000000 rw-s 00000000 00:06 543 /dev/nvidiactl
203000000-203120000 ---p 00000000 00:00 0
203120000-203220000 rw-s 00000000 00:06 543 /dev/nvidiactl
203220000-203300000 rw-s 00000000 00:06 543 /dev/nvidiactl
203300000-203400000 rw-s 00000000 00:05 515365 /dev/zero (deleted)
203400000-203501000 rw-s 00000000 00:05 515366 /dev/zero (deleted)
203501000-e00000000 ---p 00000000 00:00 0
7fdd20000000-7fdd20001000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20001000-7fdd20002000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20002000-7fdd20003000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20003000-7fdd20004000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20004000-7fdd20005000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20005000-7fdd20006000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20006000-7fdd20007000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20007000-7fdd20008000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20008000-7fdd20009000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20009000-7fdd2000a000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd2000a000-7fdd2000b000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd2000b000-7fdd2000c000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd2000c000-7fdd2000d000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd2000d000-7fdd2000e000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd2000e000-7fdd2000f000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd2000f000-7fdd20010000 rw-s 00000000 00:06 437 /dev/nvidia0
7fdd20010000-7fdd30000000 ---p 00000000 00:00 0
7fdd30000000-7fdd30021000 rw-p 00000000 00:00 0
7fdd30021000-7fdd34000000 ---p 00000000 00:00 0
7fdd34000000-7fdd34021000 rw-p 00000000 00:00 0
7fdd34021000-7fdd38000000 ---p 00000000 00:00 0
7fdd38000000-7fdd38021000 rw-p 00000000 00:00 0
7fdd38021000-7fdd3c000000 ---p 00000000 00:00 0
7fdd3ce7d000-7fdd3d874000 rw-p 00000000 00:00 0
7fdd3e26b000-7fdd3e2d5000 rw-p 00000000 00:00 0
7fdd3e2d5000-7fdd3e2d6000 ---p 00000000 00:00 0
7fdd3e2d6000-7fdd3ead6000 rw-p 00000000 00:00 0
7fdd3ead6000-7fdd3ead7000 ---p 00000000 00:00 0
7fdd3ead7000-7fdd3f2d7000 rw-p 00000000 00:00 0
7fdd3f2d7000-7fdd3f2d8000 ---p 00000000 00:00 0
7fdd3f2d8000-7fdd3fad8000 rw-p 00000000 00:00 0
7fdd3fad8000-7fdd3fb1f000 r-xp 00000000 08:02 5913915 /usr/lib/nvidia-418/libnvidia-fatbinaryloader.so.418.39
7fdd3fb1f000-7fdd3fd1f000 ---p 00047000 08:02 5913915 /usr/lib/nvidia-418/libnvidia-fatbinaryloader.so.418.39
7fdd3fd1f000-7fdd3fd21000 rw-p 00047000 08:02 5913915 /usr/lib/nvidia-418/libnvidia-fatbinaryloader.so.418.39
7fdd3fd21000-7fdd3fd26000 rw-p 00000000 00:00 0
7fdd3fd26000-7fdd40af9000 r-xp 00000000 08:02 5636112 /usr/lib/i386-linux-gnu/libcuda.so.418.39
7fdd40af9000-7fdd40cf8000 ---p 00dd3000 08:02 5636112 /usr/lib/i386-linux-gnu/libcuda.so.418.39
7fdd40cf8000-7fdd40e6c000 rw-p 00dd2000 08:02 5636112 /usr/lib/i386-linux-gnu/libcuda.so.418.39
7fdd40e6c000-7fdd40e7c000 rw-p 00000000 00:00 0
7fdd42478000-7fdd42638000 r-xp 00000000 08:02 10753086 /lib/x86_64-linux-gnu/libc-2.23.so
7fdd42638000-7fdd42838000 ---p 001c0000 08:02 10753086 /lib/x86_64-linux-gnu/libc-2.23.so
7fdd42838000-7fdd4283c000 r--p 001c0000 08:02 10753086 /lib/x86_64-linux-gnu/libc-2.23.so
7fdd4283c000-7fdd4283e000 rw-p 001c4000 08:02 10753086 /lib/x86_64-linux-gnu/libc-2.23.so
7fdd4283e000-7fdd42842000 rw-p 00000000 00:00 0
7fdd42842000-7fdd42859000 r-xp 00000000 08:02 10752396 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdd42859000-7fdd42a58000 ---p 00017000 08:02 10752396 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdd42a58000-7fdd42a59000 r--p 00016000 08:02 10752396 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdd42a59000-7fdd42a5a000 rw-p 00017000 08:02 10752396 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdd42a5a000-7fdd42b62000 r-xp 00000000 08:02 10753119 /lib/x86_64-linux-gnu/libm-2.23.so
7fdd42b62000-7fdd42d61000 ---p 00108000 08:02 10753119 /lib/x86_64-linux-gnu/libm-2.23.so
7fdd42d61000-7fdd42d62000 r--p 00107000 08:02 10753119 /lib/x86_64-linux-gnu/libm-2.23.so
7fdd42d62000-7fdd42d63000 rw-p 00108000 08:02 10753119 /lib/x86_64-linux-gnu/libm-2.23.so
7fdd42d63000-7fdd42edf000 r-xp 00000000 08:02 5637328 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25
7fdd42edf000-7fdd430df000 ---p 0017c000 08:02 5637328 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25
7fdd430df000-7fdd430e9000 r--p 0017c000 08:02 5637328 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25
7fdd430e9000-7fdd430eb000 rw-p 00186000 08:02 5637328 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25
7fdd430eb000-7fdd430ef000 rw-p 00000000 00:00 0
7fdd430ef000-7fdd430f2000 r-xp 00000000 08:02 10752514 /lib/x86_64-linux-gnu/libdl-2.23.so
7fdd430f2000-7fdd432f1000 ---p 00003000 08:02 10752514 /lib/x86_64-linux-gnu/libdl-2.23.so
7fdd432f1000-7fdd432f2000 r--p 00002000 08:02 10752514 /lib/x86_64-linux-gnu/libdl-2.23.so
7fdd432f2000-7fdd432f3000 rw-p 00003000 08:02 10752514 /lib/x86_64-linux-gnu/libdl-2.23.so
7fdd432f3000-7fdd4330b000 r-xp 00000000 08:02 10752618 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fdd4330b000-7fdd4350a000 ---p 00018000 08:02 10752618 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fdd4350a000-7fdd4350b000 r--p 00017000 08:02 10752618 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fdd4350b000-7fdd4350c000 rw-p 00018000 08:02 10752618 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fdd4350c000-7fdd43510000 rw-p 00000000 00:00 0
7fdd43510000-7fdd43517000 r-xp 00000000 08:02 10747990 /lib/x86_64-linux-gnu/librt-2.23.so
7fdd43517000-7fdd43716000 ---p 00007000 08:02 10747990 /lib/x86_64-linux-gnu/librt-2.23.so
7fdd43716000-7fdd43717000 r--p 00006000 08:02 10747990 /lib/x86_64-linux-gnu/librt-2.23.so
7fdd43717000-7fdd43718000 rw-p 00007000 08:02 10747990 /lib/x86_64-linux-gnu/librt-2.23.so
7fdd43718000-7fdd454a5000 r-xp 00000000 08:02 5638893 /usr/lib/x86_64-linux-gnu/libcurand.so.7.5.18
7fdd454a5000-7fdd456a5000 ---p 01d8d000 08:02 5638893 /usr/lib/x86_64-linux-gnu/libcurand.so.7.5.18
7fdd456a5000-7fdd46a76000 rw-p 01d8d000 08:02 5638893 /usr/lib/x86_64-linux-gnu/libcurand.so.7.5.18
7fdd46a76000-7fdd46f80000 rw-p 00000000 00:00 0
7fdd46f80000-7fdd46fa6000 r-xp 00000000 08:02 10752616 /lib/x86_64-linux-gnu/ld-2.23.so
7fdd46fce000-7fdd46ff7000 rw-p 00000000 00:00 0
7fdd46ff7000-7fdd470f7000 rw-s 00000000 00:05 515363 /dev/zero (deleted)
7fdd470f7000-7fdd470f8000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd470f8000-7fdd470f9000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd470f9000-7fdd470fa000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd470fa000-7fdd470fb000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd470fb000-7fdd470fc000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd470fc000-7fdd470fd000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd470fd000-7fdd470fe000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd470fe000-7fdd470ff000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd470ff000-7fdd47100000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd47100000-7fdd47101000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd47101000-7fdd47102000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd47102000-7fdd47103000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd47103000-7fdd47104000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd47104000-7fdd47105000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd47105000-7fdd47106000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd47136000-7fdd4717a000 rw-p 00000000 00:00 0
7fdd4717a000-7fdd4717b000 rw-s 00000000 00:06 543 /dev/nvidiactl
7fdd4717b000-7fdd471a5000 rw-p 00000000 00:00 0
7fdd471a5000-7fdd471a6000 r--p 00025000 08:02 10752616 /lib/x86_64-linux-gnu/ld-2.23.so
7fdd471a6000-7fdd471a7000 rw-p 00026000 08:02 10752616 /lib/x86_64-linux-gnu/ld-2.23.so
7fdd471a7000-7fdd471a8000 rw-p 00000000 00:00 0
7ffd9dca3000-7ffd9dcc4000 rw-p 00000000 00:00 0 [stack]
7ffd9dd57000-7ffd9dd5a000 r--p 00000000 00:00 0 [vvar]
7ffd9dd5a000-7ffd9dd5c000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted

pmadhikar commented 5 years ago

Thanks for the detailed log.

I am struggling to decipher what exactly is happening. We can investigate further together. Do you use any other simulators that use CUDA? GROMACS, for instance? If other CUDA programs work, then I would think your hardware is working fine and the error is in CellSim3D.

Try the following:

  1. Compile in debug mode:
       make clean
       make debug
  2. Do you have a GPU with compute capability higher than 3.5? If not, you will need to do everything below on a GPU that is not running a display server and skip this step. If you do, then export the variable below. Your GPU's compute capability is shown in the CellSim3D output log.
       export CUDA_DEBUGGER_SOFTWARE_PREEMPTION=1
  3. Run the simulator in cuda-gdb as follows:
       cuda-gdb --args bin/CellDiv inp.json 0
     In the gdb prompt, type r to run the simulator. It will be much slower than normal. I recommend setting up a short simulation (~1000 steps) with a single cell and setting a large threshold division volume (~100).

Now we should see exactly where the double free occurs. Please paste the output here. Then print the crash backtrace by typing bt in the gdb prompt and post the output of that here too.

Thanks once again for reporting this problem and your help.

Avenger-Bagnarol commented 5 years ago

@pmadhikar I have a GPU with compute capability 5.2, and since this is not my own computer I have not used any software like GROMACS on it yet. Anyway, here is the output.

After typing r:

Program received signal SIGABRT, Aborted.
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
0x00007ffff3304428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

After typing bt:

Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#0 0x00007ffff3304428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff330602a in __GI_abort () at abort.c:89
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#2 0x00007ffff33467ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff345fed8 <__PRETTY_FUNCTION__.12050+352> "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#3 0x00007ffff334f37a in malloc_printerr (ar_ptr=, ptr=, str=0x7ffff3460008 <__PRETTY_FUNCTION__.12050+656> "double free or corruption (!prev)", action=3) at malloc.c:5006
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#4 _int_free (av=, p=, have_lock=0) at malloc.c:3867
#5 0x00007ffff335353c in __GI___libc_free (mem=) at malloc.c:2968
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#6 0x000000000041df7c in __gnu_cxx::new_allocator::deallocate (this=0x7fffffffd860, p=0x2bf67b0) at /usr/include/c++/5/ext/new_allocator.h:110
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#7 0x000000000041cd93 in thrust::detail::allocator_traits<std::allocator >::deallocate(std::allocator&, float, unsigned long)::workaround_warnings::deallocate(std::allocator&, float, unsigned long) (a=..., p=0x2bf67b0, n=10000) at /usr/include/thrust/detail/allocator/allocator_traits.inl:256
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#8 0x000000000041cdc1 in thrust::detail::allocator_traits<std::allocator >::deallocate (a=..., p=0x2bf67b0, n=10000) at /usr/include/thrust/detail/allocator/allocator_traits.inl:260
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#9 0x000000000041b7c4 in thrust::detail::contiguous_storage<float, std::allocator >::deallocate (this=0x7fffffffd860) at /usr/include/thrust/detail/contiguous_storage.inl:172
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#10 0x000000000041a255 in thrust::detail::contiguous_storage<float, std::allocator >::~contiguous_storage (this=0x7fffffffd860, __in_chrg=<optimized out>) at /usr/include/thrust/detail/contiguous_storage.inl:64
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#11 0x0000000000418fb9 in thrust::detail::vector_base<float, std::allocator >::~vector_base (this=0x7fffffffd860, __in_chrg=<optimized out>) at /usr/include/thrust/detail/vector_base.inl:475
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#12 0x0000000000417fe8 in thrust::host_vector<float, std::allocator >::~host_vector (this=0x7fffffffd860, __in_chrg=) at /usr/include/thrust/host_vector.h:52
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
Python Exception <type 'exceptions.AttributeError'> 'module' object has no attribute 'TYPE_CODE_RVALUE_REF':
#13 0x000000000040f49b in main (argc=4, argv=0x7fffffffdf58) at src/GPUbounce.cu:524
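
Frames 12 and 13 show the abort coming from the destructor of a thrust::host_vector that main owns. Destructors of local objects run when main's scope ends, after the last statement before return 0, which would be consistent with printf("here\n") always appearing before the error. A minimal sketch of that ordering follows; LoggedBuffer, fake_main and run_order are hypothetical stand-ins, not CellSim3D code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a container such as thrust::host_vector<float>:
// it records the moment its destructor runs.
struct LoggedBuffer {
    std::vector<std::string>& log;
    explicit LoggedBuffer(std::vector<std::string>& l) : log(l) {}
    ~LoggedBuffer() { log.push_back("destructor"); }
};

// Mimics the shape of main(): a local container, a marker just before the
// return statement, and the implicit destructor call when the scope ends.
int fake_main(std::vector<std::string>& log) {
    LoggedBuffer buffer(log);
    log.push_back("here");  // analogous to printf("here\n") before return 0
    return 0;               // buffer's destructor runs after this statement
}

// Returns the observed order of events.
std::vector<std::string> run_order() {
    std::vector<std::string> log;
    int rc = fake_main(log);
    assert(rc == 0);
    (void)rc;
    return log;
}
```

In this sketch the marker is always recorded before the destructor, so a crash inside the destructor would still be preceded by the marker's output.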

pmadhikar commented 5 years ago

I'm not sure why you are getting so many Python exceptions...

Anyway, I don't think anything is wrong with GPUbounce.cu:524. I just cloned a fresh copy of the repo and it runs fine for me without any double free errors. Are you sure that you have the latest version? A few months ago, it was pointed out to me that the inp.json on GitHub was out of date. That was causing some double free errors since some variables were not defined in the JSON file. Can you try with the latest version of that too?
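
To illustrate how an undefined input variable could plausibly end in this message: if a buffer is sized from a value that is missing from the input file and the code then writes more elements than were allocated, the heap's bookkeeping can be overwritten and the abort only shows up later, when the buffer is freed. This is an assumption about the mechanism, not a confirmed diagnosis. In the sketch below, Config, max_cells and fill_volumes are hypothetical names, and std::vector stands in for the host vectors in the backtrace.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical config: max_cells stands in for a value read from inp.json.
// If the key is missing, the default of 0 yields an empty buffer.
struct Config {
    std::size_t max_cells = 0;
};

// Writes one value per cell in use. at() throws std::out_of_range on a bad
// index, whereas operator[] would write past the allocation and could
// corrupt the heap without any immediate error.
bool fill_volumes(const Config& cfg, std::size_t cells_in_use,
                  std::vector<float>& volumes) {
    volumes.assign(cfg.max_cells, 0.0f);
    assert(volumes.size() == cfg.max_cells);
    try {
        for (std::size_t i = 0; i < cells_in_use; ++i) {
            volumes.at(i) = 1.0f;
        }
    } catch (const std::out_of_range&) {
        return false;  // buffer too small for the requested number of cells
    }
    return true;
}
```

Bounds-checked access of this kind turns a silent overflow into an error at the faulty write, which is usually easier to trace than an abort at exit.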

Avenger-Bagnarol commented 5 years ago

I downloaded the folder a few days ago, so inp.json is not the problem either.

However, since the simulator works, I am not going to bother you anymore with this problem. Thank you very much anyway!

Just as a side note, if I have questions about the analysis scripts or the parameters, whom should I email?

pmadhikar commented 5 years ago

Hmm... Well I don't know what the problem could be then. You are most welcome, though I wish we were able to fix the problem. Sometimes such issues can be indicative of other bugs in the code.

I recommend creating a new issue here for other questions. I've also just created a Google group for this purpose: https://groups.google.com/forum/#!forum/cellsim3d

For now, I will be the only one there to help you out :)

zamri93 commented 3 years ago

@Avenger-Bagnarol

The second is that at the very end of the run, after all the trajectories are written, the program on my computer throws an "Error in `./CellDiv': double free or corruption" exception. I cannot figure out whether free() is called twice for something or whether something else is going on.