DMTCP: Distributed MultiThreaded CheckPointing
http://dmtcp.sourceforge.net/

Memory corruption with multi-threaded process #15

Closed karya0 closed 9 years ago

karya0 commented 9 years ago

I noticed a segfault caused by memory corruption on my laptop (2 cores, 4 threads). It is not too hard to reproduce, and it doesn't even involve creating a checkpoint image.

dmtcp_launch ./test/pthread4

Let it run for a while, and eventually you will see a segmentation fault.

To take a closer look, set the environment variable DMTCP_SEGFAULT_HANDLER=1. Instead of creating a core file, the process will then loop inside the segfault handler function.
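The reproduction with the handler enabled can be sketched as a small shell helper. This is only an illustration: the function name `run_with_segv_handler` and the `LAUNCHER` override are hypothetical and not part of DMTCP. Only `dmtcp_launch`, `./test/pthread4`, and `DMTCP_SEGFAULT_HANDLER=1` come from this report.

```shell
# Hypothetical helper (not part of DMTCP): run a command under dmtcp_launch
# with DMTCP_SEGFAULT_HANDLER=1 set only for that one invocation.
# LAUNCHER is an assumed override so the helper can be dry-run without DMTCP.
run_with_segv_handler() {
    launcher="${LAUNCHER:-dmtcp_launch}"

    # Fail early with a clear message if the launcher is not installed.
    if ! command -v "$launcher" >/dev/null 2>&1; then
        echo "error: $launcher not found in PATH" >&2
        return 127
    fi

    # The variable is placed in the environment of the launched process only;
    # the calling shell's environment is left unchanged.
    DMTCP_SEGFAULT_HANDLER=1 "$launcher" "$@"
}
```

With DMTCP installed, `run_with_segv_handler ./test/pthread4` would start the test. Once the process is spinning in the handler, attaching a debugger to it (for example `gdb -p <pid>`) is presumably the intended way to inspect the corrupted state.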

The next step should be to disable JAlloc by removing the #define from jalloc.h. I couldn't use libc malloc on my laptop because of a separate problem involving isspace() (I'll create a separate issue for that).

gc00 commented 9 years ago

Wow! That's a really important find. I was hoping that DMTCP was starting to become really solid (at least the core non-plugin part).

Thanks, Kapil.

On Mon, Nov 03, 2014 at 10:29:16AM -0800, Kapil Arya wrote:


rohgarg commented 9 years ago

Did you notice this on other systems as well? I left it running on dekaksi but couldn't reproduce this issue.

karya0 commented 9 years ago

It was a race condition in JAlloc. Fixed in commit 2487ccdd68dc5f508635346237a484e300bc7b39.