Closed as-com closed 4 years ago
Reproduced. Thanks for filing this!
I'm noticing that it also only happens when Cilk Plus is used. If you switch to anything else (including TBB), the issue goes away. But I still do not know the root cause.
I'm noticing it happening with Push Pool and multithreading disabled. You may need to tweak around the RAID configuration since it's... that kind of bug.
Root cause found. The Linux "NodeInterleaving" allocators were allocating the wrong amount of memory. Thus - buffer overrun and data corruption.
The code is correct in Windows where it was originally written in. But when ported to Linux, some bad copy-pasting led to the wrong size being used for the call to mmap()
. This incorrect size doesn't take into account for the increased alignment that's required by the I/O buffers.
This slipped through testing because I don't have a dual-socket machine and I generally don't test Linux as heavily as Windows.
One line change. This will be fixed in the next patch. Thank you very much for reporting this!
EDIT: Only the new Disk Raid 0 requires greater than page alignment. The legacy Raid0+3 does not. Therefore this buffer overrun doesn't happen with the old code.
Closing as this is fixed in the latest release
Steps to reproduce:
The resulting run may have the following behaviors:
I have reproduced this issue with y-cruncher v0.7.8 on two different dual socket Westmere systems running different Linux distributions and kernel versions. This issue does not occur when the mmap allocator is used for I/O buffers,