Segmentation fault in Filecache::loadMaps

j-stephan commented 5 years ago

I am currently trying to get my SYCL port to run. Unfortunately, it crashes with a segmentation fault right at the first call site of Filecache::loadMaps(). At first I believed this to be an issue in my SYCL port, but it turns out that this happens with vanilla Alpaka (current develop) and OpenMP, too.

gdb output:

(gdb) run
Starting program: /home/jan/workspace/jungfrau-data/tier0/tier1/tier2/jungfrau-photoncounter 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/home/jan/workspace/jungfrau-photoncounter/include/jungfrau-photoncounter/Config.hpp[208]:
    0 ms
    filecache created 

Program received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:436
436 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:436
#1  0x000055555557a84b in alpaka::mem::view::cpu::detail::TaskCopyCpu<std::integral_constant<unsigned long, 1ul>, alpaka::mem::buf::BufCpu<DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, std::integral_constant<unsigned long, 1ul>, unsigned long>, alpaka::mem::view::ViewPlainPtr<alpaka::dev::DevCpu, DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, std::integral_constant<unsigned long, 1ul>, unsigned long>, unsigned long>::operator() (this=0x7fffffffd6e0)
    at /home/jan/workspace/alpaka/include/alpaka/mem/buf/cpu/Copy.hpp:209
#2  0x0000555555575884 in alpaka::queue::traits::Enqueue<alpaka::queue::QueueCpuBlocking, alpaka::mem::view::cpu::detail::TaskCopyCpu<std::integral_constant<unsigned long, 1ul>, alpaka::mem::buf::BufCpu<DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, std::integral_constant<unsigned long, 1ul>, unsigned long>, alpaka::mem::view::ViewPlainPtr<alpaka::dev::DevCpu, DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, std::integral_constant<unsigned long, 1ul>, unsigned long>, unsigned long>, void>::enqueue (queue=..., task=...) at /home/jan/workspace/alpaka/include/alpaka/queue/QueueCpuBlocking.hpp:174
#3  0x00005555555712e3 in alpaka::queue::enqueue<alpaka::queue::QueueCpuBlocking, alpaka::mem::view::cpu::detail::TaskCopyCpu<std::integral_constant<unsigned long, 1ul>, alpaka::mem::buf::BufCpu<DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, std::integral_constant<unsigned long, 1ul>, unsigned long>, alpaka::mem::view::ViewPlainPtr<alpaka::dev::DevCpu, DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, std::integral_constant<unsigned long, 1ul>, unsigned long>, unsigned long> > (queue=..., 
    task=...) at /home/jan/workspace/alpaka/include/alpaka/queue/Traits.hpp:62
#4  0x000055555556b802 in alpaka::mem::view::copy<unsigned long, alpaka::mem::view::ViewPlainPtr<alpaka::dev::DevCpu, DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, std::integral_constant<unsigned long, 1ul>, unsigned long>, alpaka::mem::buf::BufCpu<DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, std::integral_constant<unsigned long, 1ul>, unsigned long>, alpaka::queue::QueueCpuBlocking> (queue=..., viewDst=..., viewSrc=..., extent=@0x7fffffffd7b8: 2999)
    at /home/jan/workspace/alpaka/include/alpaka/mem/view/Traits.hpp:399
#5  0x000055555555c812 in alpakaCopy<alpaka::queue::QueueCpuBlocking&, alpaka::mem::buf::BufCpu<DetectorConfig<1000, 1000, 999, 1024, 512, 10, 100, 100, 3, 5>::Frame<unsigned short>, std::integral_constant<unsigned long, 1>, unsigned long>&, alpaka::mem::view::ViewPlainPtr<alpaka::dev::DevCpu, DetectorConfig<1000, 1000, 999, 1024, 512, 10, 100, 100, 3, 5>::Frame<unsigned short>, std::integral_constant<unsigned long, 1>, unsigned long>, unsigned long&> (args#0=..., args#1=..., args#2=..., args#3=@0x7fffffffd7b8: 2999)
    at /home/jan/workspace/jungfrau-photoncounter/include/jungfrau-photoncounter/kernel/../AlpakaHelper.hpp:40
#6  0x0000555555561686 in Filecache<DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul> >::loadMaps<DetectorConfig<1000ul, 1000ul, 999ul, 1024ul, 512ul, 10ul, 100ul, 100ul, 3ul, 5ul>::Frame<unsigned short>, CpuOmp4<524288ul> > (this=0x5555555fae90, path="../../../data_pool/px_101016/allpede_250us_1243__B_000000.dat", header=true)
    at /home/jan/workspace/jungfrau-photoncounter/include/Filecache.hpp:67
#7  0x000055555555b9ed in main (argc=1, argv=0x7fffffffdef8) at main.cpp:34

Compiler invocation:

g++ -std=c++2a -g -I/home/jan/workspace/alpaka/include -I/home/jan/workspace/jungfrau-photoncounter/include -I/home/jan/workspace/jungfrau-photoncounter/include/jungfrau-photoncounter -DALPAKA_ACC_CPU_BT_OMP4_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED main.cpp -ftemplate-backtrace-limit=0 -o jungfrau-photoncounter -fopenmp -pthread -DVERBOSE

The changes I've made to the jungfrau-photoncounter code are SYCL related and should only affect the device code (all changes here), so I believe this to be a bug in vanilla jungfrau-photoncounter.

kloppstock commented 5 years ago

Unfortunately, I was not able to reproduce this error. I used the current develop version of alpaka, your supplied code and GCC 9.1.0 on the hemera GPU partition. As the accelerator I used CpuOmp4.

Are the input files located at the correct locations? What kind of input do you use?

j-stephan commented 5 years ago

Okay, I figured this one out. On my laptop the initial call to malloc inside Filecache fails because I don't have 16 GiB of RAM. In this case, malloc returns a null pointer. This case isn't checked in Filecache, and the null pointer is then happily passed on, resulting in the segmentation fault later on.

This makes me wonder: Do we need the entire 16GiB at once? Or could a streaming dataflow be implemented?
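A minimal sketch of the missing check, assuming the cache buffer is obtained with a single std::malloc call as described; the helper name allocateCache is hypothetical and not part of Filecache. Throwing std::bad_alloc at the allocation site turns the later segfault inside the alpaka copy into an immediate, descriptive error.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper: allocate the cache buffer and fail loudly instead of
// handing a null pointer on to code that will dereference it later.
inline char* allocateCache(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if(p == nullptr)
        throw std::bad_alloc{}; // report the failure where it happens
    return static_cast<char*>(p);
}
```

The caller remains responsible for releasing the buffer with std::free.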

kloppstock commented 5 years ago

Thanks for this information! We will check for null pointers in future versions to prevent this from happening again.

We decided to load the data up front because reading it from disk during processing would probably bottleneck the algorithm. Eventually, the algorithm is intended to receive its data directly from the detector over the network, so streaming the data from disk could affect the benchmarks unfairly and negatively. This means loading the gain maps (~12 MiB), the pedestal initialization data (~3 GiB) and the main data (~10 GiB) into memory.

If you want to run the algorithm with less main memory, you could reduce the main data set and adjust the Filecache constructor call in main.cpp on line 23 accordingly:

One frame of the data set consists of 1024 × 512 16-bit values and has a 16-byte header, for a total of 1,048,592 bytes per frame. The main data set consists of 10,000 frames (hence the ~10 GiB) and can be split into smaller chunks of complete frames.
The Filecache constructor takes the size in bytes of the cache it will allocate.
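The per-frame arithmetic above can be checked at compile time. This is an illustrative sketch: the constant names and dataSetBytes are hypothetical, not taken from the jungfrau-photoncounter code, and 64-bit integers are used because a 10,000-frame total overflows 32 bits.

```cpp
#include <cstdint>

// Frame layout as described above: 1024 x 512 pixels, one 16-bit value per
// pixel, plus a 16-byte header.
constexpr std::uint64_t pixelsPerFrame = 1024 * 512;
constexpr std::uint64_t headerBytes = 16;
constexpr std::uint64_t frameBytes = pixelsPerFrame * sizeof(std::uint16_t) + headerBytes;
static_assert(frameBytes == 1'048'592, "matches the per-frame size quoted above");

// Bytes occupied by a data set of `frames` complete frames.
constexpr std::uint64_t dataSetBytes(std::uint64_t frames)
{
    return frames * frameBytes;
}
```

For the 10,000-frame main data set this yields 10,485,920,000 bytes, about 9.8 GiB, which is where the "~10 GiB" figure comes from.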

So if you reduce the main data set to 1,000 frames and change the Filecache constructor argument to 1024UL * 1024 * 1024 * 4.5 (4.5 GiB), the program should consume at most 6-7 GiB, depending on other parameters such as TDevFrames in the Config struct and other allocations in main.cpp.
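A rough budget check for that suggestion, assuming the approximate sizes quoted in this thread (gain maps ~12 MiB, pedestal data ~3 GiB) can be treated as round numbers. The names are hypothetical, and the estimate ignores allocations made outside the cache, such as device frame buffers.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ULL * 1024;
constexpr std::uint64_t GiB = 1024 * MiB;
constexpr std::uint64_t frameBytes = 1024ULL * 512 * 2 + 16; // 16-bit pixels + 16-byte header

// Approximate sizes quoted in the thread (assumed, not measured here).
constexpr std::uint64_t gainMapBytes = 12 * MiB;
constexpr std::uint64_t pedestalBytes = 3 * GiB;

// Rough cache size needed for gain maps, pedestal data and `mainFrames` frames.
constexpr std::uint64_t requiredCacheBytes(std::uint64_t mainFrames)
{
    return gainMapBytes + pedestalBytes + mainFrames * frameBytes;
}

// Constructor argument suggested above for a 1,000-frame main data set.
constexpr std::uint64_t suggestedCacheBytes =
    static_cast<std::uint64_t>(1024UL * 1024 * 1024 * 4.5);
static_assert(requiredCacheBytes(1000) <= suggestedCacheBytes, "1,000 frames fit in 4.5 GiB");
```

Under these assumptions, 1,000 frames need 4,282,400,384 bytes (just under 4 GiB), which leaves some headroom within the suggested 4.5 GiB, while the full 10,000-frame set would not fit.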

Alternatively, streaming from disk could be implemented by loading small chunks on demand and feeding them into the Dispenser::uploadData() function. The returned std::future becomes ready once the data has been processed, at which point the buffer can be freed or reused.
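A rough sketch of such a streaming loop, under stated assumptions: the real signature of Dispenser::uploadData() is not shown in this thread, so a hypothetical UploadFn callback stands in for it, and streamFrames is likewise an illustrative name. Two buffers alternate so the next chunk can be read while the previous one is processed, and a buffer is reused only after its future has become ready.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <istream>
#include <vector>

// Hypothetical stand-in for Dispenser::uploadData(): takes a filled buffer and
// the number of complete frames in it, and returns a future that becomes
// ready once processing has finished.
using UploadFn = std::function<std::future<void>(const std::vector<char>&, std::size_t)>;

// Streams complete frames from `in` in chunks of `framesPerChunk`, double
// buffering so reading overlaps processing. Trailing partial frames are
// ignored. Returns the number of frames uploaded.
std::size_t streamFrames(std::istream& in, std::size_t frameBytes,
                         std::size_t framesPerChunk, const UploadFn& upload)
{
    const std::size_t chunkBytes = frameBytes * framesPerChunk;
    std::vector<char> buffers[2] = {std::vector<char>(chunkBytes), std::vector<char>(chunkBytes)};
    std::future<void> pending[2];
    std::size_t total = 0;
    for(std::size_t i = 0;; i ^= 1)
    {
        // Wait until this buffer's previous upload is done before overwriting it.
        if(pending[i].valid())
            pending[i].get();
        in.read(buffers[i].data(), static_cast<std::streamsize>(chunkBytes));
        const std::size_t frames = static_cast<std::size_t>(in.gcount()) / frameBytes;
        if(frames == 0)
            break;
        pending[i] = upload(buffers[i], frames);
        total += frames;
    }
    // Drain any upload still in flight.
    for(auto& f : pending)
        if(f.valid())
            f.get();
    return total;
}
```

With 1,048,592-byte frames and, say, 100 frames per chunk, the two buffers need about 200 MiB instead of the full ~10 GiB; the chunk size trades memory for how well reading can keep up with processing.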

ComputationalRadiationPhysics / jungfrau-photoncounter

Segmentation fault in Filecache::loadMaps #60