Segmentation fault in the gauge packer routine of the Wilson clover invert-from-file test

QPhiX contains a test case that inverts the clover operator on a gauge configuration read from a file. The relevant code from QPhiX is the following:

multi1d<U> u(4);

QDPIO::cout << "Reading field from file " << filename << endl;
XMLReader file_xml, record_xml;
QDPFileReader from(file_xml, filename, QDPIO_PARALLEL);
read(from, record_xml, u);
close(from);

// ...

QDPIO::cout << "Allocating packed gauge fields" << endl;
Gauge *packed_gauge_cb0 = (Gauge *)geom.allocCBGauge();
Gauge *packed_gauge_cb1 = (Gauge *)geom.allocCBGauge();

GaugeInner *packed_gauge_cb0_i = (GaugeInner *)geom_inner.allocCBGauge();
GaugeInner *packed_gauge_cb1_i = (GaugeInner *)geom_inner.allocCBGauge();

QDPIO::cout << "Fields allocated" << endl;

// Pack the gauge field
QDPIO::cout << "Packing gauge field..." << endl;
qdp_pack_gauge<>(u, packed_gauge_cb0, packed_gauge_cb1, geom);

QDPIO::cout << "Packing inner gauge field..." << endl;
qdp_pack_gauge<>(u, packed_gauge_cb0_i, packed_gauge_cb1_i, geom_inner);

I have built the scalar version of this code on my laptop using GCC. QDP++ has also been built with GCC, but without any architecture flags (like -x or -march). When I run this test with a gauge configuration, I obtain the following output in GDB:

Fields allocated
Packing gauge field...
Packing inner gauge field...

Thread 2 "t_clov_invert_f" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6375700 (LWP 2046)]
QPhiX::qdp_pack_gauge<float, 8, 8, true, QDP::multi1d<QDP::OLattice<QDP::PScalar<QDP::PColorMatrix<QDP::RComplex<double>, 3> > > > >
    () at /home/mu/Dokumente/Studium/Master_Science_Physik/Masterarbeit//US_QCD/qphix/include/qphix/qdp_packer_parscalar.h:111
111                                                                             u_cb1[block][2*mu+1][c][c2][0][xx] = u[mu].elem(rb[1].start() + qdpsite).elem().elem(c2,c).real();

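The segmentation fault occurs in the routine that packs the QDP++ gauge field (qdp_pack_gauge in qdp_packer_parscalar.h, line 111). It happens in the second call, which packs the inner-precision field.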
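One way to narrow down which index goes out of range would be to guard the site lookup with an explicit bounds check before the read. The following is only a minimal, self-contained sketch of that idea: `checked_site_index` and its parameters are hypothetical names for illustration and are not part of QPhiX or QDP++.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a QPhiX or QDP++ function): add a site offset to
// the start index of a checkerboard and verify that the result is a valid
// index into a field that holds `num_sites` sites.
std::size_t checked_site_index(std::size_t cb_start,
                               std::size_t site_offset,
                               std::size_t num_sites)
{
    std::size_t const index = cb_start + site_offset;
    if (index >= num_sites) {
        throw std::out_of_range("site index " + std::to_string(index) +
                                " is outside a field of " +
                                std::to_string(num_sites) + " sites");
    }
    return index;
}
```

In the packer this would correspond to checking `rb[1].start() + qdpsite` against the number of sites held by `u[mu]` before calling `elem()`. How that site count is best obtained from QDP++ is left open here.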

In another test case, which I have set up for my non-degenerate twisted mass operator, I can read the same kind of configuration and run the CG solver on it. The solver does not converge, but that is a separate numerical problem somewhere else; at least the memory accesses are correct there.

I have just recompiled QDP++ and QPhiX with the address sanitizer (-fsanitize=address). The output should be quite helpful:

Fields allocated
Packing gauge field...
Packing inner gauge field...
=================================================================
==22218==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f30eeaae890 at pc 0x55882dc4462f bp 0x7f30f67fecf0 sp 0x7f30f67fece0
READ of size 8 at 0x7f30eeaae890 thread T1
    #0 0x55882dc4462e in void QPhiX::qdp_pack_gauge<float, 8, 8, true, QDP::multi1d<QDP::OLattice<QDP::PScalar<QDP::PColorMatrix<QDP::RComplex<double>, 3> > > > >(QDP::multi1d<QDP::OLattice<QDP::PScalar<QDP::PColorMatrix<QDP::RComplex<double>, 3> > > > const&, QPhiX::Geometry<float, 8, 8, true>::SU3MatrixBlock*, QPhiX::Geometry<float, 8, 8, true>::SU3MatrixBlock*, QPhiX::Geometry<float, 8, 8, true>&) [clone ._omp_fn.12] /home/mu/Dokumente/Studium/Master_Science_Physik/Masterarbeit//US_QCD/qphix/include/qphix/qdp_packer_parscalar.h:109
    #1 0x7f30f9b8cde5  (/lib64/libgomp.so.1+0x16de5)
    #2 0x7f30f97486c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
    #3 0x7f30f9482f7e in clone (/lib64/libc.so.6+0x107f7e)

0x7f30eeaae890 is located 128 bytes to the right of 1179664-byte region [0x7f30ee98e800,0x7f30eeaae810)
allocated by thread T0 here:
    #0 0x7f30fac7e040 in operator new[](unsigned long) (/lib64/libasan.so.3+0xc8040)
    #1 0x55882dd3ce9d in QDP::Allocator::QDPDefaultAllocator::allocate(unsigned long, QDP::Allocator::MemoryPoolHint const&) (/home/mu/Build/qphix-debug/tests/t_clov_invert_from_file+0x14ce9d)
    #2 0x55882dc3c7a4 in testClovInvertFromFile::run() /home/mu/Dokumente/Studium/Master_Science_Physik/Masterarbeit//US_QCD/qphix/tests/testClovInvertFromFile.cc:536
    #3 0x60400000dc8f  (<unknown module>)

Thread T1 created by T0 here:
    #0 0x7f30fabe7488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
    #1 0x7f30f9b8d39f  (/lib64/libgomp.so.1+0x1739f)
    #2 0x7f30f9b84199 in GOMP_parallel (/lib64/libgomp.so.1+0xe199)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/mu/Dokumente/Studium/Master_Science_Physik/Masterarbeit//US_QCD/qphix/include/qphix/qdp_packer_parscalar.h:109 in void QPhiX::qdp_pack_gauge<float, 8, 8, true, QDP::multi1d<QDP::OLattice<QDP::PScalar<QDP::PColorMatrix<QDP::RComplex<double>, 3> > > > >(QDP::multi1d<QDP::OLattice<QDP::PScalar<QDP::PColorMatrix<QDP::RComplex<double>, 3> > > > const&, QPhiX::Geometry<float, 8, 8, true>::SU3MatrixBlock*, QPhiX::Geometry<float, 8, 8, true>::SU3MatrixBlock*, QPhiX::Geometry<float, 8, 8, true>&) [clone ._omp_fn.12]
Shadow bytes around the buggy address:
  0x0fe69dd4dcc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe69dd4dcd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe69dd4dce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe69dd4dcf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe69dd4dd00: 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0fe69dd4dd10: fa fa[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe69dd4dd20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe69dd4dd30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe69dd4dd40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe69dd4dd50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe69dd4dd60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==22218==ABORTING
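The addresses in the report can be checked with a little arithmetic. The sketch below uses only the numbers printed by AddressSanitizer. The one assumption is the size of a single site: a 3x3 complex matrix of doubles stored without padding (144 bytes), which is inferred from the template type in the trace and not confirmed against the QDP++ sources.

```cpp
#include <cstdint>

// Numbers copied from the AddressSanitizer report.
constexpr std::uint64_t region_begin = 0x7f30ee98e800;
constexpr std::uint64_t region_end   = 0x7f30eeaae810; // one past the last valid byte
constexpr std::uint64_t fault_addr   = 0x7f30eeaae890;

// Assumption: one site stores a 3x3 matrix of complex doubles, unpadded.
constexpr std::uint64_t bytes_per_site = 3 * 3 * 2 * sizeof(double);

// Size of the allocated region in bytes.
constexpr std::uint64_t region_size() { return region_end - region_begin; }

// How far beyond the end of the region the faulting read starts.
constexpr std::uint64_t bytes_past_end() { return fault_addr - region_end; }

// Number of complete sites that fit into the region, and what is left over.
constexpr std::uint64_t whole_sites_in_region() { return region_size() / bytes_per_site; }
constexpr std::uint64_t leftover_bytes() { return region_size() % bytes_per_site; }

// These two values are stated directly in the report.
static_assert(region_size() == 1179664, "region size matches the report");
static_assert(bytes_past_end() == 128, "overshoot matches the report");
```

Under that assumption the region holds 8192 whole sites plus 16 leftover bytes, and the read starts 128 bytes beyond its end. The region was allocated through QDP::Allocator::QDPDefaultAllocator and the faulting access is a READ, so this looks like a site index into the QDP++ source field that exceeds the number of sites in that field, not an overrun of the packed QPhiX buffers. This interpretation depends on the assumed site size.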

I have not altered the QPhiX test code for the Wilson clover case, so I expected it to work out of the box. Has anyone seen similar behavior with this test case?

JeffersonLab/qphix, issue #28