JeffersonLab / qphix

QCD for Intel Xeon Phi and Xeon processors
http://jeffersonlab.github.io/qphix/
Other

Refactor BLAS functor streaming code into a class #79

Open martin-ueding opened 7 years ago

martin-ueding commented 7 years ago

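The BLAS functors contain a lot of repeated boilerplate code. Each one declares cache-aligned temporary spinor blocks (with separate syntax for GCC and for the Intel compiler), streams the input in, and streams the result out again. This should be refactored into a resource wrapper class.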

    // Temporary, cache-aligned storage to stream the spinors into and out of
#if defined(__GNUG__) && !defined(__INTEL_COMPILER)
    typename Geometry<AT, V, S, compress>::FourSpinorBlock x_spinor
        __attribute__((aligned(QPHIX_LLC_CACHE_ALIGN)));
    typename Geometry<AT, V, S, compress>::FourSpinorBlock y_spinor
        __attribute__((aligned(QPHIX_LLC_CACHE_ALIGN)));
#else
    __declspec(align(QPHIX_LLC_CACHE_ALIGN))
        typename Geometry<AT, V, S, compress>::FourSpinorBlock x_spinor;
    __declspec(align(QPHIX_LLC_CACHE_ALIGN))
        typename Geometry<AT, V, S, compress>::FourSpinorBlock y_spinor;
#endif

    BLASUtils::streamInSpinor<FT, V>((AT *)x_spinor, xbase, nvec_in_spinor);

    // Now the data are hopefully both in L1 cache and in the right layout, so
    // the actual computation can read x_spinor and write y_spinor:
    // ... Actual Code ...

    BLASUtils::streamOutSpinor<FT, V>(ybase, (const AT *)y_spinor, nvec_in_spinor);
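A minimal sketch of such a wrapper, under several assumptions: `StreamSpinor`, the tag types `StreamIn`/`StreamOut`, the helpers `stream_in`/`stream_out` and the constant `kCacheAlign` are hypothetical stand-ins. The real code would wrap `Geometry<AT, V, S, compress>::FourSpinorBlock` and call `BLASUtils::streamInSpinor`/`streamOutSpinor`; here the buffer is a plain array of `N` elements, `memcpy` replaces the streaming intrinsics, and C++11 `alignas` replaces the compiler-specific `#if` block. The constructor streams in and the destructor streams out, so a functor body keeps only the actual computation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed stand-in for QPHIX_LLC_CACHE_ALIGN (a 64-byte cache line).
constexpr std::size_t kCacheAlign = 64;

// Hypothetical stand-ins for BLASUtils::streamInSpinor / streamOutSpinor.
// The real QPhiX helpers use streaming intrinsics; memcpy keeps this portable.
template <typename T>
void stream_in(T *dst, const T *src, std::size_t n) { std::memcpy(dst, src, n * sizeof(T)); }
template <typename T>
void stream_out(T *dst, const T *src, std::size_t n) { std::memcpy(dst, src, n * sizeof(T)); }

struct StreamIn {};   // tag: fill the buffer on construction
struct StreamOut {};  // tag: write the buffer back on destruction

// Hypothetical RAII wrapper replacing the per-functor boilerplate.
template <typename T, std::size_t N>
class StreamSpinor {
 public:
  StreamSpinor(StreamIn, const T *src) : dst_(nullptr) {
    check_alignment();
    stream_in(buf_, src, N);
  }
  StreamSpinor(StreamOut, T *dst) : dst_(dst) { check_alignment(); }
  ~StreamSpinor() {
    if (dst_ != nullptr) stream_out(dst_, buf_, N);  // only output buffers write back
  }
  // Non-copyable: a copy of an output buffer would write back twice.
  StreamSpinor(const StreamSpinor &) = delete;
  StreamSpinor &operator=(const StreamSpinor &) = delete;

  T *get() { return buf_; }
  const T *get() const { return buf_; }

 private:
  // Debug-build check that the buffer really starts on a cache-line boundary.
  void check_alignment() const {
    assert(reinterpret_cast<std::uintptr_t>(buf_) % kCacheAlign == 0);
  }
  alignas(kCacheAlign) T buf_[N];
  T *dst_;
};

// Example functor body (y = a * x on one block) written with the wrapper.
template <std::size_t N>
void scale_block(float a, const float *xbase, float *ybase) {
  StreamSpinor<float, N> x(StreamIn{}, xbase);
  StreamSpinor<float, N> y(StreamOut{}, ybase);
  for (std::size_t i = 0; i < N; ++i) y.get()[i] = a * x.get()[i];
}  // y's destructor streams the result out to ybase here
```

Separate tag types make the direction explicit; overloading the constructor on `const T *` versus `T *` would silently turn a non-const input pointer into an output buffer. Both objects are still automatic variables of the functor, like the hand-written `x_spinor`/`y_spinor`, but whether the compiler generates equally good code through `get()` is something only measurements on the target hardware can settle.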
martin-ueding commented 7 years ago

Concerns were raised that the proposed StreamSpinor class could degrade performance, because the compiler might lay out its member variables with an alignment other than the one we want. It is also unclear whether accessing the buffer through the get() member function preserves L1 cache locality.
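The alignment concern can at least be made concrete with compile-time checks. A minimal sketch, assuming a 64-byte `QPHIX_LLC_CACHE_ALIGN` and a hypothetical wrapper layout (`Wrapper`, `buffer_aligned`) that stores a write-back pointer in front of the buffer. In standard C++11, an `alignas` on a member raises the alignment of the whole class, so the compiler inserts padding rather than lowering the member's alignment. Automatic (stack) objects honour that alignment; objects created with `new` only do so reliably since C++17's aligned `operator new`, which is one concrete way a wrapper could end up misaligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheAlign = 64;  // assumed value of QPHIX_LLC_CACHE_ALIGN

// Hypothetical wrapper layout: a pointer member placed *before* the buffer.
struct Wrapper {
  float *dst;
  alignas(kCacheAlign) float buf[32];
};

// The member's alignas propagates to the enclosing class ...
static_assert(alignof(Wrapper) == kCacheAlign, "class alignment follows the member");
// ... and the compiler pads so that buf starts on the next cache-line boundary,
// at the cost of unused bytes between dst and buf.
static_assert(offsetof(Wrapper, buf) == kCacheAlign, "buf is padded to the next line");

// Runtime check that a given object's buffer sits on a cache-line boundary.
bool buffer_aligned(const Wrapper &w) {
  return reinterpret_cast<std::uintptr_t>(w.buf) % kCacheAlign == 0;
}
```

Such checks settle the layout question, but not the L1 locality question around `get()`, which still needs measurements.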

Performance measurements should be done before this is implemented in devel.
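The measurements could be as simple as timing one functor before and after the refactoring on the same data. A minimal sketch of such a harness, assuming nothing about QPhiX's own test infrastructure; `best_time_ns` is a hypothetical helper, not part of QPhiX.

```cpp
#include <chrono>

// Hypothetical micro-benchmark helper: returns the best wall-clock time in
// nanoseconds over `repeats` runs of `f` (repeats must be >= 1, otherwise
// -1.0 is returned). Taking the minimum reduces noise from interrupts and
// frequency scaling.
template <typename F>
double best_time_ns(F &&f, int repeats) {
  using clock = std::chrono::steady_clock;
  double best = -1.0;
  for (int r = 0; r < repeats; ++r) {
    const auto t0 = clock::now();
    f();
    const auto t1 = clock::now();
    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    if (best < 0.0 || ns < best) best = ns;
  }
  return best;
}
```

For example, the current functor and the StreamSpinor-based one could each be wrapped in a lambda and compared on a Xeon and a Xeon Phi node. The result of each call should be consumed afterwards so that the compiler cannot optimize the work away.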