dthuerck / mapmap_cpu

A high-performance general-purpose MRF MAP solver, heavily exploiting SIMD instructions.
BSD 3-Clause "New" or "Revised" License

Removed AVX instruction `_mm_maskstore_ps` from SSE wrapper #8

Closed by magcks 6 years ago

magcks commented 6 years ago

I replaced the `_mm_maskstore_ps` intrinsic, which requires AVX, in all SSE `masked_store` functions with a pair of `_mm_maskmove_si64` calls, one for each 64-bit half of the 128-bit vector.
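
For readers unfamiliar with the intrinsic, here is a minimal sketch of how a four-lane float masked store might be built from two `_mm_maskmove_si64` calls. It is not the code from this PR: the names `masked_store_ps_sketch` and `check_masked_store_ps_sketch` are hypothetical, and the sign-bit broadcast is an assumption made so that the sketch follows the `_mm_maskstore_ps` convention (only the top bit of each 32-bit mask lane matters), whereas `_mm_maskmove_si64` tests the top bit of every mask byte.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: __m128i, _mm_movepi64_pi64, _mm_srai_epi32
#include <xmmintrin.h>  // SSE: _mm_maskmove_si64, _mm_empty

// Hypothetical helper (not the library's wrapper): store the lanes of
// `value` whose mask lane has its top bit set, leave the others untouched.
inline void masked_store_ps_sketch(float* dst, __m128i mask, __m128 value)
{
    const __m128i v = _mm_castps_si128(value);

    // Broadcast each lane's sign bit to all 32 bits of that lane, so every
    // byte of an enabled lane has its top bit set.
    const __m128i m = _mm_srai_epi32(mask, 31);

    // Split value and mask into their low and high 64-bit halves.
    const __m64 v_lo = _mm_movepi64_pi64(v);
    const __m64 v_hi = _mm_movepi64_pi64(_mm_srli_si128(v, 8));
    const __m64 m_lo = _mm_movepi64_pi64(m);
    const __m64 m_hi = _mm_movepi64_pi64(_mm_srli_si128(m, 8));

    char* p = reinterpret_cast<char*>(dst);
    _mm_maskmove_si64(v_lo, m_lo, p);      // lanes 0 and 1
    _mm_maskmove_si64(v_hi, m_hi, p + 8);  // lanes 2 and 3
    _mm_empty();                           // leave MMX state
}

// Small self-check using the same values and mask pattern as test_float.
inline void check_masked_store_ps_sketch()
{
    float out[4] = { -1.0f, -1.0f, -1.0f, -1.0f };
    const __m128 value = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    const __m128i mask = _mm_setr_epi32(-1, -1, 0, -1);

    masked_store_ps_sketch(out, mask, value);

    assert(out[0] == 1.0f && out[1] == 2.0f);
    assert(out[2] == -1.0f);  // masked-out lane keeps its old value
    assert(out[3] == 4.0f);
}
```

The `__m64` intrinsics used here are not available in every toolchain (for example, MSVC does not provide them for x64 targets), so the sketch assumes GCC or Clang on x86.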

The change can be exercised with the following tests:

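#include <cstdint>
#include <iostream>
// plus the mapmap header that declares v_load, iv_load and v_masked_store

void test_float()
{
    float in[] = { 1, 2, 3, 4 };
    // Lane 2 is masked out; lanes 0, 1 and 3 should be written.
    uint32_t masks[] = { 0xffffffff, 0xffffffff, 0, 0xffffffff };
    // Zero-filled and aligned, so the skipped lane reads back as 0 rather than garbage.
    alignas(16) char data[2048] = {};
    int off = 0;
    _v_t<float, 4> a = v_load<float, 4>(in);
    _iv_t<float, 4> mask = iv_load<float, 4>((const int32_t*)masks);
    _s_t<float, 4> *ptr = (float*)(data + off);

    v_masked_store<float, 4>(a, mask, ptr);

    // Expected output with a correct masked store: 1 2 0 4
    float *read = (float*)(data + off);
    std::cout << read[0] << " " << read[1] << " " << read[2] << " " << read[3] << std::endl;
}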
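void test_double()
{
    double in[] = { 13, 37 };
    // Both lanes are enabled, so this checks the store path but not the masking itself.
    uint64_t masks[] = { 0xffffffffffffffffull, 0xffffffffffffffffull };
    // Zero-filled and aligned so the buffer contents are deterministic.
    alignas(16) char data[2048] = {};
    int off = 0;
    _v_t<double, 2> a = v_load<double, 2>(in);
    _iv_t<double, 2> mask = iv_load<double, 2>((const int64_t*)masks);
    _s_t<double, 2> *ptr = (double*)(data + off);

    v_masked_store<double, 2>(a, mask, ptr);

    // Expected output with a correct masked store: 13 37
    double *read = (double*)(data + off);
    std::cout << read[0] << " " << read[1] << std::endl;
}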
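
The double case differs from the float case only in how the mask is prepared, since each mask lane is 64 bits wide. The following is a rough sketch under the same assumptions as before (hypothetical names, sign-bit convention assumed, not the code from this PR). Unlike `test_double`, the self-check disables one lane so that the masking itself is exercised.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: __m128d, _mm_shuffle_epi32, _mm_movepi64_pi64
#include <xmmintrin.h>  // SSE: _mm_maskmove_si64, _mm_empty, _MM_SHUFFLE

// Hypothetical helper: store the 64-bit lanes of `value` whose mask lane
// has its top bit set, leave the others untouched.
inline void masked_store_pd_sketch(double* dst, __m128i mask, __m128d value)
{
    const __m128i v = _mm_castpd_si128(value);

    // SSE2 has no 64-bit arithmetic shift, so copy the upper 32 bits of each
    // 64-bit lane into both halves of that lane, then broadcast the sign bit.
    const __m128i hi = _mm_shuffle_epi32(mask, _MM_SHUFFLE(3, 3, 1, 1));
    const __m128i m = _mm_srai_epi32(hi, 31);

    char* p = reinterpret_cast<char*>(dst);
    _mm_maskmove_si64(_mm_movepi64_pi64(v),
                      _mm_movepi64_pi64(m), p);       // lane 0
    _mm_maskmove_si64(_mm_movepi64_pi64(_mm_srli_si128(v, 8)),
                      _mm_movepi64_pi64(_mm_srli_si128(m, 8)), p + 8);  // lane 1
    _mm_empty();                                      // leave MMX state
}

// Self-check with the values from test_double, but with lane 1 disabled.
inline void check_masked_store_pd_sketch()
{
    double out[2] = { -1.0, -1.0 };
    const __m128d value = _mm_setr_pd(13.0, 37.0);
    const __m128i mask = _mm_set_epi64x(0, -1);  // low lane on, high lane off

    masked_store_pd_sketch(out, mask, value);

    assert(out[0] == 13.0);
    assert(out[1] == -1.0);  // masked-out lane keeps its old value
}
```
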
dthuerck commented 6 years ago

Kudos, Max!