aws / dcv-color-primitives

DCV Color Primitives Library
MIT No Attribution

Fix chroma unpacking for i420_lrgb avx2 conversion #29

Closed Vendrik closed 4 years ago

Vendrik commented 4 years ago

Fixed chroma unpacking for the AVX2 instruction set in the i420 to lrgb color conversion.

nacho commented 4 years ago

Please improve the commit message description.

fabiosky commented 4 years ago

This now seems OK considering the function contract.

Old:

let x = _mm_loadu_si128(image as *const __m128i);         // rFrErDrCrBrAr9r8r7r6r5r4r3r2r1r0
let xx = _mm256_set_m128i(x, x);                          // rFrErDrCrBrAr9r8r7r6r5r4r3r2r1r0 rFrErDrCrBrAr9r8r7r6r5r4r3r2r1r0
_mm256_unpacklo_epi8(zero!(), xx)                         // r7--r6--r5--r4--r3--r2--r1--r0-- r7--r6--r5--r4--r3--r2--r1--r0--

New:

let x = _mm_loadu_si128(image as *const __m128i);         // rFrErDrCrBrAr9r8r7r6r5r4r3r2r1r0
let xx = _mm256_set_m128i(x, x);                          // rFrErDrCrBrAr9r8r7r6r5r4r3r2r1r0 rFrErDrCrBrAr9r8r7r6r5r4r3r2r1r0
let hi = _mm256_unpackhi_epi8(zero!(), xx);               // rF--rE--rD--rC--rB--rA--r9--r8-- rF--rE--rD--rC--rB--rA--r9--r8--
let lo = _mm256_unpacklo_epi8(zero!(), xx);               // r7--r6--r5--r4--r3--r2--r1--r0-- r7--r6--r5--r4--r3--r2--r1--r0--
_mm256_permute2x128_si256(lo, hi, PACK_LO_DQWORD_2X256)   // rF--rE--rD--rC--rB--rA--r9--r8-- r7--r6--r5--r4--r3--r2--r1--r0--
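
To make the difference between the two versions easier to check without AVX2 hardware, here is a minimal scalar sketch that models the byte shuffles in plain Rust. It is not the library's code: `Reg256`, `unpack_epi8`, `permute2x128`, `broadcast`, `to_words`, `old_unpack` and `new_unpack` are hypothetical names introduced for illustration, and the value `0x20` for `PACK_LO_DQWORD_2X256` is an assumption (it is the selector that yields the layout shown in the last comment of the new code).

```rust
/// Scalar model of a 256-bit register: index 0 is the least significant byte.
type Reg256 = [u8; 32];

/// Assumed value: low lane taken from `a`'s low lane, high lane from `b`'s low lane.
const PACK_LO_DQWORD_2X256: u8 = 0x20;

/// Models `_mm256_unpacklo_epi8(a, b)` (`high == false`) and
/// `_mm256_unpackhi_epi8(a, b)` (`high == true`): within each 128-bit lane,
/// interleaves 8 bytes of `a` with 8 bytes of `b`.
fn unpack_epi8(a: &Reg256, b: &Reg256, high: bool) -> Reg256 {
    let mut dst = [0u8; 32];
    let half = if high { 8 } else { 0 };
    for lane in 0..2 {
        let base = lane * 16;
        for i in 0..8 {
            dst[base + 2 * i] = a[base + half + i];
            dst[base + 2 * i + 1] = b[base + half + i];
        }
    }
    dst
}

/// Models `_mm256_permute2x128_si256(a, b, imm)` for selectors whose
/// zeroing bits (bit 3 and bit 7) are clear.
fn permute2x128(a: &Reg256, b: &Reg256, imm: u8) -> Reg256 {
    let pick = |sel: u8| -> [u8; 16] {
        let src = if (sel & 2) == 0 { a } else { b };
        let off = if (sel & 1) == 0 { 0 } else { 16 };
        let mut lane = [0u8; 16];
        lane.copy_from_slice(&src[off..off + 16]);
        lane
    };
    let mut dst = [0u8; 32];
    dst[..16].copy_from_slice(&pick(imm & 0x3));
    dst[16..].copy_from_slice(&pick((imm >> 4) & 0x3));
    dst
}

/// Models `_mm256_set_m128i(x, x)`: the same 16 bytes in both lanes.
fn broadcast(x: &[u8; 16]) -> Reg256 {
    let mut r = [0u8; 32];
    r[..16].copy_from_slice(x);
    r[16..].copy_from_slice(x);
    r
}

/// Reinterprets the register as sixteen little-endian 16-bit words.
fn to_words(r: &Reg256) -> [u16; 16] {
    let mut w = [0u16; 16];
    for i in 0..16 {
        w[i] = u16::from_le_bytes([r[2 * i], r[2 * i + 1]]);
    }
    w
}

/// Old sequence: only `unpacklo`, so samples 0..=7 appear in both lanes.
fn old_unpack(x: &[u8; 16]) -> [u16; 16] {
    let zero = [0u8; 32];
    to_words(&unpack_epi8(&zero, &broadcast(x), false))
}

/// New sequence: `unpackhi` + `unpacklo`, then combine the two low lanes.
fn new_unpack(x: &[u8; 16]) -> [u16; 16] {
    let zero = [0u8; 32];
    let xx = broadcast(x);
    let hi = unpack_epi8(&zero, &xx, true);
    let lo = unpack_epi8(&zero, &xx, false);
    to_words(&permute2x128(&lo, &hi, PACK_LO_DQWORD_2X256))
}
```

With the input bytes 0 through 15, `old_unpack` yields the words for samples 0 through 7 twice (each sample in the high byte of its word), while `new_unpack` yields all sixteen samples in order. This matches the lane diagrams in the comments above: the old code dropped samples r8 to rF.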