Closed by Lokathor 4 years ago.
The permute operations should probably also be reviewed and clarified as part of this.
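The proposed naming scheme is `swiz_{inputs}_{lane-size}_{source}_{data-type}`, where:

- `inputs` is some combination of `a`, `b`, and/or `i` (the arguments the intrinsic takes, with `i` standing for the immediate control value).
- `lane-size` is `f32`, `f64`, or `iX` (`i8` through `i128` in the table), possibly with a `z` at the end if any lane can be zeroed instead of taken from a source.

So here's each C intrinsic and the name it would have under this scheme: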
C Name | safe_arch Name |
---|---|
_mm_permutevar_ps | swiz_ab_f32_all_m128 |
_mm256_permutevar_ps | swiz_ab_f32_half_m256 |
_mm_permutevar_pd | swiz_ab_f64_all_m128d |
_mm256_permutevar_pd | swiz_ab_f64_half_m256d |
_mm256_permutevar8x32_ps | swiz_ab_f32_all_m256 |
_mm256_permutevar8x32_epi32 | swiz_ab_i32_all_m256i |
_mm_shuffle_epi8 | swiz_ab_i8_all_m128i |
_mm256_shuffle_epi8 | swiz_ab_i8_half_m256i |
_mm256_permute2f128_ps | swiz_abi_f128z_all_m256 |
_mm256_permute2f128_pd | swiz_abi_f128z_all_m256d |
_mm256_permute2f128_si256 | swiz_abi_f128z_all_m256i |
_mm256_permute2x128_si256 | swiz_abi_i128z_all_m256i |
_mm_shuffle_ps | swiz_abi_f32_all_m128 |
_mm256_shuffle_ps | swiz_abi_f32_half_m256 |
_mm_shuffle_pd | swiz_abi_f64_all_m128d |
_mm256_shuffle_pd | swiz_abi_f64_half_m256d |
_mm_permute_ps | swiz_ai_f32_all_m128 |
_mm_shuffle_epi32 | swiz_ai_i32_all_m128i |
_mm256_permute_ps | swiz_ai_f32_half_m256 |
_mm_permute_pd | swiz_ai_f64_all_m128d |
_mm256_permute4x64_pd | swiz_ai_f64_all_m256d |
_mm256_permute_pd | swiz_ai_f64_half_m256d |
_mm_shufflehi_epi16 | swiz_ai_i16_h64all_m128i |
_mm256_shufflehi_epi16 | swiz_ai_i16_h64half_m256i |
_mm_shufflelo_epi16 | swiz_ai_i16_l64all_m128i |
_mm256_shufflelo_epi16 | swiz_ai_i16_l64half_m256i |
_mm256_shuffle_epi32 | swiz_ai_i32_half_m256i |
_mm256_permute4x64_epi64 | swiz_ai_i64_all_m256i |
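
To make the four-part scheme concrete, here's a small Rust sketch that splits a proposed name back into its components. `SwizName` and `parse_swiz_name` are hypothetical helpers for illustration only, not part of safe_arch, and the checks only enforce the letter and width patterns described above; they don't know which combinations actually correspond to a real intrinsic.

```rust
/// Components of a proposed `swiz_{inputs}_{lane-size}_{source}_{data-type}` name.
/// (Hypothetical helper type, for illustration only.)
#[derive(Debug, PartialEq)]
struct SwizName<'a> {
    inputs: &'a str,    // e.g. "ab", "abi", "ai"
    lane_size: &'a str, // e.g. "f32", "i8", "f128" (without any `z` suffix)
    zeroing: bool,      // true if the lane size ended in `z`
    source: &'a str,    // e.g. "all", "half", "h64all"
    data_type: &'a str, // e.g. "m128", "m256i"
}

/// Hypothetical parser: returns None if `name` doesn't follow the scheme.
fn parse_swiz_name(name: &str) -> Option<SwizName<'_>> {
    // Every proposed name starts with the `swiz_` prefix.
    let rest = name.strip_prefix("swiz_")?;
    let parts: Vec<&str> = rest.split('_').collect();
    // Expect exactly four underscore-separated components after the prefix.
    let (inputs, lane, source, data_type) = match parts.as_slice() {
        [i, l, s, d] => (*i, *l, *s, *d),
        _ => return None,
    };
    // `inputs` may only contain the letters a, b, and i.
    if inputs.is_empty() || !inputs.chars().all(|c| matches!(c, 'a' | 'b' | 'i')) {
        return None;
    }
    // A trailing `z` on the lane size marks that lanes can be zeroed.
    let (lane_size, zeroing) = match lane.strip_suffix('z') {
        Some(stripped) => (stripped, true),
        None => (lane, false),
    };
    // Lane size is `f` or `i` followed by a bit width, e.g. f32, i8, f128.
    let bytes = lane_size.as_bytes();
    let kind_ok = matches!(bytes.first(), Some(b'f') | Some(b'i'));
    let width_ok = bytes.len() > 1 && bytes[1..].iter().all(|b| b.is_ascii_digit());
    if !kind_ok || !width_ok || source.is_empty() || !data_type.starts_with('m') {
        return None;
    }
    Some(SwizName { inputs, lane_size, zeroing, source, data_type })
}

fn main() {
    for name in ["swiz_ab_f32_all_m128", "swiz_abi_f128z_all_m256d", "swiz_ai_i16_h64half_m256i"] {
        println!("{} => {:?}", name, parse_swiz_name(name));
    }
}
```

For example, `swiz_abi_f128z_all_m256d` splits into inputs `abi`, lane size `f128` with zeroing, source `all`, and data type `m256d`.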
Oh, also, we're using `swiz` for "swizzle" because Intel sometimes calls these ops "shuffle" and sometimes "permute", and there seems to be no logic to which name is used for each particular op/intrinsic. So we'll simply forsake both names and pick a third one that doesn't carry any existing baggage.
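Oh dag, we also have to consider that some `b` values are the varying pattern and some `b` values are the 2nd register to mix in. For example, in `_mm_permutevar_ps` the `b` argument holds the per-lane selection pattern, while in `_mm_shuffle_ps` it's a second data register and the pattern comes from the immediate.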
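
The two roles of `b` are easiest to see side by side. The sketch below is a pair of plain scalar models written for illustration, based on Intel's documented lane-selection rules; `permutevar_ps_model` and `shuffle_ps_model` are hypothetical names, they don't call the real intrinsics, and they aren't safe_arch functions.

```rust
/// Scalar model of `_mm_permutevar_ps(a, b)` (proposed `swiz_ab_f32_all_m128`).
/// Here `b` is the varying *pattern*: the low 2 bits of each lane pick a lane of `a`.
fn permutevar_ps_model(a: [f32; 4], b: [i32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for lane in 0..4 {
        // Only bits 1:0 of each control lane matter; higher bits are ignored.
        out[lane] = a[(b[lane] & 0b11) as usize];
    }
    out
}

/// Scalar model of `_mm_shuffle_ps(a, b, IMM)` (proposed `swiz_abi_f32_all_m128`).
/// Here `b` is a second *data register*; the pattern comes from the immediate.
fn shuffle_ps_model(a: [f32; 4], b: [f32; 4], imm: u8) -> [f32; 4] {
    // Each 2-bit field of `imm` selects a source lane.
    let sel = |n: u32| ((imm >> (2 * n)) & 0b11) as usize;
    // The low two output lanes come from `a`, the high two from `b`.
    [a[sel(0)], a[sel(1)], b[sel(2)], b[sel(3)]]
}

fn main() {
    let a = [10.0, 11.0, 12.0, 13.0];
    let b = [20.0, 21.0, 22.0, 23.0];
    // `b` as a pattern: reverse the lanes of `a`.
    println!("{:?}", permutevar_ps_model(a, [3, 2, 1, 0]));
    // `b` as data: lanes 3,2 of `a` followed by lanes 1,0 of `b`.
    println!("{:?}", shuffle_ps_model(a, b, 0b00_01_10_11));
}
```

With these inputs the first call gives `[13.0, 12.0, 11.0, 10.0]` and the second gives `[13.0, 12.0, 21.0, 20.0]`, so the same letter `b` means "pattern" in one name and "second register" in the other.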
Update: It's a mess!