Closed HolyWu closed 4 years ago
Is the clip 8 bit or 16 bit?
Does the same happen with v20?
Both 8 and 16 bit have the artifacts. I just tried v20 and it works fine.
BTW, the pixels at right borders are also different.
Byte order remains difficult to figure out.
I think these two variables are initialised backwards: https://github.com/dubhater/vapoursynth-mvtools/blob/master/src/SimpleResize_AVX2.cpp#L150-L181
If the first parameter to _mm256_set_epi16 sets the least significant word, and the first word in the 16 words loaded by _mm256_loadu_si256 ends up in the least significant position, then those two above definitely backwards. The first column of the "image" needs a minimum of 0 and maximum of simple->limit_width - 0) * pel - 1
.
What do you think?
I think the first parameter to _mm256_set_epi16
actually sets the most significant word, while that to _mm256_setr_epi16
sets the least significant word.
Here using 128-bit vector for simplicity.
#include <cstdio>
#include <emmintrin.h>
int main(int argc, char ** argv) {
const __m128i vec = _mm_set_epi16(0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef);
short ret1[8];
_mm_storeu_si128((__m128i *)ret1, vec);
for (int i = 0; i < 8; i++)
printf("%#.4x\n", ret1[i]);
printf("\n");
int * ret2 = (int *)ret1;
for (int i = 0; i < 4; i++)
printf("%#.8x\n", ret2[i]);
printf("\n");
const short foo[8] = { 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7 };
const __m128i bar = _mm_loadu_si128((const __m128i *)foo);
for (int i = 0; i < 4; i++)
printf("%#.8x\n", bar.m128i_i32[i]);
return 0;
}
Output:
0x00ef
0x00cd
0x00ab
0x0089
0x0067
0x0045
0x0023
0x0001
0x00cd00ef
0x008900ab
0x00450067
0x00010023
0x00010000
0x00030002
0x00050004
0x00070006
Hmm, then those two variables are not the problem.
I should mention that I don't have AVX2, so I can only read the code.
The initial_horizontal_minimum
variable was indeed the problem. It was initialized with wrong values. The values should be negative (0, -1, -2, -3, etc.) rather than positive. Fixing that eliminates the artifacts at left borders, but the pixel differences at right borders still persist. No clue what's the culprit at the moment...
Oh, yeah, they're supposed to be negative.
In today's episode of "What was I thinking when I wrote that code??"...
Does the problem at the right border go away with this patch? I had a look in gdb and things seem to be correct now, but more testing can't hurt.
diff --git a/src/SimpleResize_AVX2.cpp b/src/SimpleResize_AVX2.cpp
index 39cb01f..86fd295 100644
--- a/src/SimpleResize_AVX2.cpp
+++ b/src/SimpleResize_AVX2.cpp
@@ -121,12 +121,14 @@ static FORCE_INLINE void simpleResize_int16_t_horizontal_8px_avx2(const SimpleRe
pixels = _mm256_madd_epi16(pixels, dwords_weights_h);
pixels = _mm256_add_epi32(pixels, _mm256_set1_epi32(simple_resize_weight_half));
pixels = _mm256_srai_epi32(pixels, simple_resize_weight_shift);
+
+ pixels = _mm256_max_epi32(minimum,
+ _mm256_min_epi32(pixels, maximum));
+
pixels = _mm256_packs_epi32(pixels, pixels);
- pixels = _mm256_max_epi16(minimum,
- _mm256_min_epi16(pixels, maximum));
- minimum = _mm256_sub_epi16(minimum, horizontal_step);
- maximum = _mm256_sub_epi16(maximum, horizontal_step);
+ minimum = _mm256_sub_epi32(minimum, horizontal_step);
+ maximum = _mm256_sub_epi32(maximum, horizontal_step);
_mm_storeu_si128((__m128i *)&dstp[x], _mm256_castsi256_si128(_mm256_permute4x64_epi64(pixels, 0xe8))); // 0b11101000
}
@@ -143,19 +145,11 @@ void simpleResize_int16_t_avx2(const SimpleResize *simple,
int pel = simple->pel;
__m256i minimum = _mm256_setzero_si256();
- __m256i maximum = _mm256_set1_epi16(simple->limit_height * pel - 1);
- __m256i horizontal_step = _mm256_set1_epi16(horizontal_vectors ? pel * pixels_per_iteration : 0);
- __m256i vertical_step = _mm256_set1_epi16(horizontal_vectors ? 0 : pel);
-
- __m256i initial_horizontal_minimum = _mm256_set_epi16(-15 * pel,
- -14 * pel,
- -13 * pel,
- -12 * pel,
- -11 * pel,
- -10 * pel,
- -9 * pel,
- -8 * pel,
- -7 * pel,
+ __m256i maximum = _mm256_set1_epi32(simple->limit_height * pel - 1);
+ __m256i horizontal_step = _mm256_set1_epi32(horizontal_vectors ? pel * pixels_per_iteration : 0);
+ __m256i vertical_step = _mm256_set1_epi32(horizontal_vectors ? 0 : pel);
+
+ __m256i initial_horizontal_minimum = _mm256_set_epi32(-7 * pel,
-6 * pel,
-5 * pel,
-4 * pel,
@@ -163,15 +157,7 @@ void simpleResize_int16_t_avx2(const SimpleResize *simple,
-2 * pel,
-1 * pel,
0 * pel);
- __m256i initial_horizontal_maximum = _mm256_set_epi16((simple->limit_width - 15) * pel - 1,
- (simple->limit_width - 14) * pel - 1,
- (simple->limit_width - 13) * pel - 1,
- (simple->limit_width - 12) * pel - 1,
- (simple->limit_width - 11) * pel - 1,
- (simple->limit_width - 10) * pel - 1,
- (simple->limit_width - 9) * pel - 1,
- (simple->limit_width - 8) * pel - 1,
- (simple->limit_width - 7) * pel - 1,
+ __m256i initial_horizontal_maximum = _mm256_set_epi32((simple->limit_width - 7) * pel - 1,
(simple->limit_width - 6) * pel - 1,
(simple->limit_width - 5) * pel - 1,
(simple->limit_width - 4) * pel - 1,
@@ -212,16 +198,16 @@ void simpleResize_int16_t_avx2(const SimpleResize *simple,
simpleResize_int16_t_horizontal_8px_avx2(simple, dstp, workp, x, minimum, maximum, horizontal_step);
if (dst_width_avx2 < simple->dst_width) {
- __m256i step_back = _mm256_set1_epi16((pixels_per_iteration - (simple->dst_width - dst_width_avx2)) * pel);
- minimum = _mm256_add_epi16(minimum, step_back);
- maximum = _mm256_add_epi16(maximum, step_back);
+ __m256i step_back = _mm256_set1_epi32((pixels_per_iteration - (simple->dst_width - dst_width_avx2)) * pel);
+ minimum = _mm256_add_epi32(minimum, step_back);
+ maximum = _mm256_add_epi32(maximum, step_back);
simpleResize_int16_t_horizontal_8px_avx2(simple, dstp, workp, simple->dst_width - pixels_per_iteration, minimum, maximum, horizontal_step);
}
dstp += dst_stride;
- minimum = _mm256_sub_epi16(minimum, vertical_step);
- maximum = _mm256_sub_epi16(maximum, vertical_step);
+ minimum = _mm256_sub_epi32(minimum, vertical_step);
+ maximum = _mm256_sub_epi32(maximum, vertical_step);
}
free(workp);
Something weird here. The right border issue is not reproducible now after sekrit-twc's AVX2 optimizations being merged, no matter your patch is applied or not.
The meaning of the opt
parameter changed. There are three possible values now:
False evaluates to 0. True evaluates to 1.
Okay, I found the culprit. Now the opt
parameter is intended to be used as integer rather than boolean, but all the initialization of d.opt still use !! which perfectly turn all integers into 0 and 1. After fixing that, your patch does prove to work.
Thanks for testing. I'll fix that too, soon.
Could we have a new release with the AVX2 fixes, please? I think many of us MVTools users would appreciate the increase in performance :)
It might happen faster if you keep asking about it every few days. It's very easy to forget about it.
I'm lousy at harassing people, especially if they don't get paid for the work. That's why I never made into a good project manager, I'm much more like a project mangler I think..
I'll try to remind you at some point now that you've given me permission to do so.
Fixed in v22, I guess.
opt=False
opt=True