HorizontalWiener_sse2() contains this snippet:
for (int x = 2; x < nWidth - 4; x += 8) { ... _mm_storel_epi64((__m128i *)&pDst[x], m0); }
Since _mm_storel_epi64() writes 8 bytes, the last iteration can write past the end of the row, into pDst[nWidth], pDst[nWidth + 1], and pDst[nWidth + 2]. This may overwrite VS_FRAME_GUARD_PATTERN, which is stored before the beginning and after the end of the pixel buffer. The write is caught by VSFrame::verifyGuardPattern() in src/core/vscore.cpp in the VapourSynth R44 sources and makes VSNode::getFrameInternal() in the same file abort with SIGABRT.
The following is the gdb backtrace from a watchpoint (“watch”) on the protected memory, taken at the moment something writes into it:
0 HorizontalWiener_sse2 (pDst=..., pSrc=..., nPitch=736, nWidth=736, nHeight=592, bitsPerSample=...)
1 0x00001554c1291365 in mvpRefine (mvp=..., sharp=...)
2 0x00001554c129297d in mvpRefine (sharp=..., mvp=...)
3 mvfRefine (mvf=..., nMode=..., sharp=...)
4 0x00001554c1292e71 in mvgofRefine (mvgof=..., nMode=..., sharp=...)
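5 0x00001554c1296f14 in mvsuperGetFrame (n=..., activationReason=..., instanceData=..., frameData=..., frameCtx=..., core=..., vsapi=...)
at /var/tmp/portage/media-plugins/vapoursynth-mvtools-20/work/vapoursynth-mvtools-20/src/MVSuper.c:110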
6 0x0000155551f22a3e in VSNode::getFrameInternal (this=..., n=..., activationReason=..., frameCtx=...) at src/core/vscore.cpp:849
7 0x0000155551f3b7b7 in VSThreadPool::runTasks (owner=..., stop=...) at src/core/vsthreadpool.cpp:186
8 0x0000155554e0026f in std::execute_native_thread_routine (__p=...)
at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/libstdc++-v3/src/c++11/thread.cc:83
9 0x0000155553d01a15 in start_thread (arg=...) at pthread_create.c:465
10 0x000015555484568f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The above is from a commercial DVD, so I cannot easily post a video source. I’m in fact surprised that I am only encountering this bug now, since it seems to have been present since at least Oct 2016.
I’m not suggesting a patch because the SSE2 trickery is beyond me (like, six loads are more efficient than one load and five shifts and masks?).
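To make the overrun concrete without touching the SSE2 code, here is a minimal sketch that replays only the loop bounds from the snippet. It is not a patch and not taken from the mvtools sources: the helper names, the 16-byte guard, and the 0xAB fill value are hypothetical, and a plain 8-byte memcpy() stands in for _mm_storel_epi64().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { STORE_BYTES = 8 };  /* _mm_storel_epi64() stores the low 64 bits */

/* Index of the last byte written by the loop
 *   for (int x = 2; x < nWidth - 4; x += 8) store 8 bytes at &pDst[x];
 * or -1 if the loop body never runs. (Hypothetical helper.) */
static int last_index_written(int nWidth)
{
    int last_x = -1;
    for (int x = 2; x < nWidth - 4; x += 8)
        last_x = x;
    return last_x < 0 ? -1 : last_x + STORE_BYTES - 1;
}

/* Number of bytes written at or beyond pDst[nWidth] in one row. */
static int overrun_bytes(int nWidth)
{
    int last = last_index_written(nWidth);
    return last >= nWidth ? last - nWidth + 1 : 0;
}

/* Replays the stores on a row followed by a guard area.
 * Returns 1 if the guard was touched, 0 if not, -1 on a bad width.
 * The guard size and the 0xAB pattern are assumptions for illustration,
 * not the real VS_FRAME_GUARD_PATTERN. */
static int row_clobbers_guard(int nWidth)
{
    enum { GUARD = 16, MAX_WIDTH = 4096 };
    static uint8_t buf[MAX_WIDTH + GUARD];
    const uint8_t zeros[STORE_BYTES] = {0};

    if (nWidth < 0 || nWidth > MAX_WIDTH)
        return -1;

    memset(buf, 0, (size_t)nWidth);
    memset(buf + nWidth, 0xAB, GUARD);

    /* Same bounds as the snippet; memcpy stands in for the SSE2 store. */
    for (int x = 2; x < nWidth - 4; x += 8)
        memcpy(&buf[x], zeros, STORE_BYTES);

    for (int i = 0; i < GUARD; i++)
        if (buf[nWidth + i] != 0xAB)
            return 1;
    return 0;
}
```

With nWidth = 736 from the backtrace, the last iteration runs at x = 730 and the store covers pDst[730] through pDst[737], so overrun_bytes(736) is 2. The worst case of 3 bytes (up to pDst[nWidth + 2]) occurs when nWidth is 7 modulo 8, for example overrun_bytes(735). A width such as 732 does not overrun at all, which may be why the bug does not show up with every source.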