ermig1979 / Simd

C++ image processing and machine learning library with using of SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
http://ermig1979.github.io/Simd
MIT License

Crash in SimdStore.h #141

edward9112 closed this issue 3 years ago

edward9112 commented 3 years ago

Hey there!

I have been testing the library on 32-bit Windows 7 with an Intel i3-4170 and found that it crashes in SimdStore.h. I could prevent the crash by commenting out the following lines:

template <> SIMD_INLINE void Store<false>(__m256i * p, __m256i a)
{
    //_mm256_storeu_si256(p, a);
}

template <> SIMD_INLINE void Store<true>(__m256i * p, __m256i a)
{
    //_mm256_store_si256(p, a);
}

The problem disappears only when I drop the SSE and AVX branches and use only the non-optimized Base branch.

ermig1979 commented 3 years ago

Hello! The Store function is used very widely (in almost every algorithm), so I need more information (a call stack) to locate the bug.

edward9112 commented 3 years ago

SimdAvx2ReduceGray3x3.cpp > ReduceGray3x3() Line 96

ermig1979 commented 3 years ago

1) I can't reproduce the crash using the Test application. Could you tell me the sizes of the input and output images? 2) The current version of SimdAvx2ReduceGray3x3.cpp doesn't contain anything at line 96 that could cause a crash.

edward9112 commented 3 years ago

I don't have direct access to debug the app on that machine. I can only add logging and then check what's going on there. The logging says that it fails to enter that loop (on line 96), that's it.

The input size does not matter because it crashes on any input provided.

ermig1979 commented 3 years ago

I ran Simd's tests for ReduceGray3x3. They were successful. Your processor (i3-4170) supports AVX2. So I think the error is in the wrong size of the input and output images. Note that:

    int dstWidth = (srcWidth + 1) / 2;
    int dstHeight = (srcHeight + 1) / 2;

The ReduceGray3x3 function reduces the image by a factor of two in each dimension. For other sizes you can use the Resize function.
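The size relation quoted above can be checked before calling the reduce function. This is only a minimal sketch: `ImageSize`, `ReducedSize`, `IsValidReduceDst` and `RequireValidReduceDst` are hypothetical helper names made up for illustration and are not part of Simd.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type, not part of the Simd API.
struct ImageSize
{
    std::size_t width;
    std::size_t height;
};

// Destination size for a 2x reduction, using the formula quoted above:
// dst = (src + 1) / 2, so odd source sizes round up.
inline ImageSize ReducedSize(ImageSize src)
{
    return ImageSize{ (src.width + 1) / 2, (src.height + 1) / 2 };
}

// Returns true if dst has exactly the size expected for src.
inline bool IsValidReduceDst(ImageSize src, ImageSize dst)
{
    const ImageSize expected = ReducedSize(src);
    return dst.width == expected.width && dst.height == expected.height;
}

// Debug-build guard that could be placed in front of the reduce call.
inline void RequireValidReduceDst(ImageSize src, ImageSize dst)
{
    assert(IsValidReduceDst(src, dst));
}
```

For example, a 641x480 source would need a 321x240 destination; passing a 320x240 destination would fail the check.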

edward9112 commented 3 years ago

But it works perfectly on another machine with a different version of Windows.

What could be wrong with the code when it fails to enter this loop: for (size_t row = 0; row < srcHeight; row += 2, dst += dstStride, src += 2 * srcStride)

Could it be because of a faulty Windows installation?

ermig1979 commented 3 years ago

There is nothing at this line that could cause a crash. If this is a 'Release' build configuration, the debugger can show wrong line information. The main causes of a crash are: 1) Out-of-bounds memory access. 2) Use of an illegal instruction. 3) Aligned reading/writing from/to an unaligned memory address.
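Causes 1 and 3 can be narrowed down with logging, which fits a machine where no debugger is available. The sketch below is one possible way to do that, not code from Simd: `IsAligned`, `IsAlignedImage` and `FitsInRow` are hypothetical helper names. It assumes 32-byte alignment, which is what aligned 256-bit stores such as `_mm256_store_si256` require. Cause 2 (an illegal instruction) has to be checked with compiler-specific CPU and OS feature queries and is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the address p is a multiple of `alignment`.
// `alignment` must be a power of two (32 for 256-bit AVX loads/stores).
inline bool IsAligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: every row start is aligned only if both the base
// pointer and the row stride are multiples of `alignment`.
inline bool IsAlignedImage(const void* data, std::size_t stride, std::size_t alignment)
{
    return IsAligned(data, alignment) && stride % alignment == 0;
}

// Hypothetical helper: true if accessing `bytes` bytes starting at byte
// offset `col` stays inside a row that is `rowBytes` bytes long.
inline bool FitsInRow(std::size_t col, std::size_t bytes, std::size_t rowBytes)
{
    return col <= rowBytes && bytes <= rowBytes - col;
}
```

Logging the results of these checks for the source and destination buffers just before the failing call would show whether an aligned code path is receiving unaligned data or a vector access runs past the end of a row.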