ermig1979 / Simd

C++ image processing and machine learning library with using of SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
http://ermig1979.github.io/Simd
MIT License

Test suite failed on legacy x86 system #274

Open theoractice opened 1 month ago

theoractice commented 1 month ago

I am testing a Win32 static build. According to AIDA64, the CPU supports SSE4.1 (screenshot attached).

Sobel tests passed:

D:\Simd\bin\v143>Test.exe -m=a -tt=1 -fi=Sobel -ot=log.txt

Test progress: 100.0%

[000] Info: ALL TESTS ARE FINISHED SUCCESSFULLY!

[000] Info:

Simd Library Performance Report:

Execution time: 2024.08.03 19:25:25. Test threads: 1. Simd version: 6.1.139. CPU: Intel(R) Atom(TM) CPU  Z3735F @ 1.33GHz.
Sockets: 1, Cores: 4, Threads: 4; Cache L1D: 24 KB, L2: 1024 KB, L3:  MB, RAM: 1.9 GB; SIMD: SSE4.1 SSSE3 SSE3 SSE2 SSE.

------------------------------------------------------
| Function      |   API   Base Sse41 | Bs/S4 | Bs/S4 |
------------------------------------------------------
| Common, ms    | 3.316 20.750 3.418 |  6.07 |  6.07 |
------------------------------------------------------
| SobelDx       | 3.264 22.562 3.090 |  7.30 |  7.30 |
| SobelDxAbs    | 2.978 23.387 3.227 |  7.25 |  7.25 |
| SobelDxAbsSum | 4.181 18.109 3.044 |  5.95 |  5.95 |
| SobelDy       | 3.170 15.663 5.166 |  3.03 |  3.03 |
| SobelDyAbs    | 3.456 21.488 3.370 |  6.38 |  6.38 |
| SobelDyAbsSum | 2.988 24.823 3.016 |  8.23 |  8.23 |
------------------------------------------------------
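The `Bs/S4` column appears to be the ratio of the `Base` time to the `Sse41` time, rounded to two decimals; every row in the table is consistent with that reading. A minimal sketch of the arithmetic follows (the function name `Speedup` is hypothetical and is not part of the Simd test framework):

```cpp
#include <cmath>

// Hypothetical helper: ratio of the scalar (Base) time to the SIMD (Sse41) time,
// rounded to two decimal places as in the report table.
inline double Speedup(double baseMs, double simdMs)
{
    return std::round(baseMs / simdMs * 100.0) / 100.0;
}
```

For the `SobelDx` row, `Speedup(22.562, 3.090)` gives 7.30, and for `SobelDy`, `Speedup(15.663, 5.166)` gives 3.03, matching the table.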

However, the LBP tests failed:

D:\Simd\bin\v143>Test.exe -m=a -tt=1 -fi=Lbp -ot=log.txt

Test progress: 0.0%[001] Info: Test Simd::Base::DetectionLbpDetect32fp & SimdDetectionLbpDetect32fp for size [1920,1080].
[001] Error: There is unhandled exception: Illegal instruction !
[001] Error: DetectionLbpDetect32fpAutoTest has errors. TEST EXECUTION IS TERMINATED!

theoractice commented 1 month ago

Here is my binary, in case it is needed (it is statically linked): Simd.zip

ermig1979 commented 3 weeks ago

Hi! Sorry for the late reply; I was on vacation. I tried to reproduce this bug, but the binary works fine on my machine. Could you provide a call stack to help localize the bug?

theoractice commented 3 weeks ago

Thank you for your kind reply. This binary also works fine on my Win10 x64 system with a modern x64 CPU. It fails only on an x86 system. I will try to capture a call stack, but that will take some time.

theoractice commented 3 weeks ago

Tested in Release/Win32 mode. After updating the code to the latest commit, the problem still exists. The call stack:

>   Test.exe!Simd::Base::Detect<float,unsigned int>(const Simd::Detection::HidLbpCascade<float,unsigned int> & hid, unsigned int offset, int startStage) line 335   C++
    Test.exe!Simd::Sse41::Detect(const Simd::Detection::HidLbpCascade<float,unsigned int> & hid, unsigned int offset, __m128i & result) line 441    C++
    Test.exe!Simd::Sse41::DetectionLbpDetect32fp(const Simd::Detection::HidLbpCascade<float,unsigned int> & hid, const Simd::View<Simd::Allocator> & mask, const Simd::Rectangle<int> & rect, Simd::View<Simd::Allocator> & dst) line 471   C++
    Test.exe!Simd::Sse41::DetectionLbpDetect32fp(const void * _hid, const unsigned char * mask, unsigned int maskStride, int left, int top, int right, int bottom, unsigned char * dst, unsigned int dstStride) line 498    C++
    Test.exe!SimdDetectionLbpDetect32fp(const void * hid, const unsigned char * mask, unsigned int maskStride, int left, int top, int right, int bottom, unsigned char * dst, unsigned int dstStride) line 1897 C++
    [Inline Frame] Test.exe!Test::`anonymous-namespace'::FuncD::Call(const void *) line 141 C++
    Test.exe!Test::DetectionDetectAutoTest(const void * data, int width, int height, int throughColumn, int int16, const Test::`anonymous-namespace'::FuncD & f1, const Test::`anonymous-namespace'::FuncD & f2) line 200   C++
    Test.exe!Test::DetectionDetectAutoTest(const std::string & path, int throughColumn, int int16, const Test::`anonymous-namespace'::FuncD & f1, const Test::`anonymous-namespace'::FuncD & f2) line 232   C++
    [Inline Frame] Test.exe!Test::DetectionDetectAutoTest(int) line 245 C++
    Test.exe!Test::DetectionLbpDetect32fpAutoTest() line 320    C++
    [Inline Frame] Test.exe!Test::Task::RunGroup(const Test::Group &) line 574  C++
    Test.exe!Test::Task::Run() line 544 C++
    [External Code] 
    Test.exe!thread_start<unsigned int (__stdcall*)(void *),1>(void * const parameter) line 97  C++
    [External Code] 
    [The following frames may be incorrect and/or missing, no symbols loaded for kernel32.dll]  

The exception:

0x01528295: Unhandled exception at 0x01528295 in Test.exe: 0xC000001D: Illegal Instruction.


theoractice commented 3 weeks ago

To my surprise, Test.exe throws the same Illegal Instruction error in Debug/Win32 mode as well (screenshot attached). That seems nearly impossible, so the bug may be specific to this Atom CPU. If you agree, feel free to close this issue now. I still need to test on more machines. Thank you very much for creating this library.
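When comparing machines, it may help to print the instruction-set bits that each CPU advertises through CPUID, independently of AIDA64 or of the library's own detection. The following is a minimal sketch, not code from Simd: the names `CpuidRegs`, `DescribeFeatures`, and `QueryCpuid` are hypothetical, and only the CPUID bit positions and the compiler intrinsics (`__cpuid`/`__cpuidex` on MSVC, `__get_cpuid`/`__get_cpuid_count` on GCC/Clang) are standard. It reports advertised bits only; for AVX and later, operating-system support must also be checked, which this sketch does not do. It does not identify the cause of the exception in this thread.

```cpp
#include <cstdint>
#include <string>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define SKETCH_HAS_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAS_CPUID 1
#endif

// Raw CPUID register values for the leaves of interest (hypothetical helper type).
struct CpuidRegs
{
    std::uint32_t leaf1Ecx = 0; // CPUID.01H:ECX
    std::uint32_t leaf1Edx = 0; // CPUID.01H:EDX
    std::uint32_t leaf7Ebx = 0; // CPUID.(EAX=07H, ECX=0):EBX
};

inline bool Bit(std::uint32_t value, unsigned index)
{
    return ((value >> index) & 1u) != 0;
}

// Returns a space-separated list of the features advertised by the register values.
inline std::string DescribeFeatures(const CpuidRegs& r)
{
    std::string s;
    auto add = [&s](bool present, const char* name)
    {
        if (!present) return;
        if (!s.empty()) s += ' ';
        s += name;
    };
    add(Bit(r.leaf1Edx, 25), "SSE");
    add(Bit(r.leaf1Edx, 26), "SSE2");
    add(Bit(r.leaf1Ecx, 0), "SSE3");
    add(Bit(r.leaf1Ecx, 9), "SSSE3");
    add(Bit(r.leaf1Ecx, 19), "SSE4.1");
    add(Bit(r.leaf1Ecx, 20), "SSE4.2");
    add(Bit(r.leaf1Ecx, 23), "POPCNT");
    add(Bit(r.leaf1Ecx, 28), "AVX");
    add(Bit(r.leaf1Ecx, 12), "FMA");
    add(Bit(r.leaf7Ebx, 5), "AVX2");
    add(Bit(r.leaf7Ebx, 3), "BMI1");
    add(Bit(r.leaf7Ebx, 8), "BMI2");
    return s;
}

#ifdef SKETCH_HAS_CPUID
// Reads the registers from the CPU that executes this code (x86/x64 only).
inline CpuidRegs QueryCpuid()
{
    CpuidRegs r;
#if defined(_MSC_VER)
    int info[4] = { 0, 0, 0, 0 };
    __cpuid(info, 0); // EAX of leaf 0 holds the highest supported basic leaf
    const int maxLeaf = info[0];
    if (maxLeaf >= 1)
    {
        __cpuid(info, 1);
        r.leaf1Ecx = static_cast<std::uint32_t>(info[2]);
        r.leaf1Edx = static_cast<std::uint32_t>(info[3]);
    }
    if (maxLeaf >= 7)
    {
        __cpuidex(info, 7, 0);
        r.leaf7Ebx = static_cast<std::uint32_t>(info[1]);
    }
#else
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    {
        r.leaf1Ecx = ecx;
        r.leaf1Edx = edx;
    }
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        r.leaf7Ebx = ebx;
#endif
    return r;
}
#endif
```

On an x86 machine, calling `DescribeFeatures(QueryCpuid())` from a small test program and printing the result gives a list that can be compared against the `SIMD:` line of the Simd performance report and against the instruction found at the faulting address.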