ermig1979 / Simd

C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON for ARM.
http://ermig1979.github.io/Simd
MIT License

Issue in SynetConvolution32fNhwcDirect::OldReorderWeight #145

teor292 closed 3 years ago

teor292 commented 3 years ago

Hi. I use Simd with Synet to test the performance of some neural networks. It works fine on Windows (x64) and Linux (x64), but on ARMv7, compiled with gcc 6.3.0, it freezes.

I don't know the reason, but here is what I found.

Here is the beginning of SynetConvolution32fNhwcDirect::OldReorderWeight:

const ConvParam32f& p = _param; 
const AlgParam& a = _old.alg;
for (size_t da = 0; da < p.dstC; da += a.macroD)

Parameters of AlgParam on x64 are as follows:

F  8
microD  16
macroH  10
macroC  3
macroD  16

The other fields seem to contain garbage, I think.

But on ARMv7 these values are as follows:

F 4
microD  8
macroH  10
macroC  3
macroD  0 (!)

So for (size_t da = 0; da < p.dstC; da += a.macroD) becomes an infinite loop (p.dstC == 10), because da never advances when macroD is 0. I hope you can help.
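To illustrate the failure mode described above, here is a minimal sketch of a blocked loop with a guard against a zero step. SafeStep and CountBlocks are hypothetical helpers written for this example; they are not part of Simd and are not necessarily how the library was fixed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Simd): returns a loop step that is never zero.
// If the computed step is 0, it uses the fallback, and at least 1 in any case.
inline std::size_t SafeStep(std::size_t step, std::size_t fallback)
{
    return step != 0 ? step : std::max<std::size_t>(fallback, 1);
}

// Counts iterations of a loop with the same shape as
//   for (size_t da = 0; da < p.dstC; da += a.macroD)
// Without the guard, step == 0 would leave da at 0 forever.
inline std::size_t CountBlocks(std::size_t total, std::size_t step, std::size_t fallback)
{
    const std::size_t s = SafeStep(step, fallback);
    std::size_t blocks = 0;
    for (std::size_t da = 0; da < total; da += s)
        ++blocks;
    return blocks;
}
```

With the ARMv7 values reported here (dstC == 10, macroD == 0, microD == 8), CountBlocks(10, 0, 8) terminates instead of spinning. A guard like this only hides the symptom, though; the zero still has to be traced back to wherever macroD is computed.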

ermig1979 commented 3 years ago

This indicates that the algorithm for detecting the L3 cache size is wrong on ARMv7. Unfortunately, I don't have access to this platform right now to reproduce the bug. I can try to fix the issue, but I can't verify the fix myself.
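As a rough illustration of how a detected cache size of 0 can turn into a zero block size, and how a fallback plus a clamp avoids it, consider the sketch below. CacheSizeOrDefault, MacroBlock, and the block-size rule itself are assumptions made up for this example; Simd's actual computation may differ.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical: use a default size when detection reports 0 bytes
// (for example, on a platform where no L3 cache is found).
inline std::size_t CacheSizeOrDefault(std::size_t detected, std::size_t fallback)
{
    return detected != 0 ? detected : fallback;
}

// Hypothetical block-size rule: the number of rows of `rowSize` floats that fit
// into `cacheBytes`, rounded down to a multiple of microD and clamped to at
// least microD. Precondition: microD != 0.
inline std::size_t MacroBlock(std::size_t cacheBytes, std::size_t rowSize, std::size_t microD)
{
    const std::size_t rowBytes = std::max<std::size_t>(rowSize, 1) * sizeof(float);
    const std::size_t fit = cacheBytes / rowBytes;      // 0 when cacheBytes == 0
    const std::size_t aligned = fit / microD * microD;  // round down to a multiple of microD
    return std::max(aligned, microD);                   // never return a zero step
}
```

Without the final clamp, cacheBytes == 0 would give a block size of 0, which matches the macroD value seen on ARMv7. With the clamp, the result is at least microD even if cache detection fails.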

teor292 commented 3 years ago

Ok, but I can check it :) If you need some kind of debugging information, I can try to provide it.

ermig1979 commented 3 years ago

Could you check this bug fix? SimdBaseSynetConvolution32f.zip

teor292 commented 3 years ago

Well, now the neural network loads, but it freezes during calculation. I'll try to find the place where it hangs.

teor292 commented 3 years ago

Ok, I found it.

In file SimdGemm.h:

at line 371:

        void Run(size_t M, const T * A, size_t lda, const T * pB, T * C, size_t ldc)
        {
            assert(M <= _M);
            for (size_t j = 0; j < _N; j += _macroN)

_macroN == 0

Here is the call stack:

1  Simd::GemmNNcb<float, 4u, unsigned int>::Run            SimdGemm.h                      376  0x421db0 
2  Simd::GemmNNcb<float, 4u, unsigned int>::Run            SimdGemm.h                      368  0x3bde88 
3  Simd::Neon::Gemm32fNNcbRun                              SimdNeonGemm32f.cpp             2433 0x3bde88 
4  Simd::GemmCbFunc::Run                                   SimdRuntime.h                   280  0x2c7b38 
5  Simd::Runtime<Simd::GemmCbFunc, Simd::GemmCbArgs>::Test SimdRuntime.h                   156  0x2c7b38 
6  Simd::Runtime<Simd::GemmCbFunc, Simd::GemmCbArgs>::Run  SimdRuntime.h                   90   0x2c7b38 
7  Simd::Base::SynetConvolution32fGemmNN::Forward          SimdBaseSynetConvolution32f.cpp 367  0x2c7b38 
8  SimdSynetConvolution32fForward                          SimdLib.cpp                     5446 0x23a1f0 
9  Synet::Convolution32f::Forward                          Convolution.h                   109  0x16cb94 
10 Synet::Convolution32fLayer<float>::ForwardCpu           Convolution32fLayer.h           85   0x16cb94 
11 Synet::Convolution32fLayer<float>::ForwardCpu           Convolution32fLayer.h           79   0x1178c4 
12 Synet::Layer<float>::Forward                            Layer.h                         146  0x955f0  
13 Synet::Network<float>::Forward                          Network.h                       356  0x8c5f4  
14 SynetTester::run_once_                                  SynetTester.cpp                 43   0x3db80  
15 BaseTester::Run                                         BaseTester.cpp                  27   0x1bf58  
16 main                                                    main.cpp                        14   0x1c3794 

ermig1979 commented 3 years ago

The second iteration: SimdGemm.zip

teor292 commented 3 years ago

Everything works, thank you!

ermig1979 commented 3 years ago

That's good news! I have committed the changes.