ermig1979 / Simd

C++ image processing and machine learning library with using of SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
http://ermig1979.github.io/Simd
MIT License
2.04k stars 407 forks source link

GaussianBlur usage and memory issue #137

Closed s-trinh closed 3 years ago

s-trinh commented 3 years ago

The following code:

#include <iostream>

#include "Simd/SimdLib.h"

static void print(const unsigned char* img, int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            std::cout << static_cast<unsigned>(img[i*cols + j]) << " ";
        }
        std::cout << std::endl;
    }
}

int main(int , char * [])
{
    const int rows = 8, cols = 12;
    unsigned char img[rows*cols];
    unsigned char img_blur[rows*cols];
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            img[i*cols + j] = static_cast<unsigned char>(i*cols + j);
        }
    }

    const float radius = 5.0f;
    void * funcPtr = SimdGaussianBlurInit(cols, rows, 1, &radius);
    SimdGaussianBlurRun(funcPtr, img, cols, img_blur, cols);
    SimdRelease(funcPtr);

    std::cout << "Original image:" << std::endl;
    print(img, rows, cols);
    std::cout << "\nGaussian blur:" << std::endl;
    print(img_blur, rows, cols);

    return EXIT_SUCCESS;
}

produces on my computer the following error:

*** stack smashing detected ***: ./TestGaussianBlur terminated
Abandon (core dumped)

Output when running with Valgrind:

==22407== Memcheck, a memory error detector
==22407== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22407== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==22407== Command: ./TestGaussianBlur
==22407== 
==22407== Source and destination overlap in memcpy(0x8418b60, 0x8418b60, 48)
==22407==    at 0x730E674: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
==22407==    by 0x332B536: void Simd::Avx2::BlurImageAny<1>(Simd::BlurParam const&, Simd::Base::AlgDefault const&, unsigned char const*, unsigned long, unsigned char*, float*, unsigned char*, unsigned long) (SimdAvx2GaussianBlur.cpp:124)
==22407==    by 0x68AD52: Simd::Base::GaussianBlurDefault::Run(unsigned char const*, unsigned long, unsigned char*, unsigned long) (SimdBaseGaussianBlur.cpp:245)
==22407==    by 0x63D055: SimdGaussianBlurRun (SimdLib.cpp:2409)
==22407==    by 0x6386A9: main (TestGaussianBlur.cpp:78)
==22407== 
Original image:
0 1 2 3 4 5 6 7 8 9 10 11 
12 13 14 15 16 17 18 19 20 21 22 23 
24 25 26 27 28 29 30 31 32 33 34 35 
36 37 38 39 40 41 42 43 44 45 46 47 
48 49 50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 70 71 
72 73 74 75 76 77 78 79 80 81 82 83 
84 85 86 87 88 89 90 91 92 93 94 95 

Gaussian blur:
==22407== Use of uninitialised value of size 8
==22407==    at 0x7852BA3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==22407==    by 0x785309F: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==22407==    by 0x7860B65: std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==22407==    by 0x63856F: print(unsigned char const*, int, int) (TestGaussianBlur.cpp:59)
==22407==    by 0x63871F: main (TestGaussianBlur.cpp:84)
==22407== 
==22407== Conditional jump or move depends on uninitialised value(s)
==22407==    at 0x7852BB6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==22407==    by 0x785309F: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==22407==    by 0x7860B65: std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==22407==    by 0x63856F: print(unsigned char const*, int, int) (TestGaussianBlur.cpp:59)
==22407==    by 0x63871F: main (TestGaussianBlur.cpp:84)
==22407== 
16 17 18 18 19 20 21 22 23 23 24 24 
16 17 18 18 19 20 21 22 23 23 24 24 
21 22 23 23 24 25 26 27 27 28 29 29 
26 27 27 28 29 30 31 31 32 33 34 34 
31 31 32 32 33 34 35 36 36 37 38 38 
34 34 35 35 36 37 38 38 39 40 40 41 
35 35 36 36 37 38 38 39 40 40 41 41 
34 34 35 35 36 36 37 38 38 39 39 39 
==22407== Conditional jump or move depends on uninitialised value(s)
==22407==    at 0x638732: main (TestGaussianBlur.cpp:87)
==22407== 
*** stack smashing detected ***: ./TestGaussianBlur terminated
==22407== 
==22407== Process terminating with default action of signal 6 (SIGABRT)
==22407==    at 0x806D438: raise (raise.c:54)
==22407==    by 0x806F039: abort (abort.c:89)
==22407==    by 0x80AF7F9: __libc_message (libc_fatal.c:175)
==22407==    by 0x815121B: __fortify_fail (fortify_fail.c:37)
==22407==    by 0x81511BF: __stack_chk_fail (stack_chk_fail.c:28)
==22407==    by 0x638738: main (TestGaussianBlur.cpp:87)
==22407== 
==22407== HEAP SUMMARY:
==22407==     in use at exit: 8,320 bytes in 8 blocks
==22407==   total heap usage: 18 allocs, 10 frees, 93,336 bytes allocated
==22407== 
==22407== LEAK SUMMARY:
==22407==    definitely lost: 0 bytes in 0 blocks
==22407==    indirectly lost: 0 bytes in 0 blocks
==22407==      possibly lost: 0 bytes in 0 blocks
==22407==    still reachable: 8,320 bytes in 8 blocks
==22407==         suppressed: 0 bytes in 0 blocks
==22407== Rerun with --leak-check=full to see details of leaked memory
==22407== 
==22407== For counts of detected and suppressed errors, rerun with: -v
==22407== Use --track-origins=yes to see where uninitialised values come from
==22407== ERROR SUMMARY: 386 errors from 4 contexts (suppressed: 0 from 0)
Abandon (core dumped)

What could be the issue in my code?


In the SimdGaussianBlurInit() function, what is the relationship between radius and standard sigma value for Gaussian kernel?

For instance with scipy.ndimage.gaussian_filter, it gives:

img:
 [[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31 32 33 34 35]
 [36 37 38 39 40 41 42 43 44 45 46 47]
 [48 49 50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69 70 71]
 [72 73 74 75 76 77 78 79 80 81 82 83]
 [84 85 86 87 88 89 90 91 92 93 94 95]]
img_blur (sigma=5.0):
 [[39 39 39 40 40 41 41 42 42 43 43 43]
 [40 40 40 41 41 42 42 43 43 44 44 44]
 [41 41 41 42 42 43 43 44 44 45 45 45]
 [43 43 43 44 44 45 45 46 46 47 47 47]
 [46 46 46 47 47 48 48 49 49 50 50 50]
 [48 48 48 49 49 50 50 51 51 52 52 52]
 [49 49 49 50 50 51 51 52 52 53 53 53]
 [50 50 50 51 51 52 52 53 53 54 54 54]]
img_blur (sigma=5.0/2.0):
 [[19 19 20 21 22 23 23 24 25 26 27 27]
 [23 23 24 25 26 27 27 28 29 30 31 31]
 [29 29 30 31 32 33 33 34 35 36 37 37]
 [38 38 39 40 41 42 42 43 44 45 46 46]
 [47 47 48 49 50 51 51 52 53 54 55 55]
 [56 56 57 58 59 60 60 61 62 63 64 64]
 [62 62 63 64 65 66 66 67 68 69 70 70]
 [66 66 67 68 69 70 70 71 72 73 74 74]]

Finally, why the radius parameter in SimdGaussianBlurInit() is a const float pointer? Looks like there is no need to have a pointer for this, unless to support different radius values for X and Y axes?

ermig1979 commented 3 years ago

1) I found a bug (case of small image size). Thank you for bug report. 2) In Gaussian blur engine I use formula: exp(-x*x/(r*r)). Possible this is influence of previous bug for small images. 3) There is only one radius of Gaussian blur. Of course I can add two parameters (radiusX and radiusY) if it will be need. The using of const pointer to float point value instead of direct transferring of this parameter is connected with internal feature of Simd Library. There is a dispatcher whish check existence of SIMD extension in runtime in call the best avalable implementation for current algorithm. If it transfers parameter by value it can use vector SIMD register even if they are not available and it leads to crash.

s-trinh commented 3 years ago

I am still having some issue:

#include <iostream>

#include "Simd/SimdLib.h"
#include "Simd/SimdView.hpp"

static void print(const unsigned char* img, int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            std::cout << static_cast<unsigned>(img[i*cols + j]) << " ";
        }
        std::cout << std::endl;
    }
}

int main(int , char * [])
{
    const int rows = 8, cols = 12;
    unsigned char img[rows*cols];
    unsigned char img_blur[rows*cols];
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            img[i*cols + j] = static_cast<unsigned char>(i*cols + j);
        }
    }

    const float radius = 5.0f;
    void * funcPtr = SimdGaussianBlurInit(cols, rows, 1, &radius);
#if 1
    SimdGaussianBlurRun(funcPtr, img, cols, img_blur, cols);
#else
    typedef Simd::View<Simd::Allocator> View;
    View view_img(cols, rows, cols, View::Gray8, img);
    View view_img_blur(cols, rows, cols, View::Gray8, img_blur);

    SimdGaussianBlurRun(funcPtr, view_img.data, view_img.stride, view_img_blur.data, view_img_blur.stride);
#endif
    SimdRelease(funcPtr);

    std::cout << "Original image:" << std::endl;
    print(img, rows, cols);
    std::cout << "\nGaussian blur:" << std::endl;
    print(img_blur, rows, cols);

    return EXIT_SUCCESS;
}

This gives (using raw array or using View):

valgrind ./TestGaussianBlur --leak-check=full -v
==18975== Memcheck, a memory error detector
==18975== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==18975== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==18975== Command: ./TestGaussianBlur --leak-check=full -v
==18975== 
==18975== Warning: set address range perms: large range [0x400000, 0x10e04000) (defined)
Original image:
0 1 2 3 4 5 6 7 8 9 10 11 
12 13 14 15 16 17 18 19 20 21 22 23 
24 25 26 27 28 29 30 31 32 33 34 35 
36 37 38 39 40 41 42 43 44 45 46 47 
48 49 50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 70 71 
72 73 74 75 76 77 78 79 80 81 82 83 
84 85 86 87 88 89 90 91 92 93 94 95 

Gaussian blur:
18 18 19 20 21 22 22 23 24 25 25 26 
18 18 19 20 21 22 22 23 24 25 25 26 
24 25 25 26 27 28 29 30 30 31 32 32 
31 32 33 33 34 35 36 37 38 38 39 40 
39 40 41 41 42 43 44 45 46 46 47 48 
47 48 49 49 50 51 52 53 54 54 55 56 
55 56 57 57 58 59 60 61 62 62 63 64 
63 63 64 65 65 66 67 68 69 70 70 71 
*** stack smashing detected ***: ./TestGaussianBlur terminated
==18975== 
==18975== Process terminating with default action of signal 6 (SIGABRT)
==18975==    at 0x1299C438: raise (raise.c:54)
==18975==    by 0x1299E039: abort (abort.c:89)
==18975==    by 0x129DE7F9: __libc_message (libc_fatal.c:175)
==18975==    by 0x12A8021B: __fortify_fail (fortify_fail.c:37)
==18975==    by 0x12A801BF: __stack_chk_fail (stack_chk_fail.c:28)
==18975==    by 0x7B8FF8: main (TestGaussianBlur.cpp:46)
==18975== 
==18975== HEAP SUMMARY:
==18975==     in use at exit: 11,440 bytes in 11 blocks
==18975==   total heap usage: 21 allocs, 10 frees, 96,456 bytes allocated
==18975== 
==18975== LEAK SUMMARY:
==18975==    definitely lost: 0 bytes in 0 blocks
==18975==    indirectly lost: 0 bytes in 0 blocks
==18975==      possibly lost: 0 bytes in 0 blocks
==18975==    still reachable: 11,440 bytes in 11 blocks
==18975==         suppressed: 0 bytes in 0 blocks
==18975== Rerun with --leak-check=full to see details of leaked memory
==18975== 
==18975== For counts of detected and suppressed errors, rerun with: -v
==18975== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Abandon (core dumped)

I don't think I am doing something wrong?

ermig1979 commented 3 years ago

No. You did alright. I found another error. And fixed it.

s-trinh commented 3 years ago

Thanks for the fixes :+1:

For future references, the corresponding commits are 47828de and d44c976


I have managed to have similar results between Simd, Python and Matlab:

Original image:
0 1 2 3 4 5 6 7 8 9 10 11 
12 13 14 15 16 17 18 19 20 21 22 23 
24 25 26 27 28 29 30 31 32 33 34 35 
36 37 38 39 40 41 42 43 44 45 46 47 
48 49 50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 70 71 
72 73 74 75 76 77 78 79 80 81 82 83 
84 85 86 87 88 89 90 91 92 93 94 95 

Gaussian blur:
18 18 19 20 21 22 22 23 24 25 25 26 
24 25 25 26 27 28 29 30 30 31 32 32 
31 32 33 33 34 35 36 37 38 38 39 40 
39 40 41 41 42 43 44 45 46 46 47 48 
47 48 49 49 50 51 52 53 54 54 55 56 
55 56 57 57 58 59 60 61 62 62 63 64 
63 63 64 65 65 66 67 68 69 70 70 71 
69 70 70 71 72 73 73 74 75 76 77 77 
from scipy.ndimage import gaussian_filter
import numpy as np

rows = 8
cols = 12
img = np.arange(rows*cols).reshape((rows,cols))
print("img:\n", img)

sigma = np.sqrt(25.0/2)
img_blur = gaussian_filter(img, sigma=sigma, mode = 'nearest')
print("img_blur (sigma=", sigma, "):\n", img_blur)
img:
 [[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31 32 33 34 35]
 [36 37 38 39 40 41 42 43 44 45 46 47]
 [48 49 50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69 70 71]
 [72 73 74 75 76 77 78 79 80 81 82 83]
 [84 85 86 87 88 89 90 91 92 93 94 95]]
img_blur (sigma= 3.5355339059327378 ):
 [[17 17 18 19 20 21 21 22 23 24 25 25]
 [23 23 24 25 26 27 27 28 29 30 31 31]
 [31 31 32 33 34 35 35 36 37 38 39 39]
 [38 38 39 40 41 42 42 43 44 45 46 46]
 [47 47 48 49 50 51 51 52 53 54 55 55]
 [54 54 55 56 57 58 58 59 60 61 62 62]
 [62 62 63 64 65 66 66 67 68 69 70 70]
 [68 68 69 70 71 72 72 73 74 75 76 76]]
img = reshape(0:95, [12,8])'
int8(imgaussfilt(img, sqrt(25/2)))
img =

     0     1     2     3     4     5     6     7     8     9    10    11
    12    13    14    15    16    17    18    19    20    21    22    23
    24    25    26    27    28    29    30    31    32    33    34    35
    36    37    38    39    40    41    42    43    44    45    46    47
    48    49    50    51    52    53    54    55    56    57    58    59
    60    61    62    63    64    65    66    67    68    69    70    71
    72    73    74    75    76    77    78    79    80    81    82    83
    84    85    86    87    88    89    90    91    92    93    94    95

ans =

  8×12 int8 matrix

   17   18   19   19   20   21   22   23   24   24   25   26
   24   24   25   26   27   27   28   29   30   31   32   32
   31   32   32   33   34   35   36   37   37   38   39   39
   39   40   40   41   42   43   44   45   46   46   47   48
   47   48   49   49   50   51   52   53   54   55   55   56
   56   56   57   58   58   59   60   61   62   63   63   64
   63   63   64   65   66   67   68   68   69   70   71   71
   69   70   71   71   72   73   74   75   76   76   77   78

@ermig1979

It would be great if the SimdGaussianBlur documentation could be updated with the formula that is used (exp(-x*x/(r*r))) and how the image border is handled (e.g. constant mode, nearest mode, ...).

ermig1979 commented 3 years ago

I changed blur coefficients according to standard, added epsilon (relative error) parameter, added algorithm description to documentation.

s-trinh commented 3 years ago

Thanks for the update.


With sigma = 0.5f, Simd gives:

I_small:
  0   1   2   3   4   5   6   7   8   9  10  11
 12  13  14  15  16  17  18  19  20  21  22  23
 24  25  26  27  28  29  30  31  32  33  34  35
 36  37  38  39  40  41  42  43  44  45  46  47
 48  49  50  51  52  53  54  55  56  57  58  59
 60  61  62  63  64  65  66  67  68  69  70  71
 72  73  74  75  76  77  78  79  80  81  82  83
 84  85  86  87  88  89  90  91  92  93  94  95

I_small_blur:
  1   2   3   4   5   6   7   8   9  10  11  12
 12  13  14  15  16  17  18  19  20  21  22  23
 24  25  26  27  28  29  30  31  32  33  34  35
 36  37  38  39  40  41  42  43  44  45  46  47
 48  49  50  51  52  53  54  55  56  57  58  59
 60  61  62  63  64  65  66  67  68  69  70  71
 71  72  73  74  75  76  77  78  79  80  81  82
 72  73  74  75  76  77  78  79  80  81  82  83

With Matlab:

img = reshape(0:95, [12,8])'

img =

     0     1     2     3     4     5     6     7     8     9    10    11
    12    13    14    15    16    17    18    19    20    21    22    23
    24    25    26    27    28    29    30    31    32    33    34    35
    36    37    38    39    40    41    42    43    44    45    46    47
    48    49    50    51    52    53    54    55    56    57    58    59
    60    61    62    63    64    65    66    67    68    69    70    71
    72    73    74    75    76    77    78    79    80    81    82    83
    84    85    86    87    88    89    90    91    92    93    94    95

imgaussfilt(img, 0.5)

ans =

    1.3846    2.2781    3.2781    4.2781    5.2781    6.2781    7.2781    8.2781    9.2781   10.2781   11.2781   12.1716
   12.1065   13.0000   14.0000   15.0000   16.0000   17.0000   18.0000   19.0000   20.0000   21.0000   22.0000   22.8935
   24.1065   25.0000   26.0000   27.0000   28.0000   29.0000   30.0000   31.0000   32.0000   33.0000   34.0000   34.8935
   36.1065   37.0000   38.0000   39.0000   40.0000   41.0000   42.0000   43.0000   44.0000   45.0000   46.0000   46.8935
   48.1065   49.0000   50.0000   51.0000   52.0000   53.0000   54.0000   55.0000   56.0000   57.0000   58.0000   58.8935
   60.1065   61.0000   62.0000   63.0000   64.0000   65.0000   66.0000   67.0000   68.0000   69.0000   70.0000   70.8935
   72.1065   73.0000   74.0000   75.0000   76.0000   77.0000   78.0000   79.0000   80.0000   81.0000   82.0000   82.8935
   82.8284   83.7219   84.7219   85.7219   86.7219   87.7219   88.7219   89.7219   90.7219   91.7219   92.7219   93.6154

The last row with Simd looks strange. Maybe it is due to floating point accuracy?

Python gives the same result than Matlab:

from scipy.ndimage import gaussian_filter
import numpy as np

rows = 8
cols = 12
img = np.arange(rows*cols).reshape((rows,cols))
print("img:\n", img)

sigma = 0.5
img_blur = gaussian_filter(img, sigma=sigma, mode = 'nearest')
print("img_blur (nearest sigma=", sigma, "):\n", img_blur)
img:
 [[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31 32 33 34 35]
 [36 37 38 39 40 41 42 43 44 45 46 47]
 [48 49 50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69 70 71]
 [72 73 74 75 76 77 78 79 80 81 82 83]
 [84 85 86 87 88 89 90 91 92 93 94 95]]
img_blur (nearest sigma= 0.5 ):
 [[ 1  2  2  4  5  5  6  8  9 10 10 11]
 [12 13 13 15 16 17 18 19 20 21 21 22]
 [23 24 25 26 27 28 29 31 32 33 33 34]
 [36 37 38 38 39 40 41 42 43 44 45 46]
 [47 48 49 50 51 52 53 54 55 56 57 58]
 [60 61 62 63 64 65 66 67 68 69 69 70]
 [71 72 72 74 75 76 77 78 79 80 80 81]
 [82 83 84 85 86 87 88 88 89 90 91 92]]
ermig1979 commented 3 years ago

It seems to be an error: last row uses incorrect data.

ermig1979 commented 3 years ago

The error was found and fixed. Thank you for bug report!

s-trinh commented 3 years ago

Thanks again for the fix.

Just wondering if it is possible with this GaussianBlur implementation to parallelize the call over the image rows?

ermig1979 commented 3 years ago

Yes, I this algorithm can be parallelized after some modification (you have to separate source image into independent parts by rows). But I think that doesn't get significant gain because the algorithm is bound by memory throughput.