Ldpe2G / ArmNeonOptimization

Arm neon optimization practice
MIT License
387 stars, 104 forks

Constant Median Filter Run time greater than 200ms #2

Open DLFCW opened 5 years ago

DLFCW commented 5 years ago

With radius 5 on a 3.4 GHz CPU, I used AVX2 to speed up the add-histogram and sub-histogram steps, but the run time is still greater than 200 ms for an image size of 1280x1024.
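For readers unfamiliar with the two steps mentioned above, here is a minimal scalar sketch of adding and subtracting a column histogram to and from the sliding kernel histogram. The names `Histogram`, `add_histogram`, `sub_histogram`, and `kBins` are hypothetical and not taken from the repository; the 256-bin size and 16-bit counters are assumptions. It is not the AVX2 code from this report, only an illustration of the operation that such code would vectorize.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: 256 bins with 16-bit counters (illustrative, not the repo's types).
constexpr std::size_t kBins = 256;
using Histogram = std::array<std::uint16_t, kBins>;

// kernel += column: a column histogram enters the sliding window.
inline void add_histogram(Histogram& kernel, const Histogram& column) {
    for (std::size_t i = 0; i < kBins; ++i) {
        kernel[i] = static_cast<std::uint16_t>(kernel[i] + column[i]);
    }
}

// kernel -= column: a column histogram leaves the sliding window.
inline void sub_histogram(Histogram& kernel, const Histogram& column) {
    for (std::size_t i = 0; i < kBins; ++i) {
        kernel[i] = static_cast<std::uint16_t>(kernel[i] - column[i]);
    }
}
```

Both loops touch every bin once, so their cost grows linearly with the number of bins.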

Ldpe2G commented 5 years ago

@DLFCW My CPU is 3.9 GHz, and the run time of the constant-time median filter with radius 5 and image size 1280x1024 is about 140 ms. Also, HISTOGRAM_LEN should be 256, not 512: https://github.com/Ldpe2G/ArmNeonOptimization/blob/master/ConstantTimeMedianFilter/src/constant_time_median_filter_uint16.h#L8 After you change it to 256, you should see a small speedup. By the way, if your image is large and the filter radius is small, this algorithm is not recommended, because you need to allocate a large chunk of memory to store the column histograms. You can simply try the parallel strategy that the normal median filter uses: https://github.com/Ldpe2G/ArmNeonOptimization/blob/master/ConstantTimeMedianFilter/src/normal_median_filter_uint16.cpp#L27
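To make the memory concern concrete, here is a rough estimate of the column-histogram footprint. It assumes one histogram per image column and 16-bit counters; the function name `column_histogram_bytes` and its parameters are hypothetical and are not taken from the repository.

```cpp
#include <cstddef>
#include <cstdint>

// Rough footprint of the column histograms, assuming one histogram per
// image column and 16-bit bin counters (both are assumptions).
constexpr std::size_t column_histogram_bytes(
    std::size_t image_width,
    std::size_t histogram_len,
    std::size_t bytes_per_bin = sizeof(std::uint16_t)) {
    return image_width * histogram_len * bytes_per_bin;
}
```

Under these assumptions, a 1280-pixel-wide image needs 655,360 bytes with 256 bins and 1,310,720 bytes with 512 bins, so halving HISTOGRAM_LEN halves the buffer and should also halve the work of each histogram add or subtract.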

DLFCW commented 5 years ago

Do you know HALCON, a machine vision library? It runs the constant-time median filter in only 0.9 ms under the same conditions.

Ldpe2G commented 5 years ago

No, I have not heard of it before. The library must have done a lot of optimization. I have only implemented the basic algorithm described in the paper, and the paper describes some optimization tips that I did not try.