JuliaImages / ImageFiltering.jl

Julia implementations of multidimensional array convolution and nonlinear stencil operations

imfilter! is slower with an Array source than an OffsetArray #95

Open mbauman opened 5 years ago

mbauman commented 5 years ago

Could imfilter use the secret sauce from the specialization for OffsetArray in more cases? I've not wrapped my head fully around all the indexing permutations, but here's a simple benchmark test case:

This hits the fast OffsetArray implementation:

using ImageFiltering, OffsetArrays, BenchmarkTools
img = rand(0:1, 1024, 1024)
const kern = ImageFiltering.factorkernel(centered([10 2 10; 2 1 2; 10 2 10]))
src = ImageFiltering.padarray(Int, img, Pad(:reflect, 1,1))
dest = similar(img)
@benchmark imfilter!($dest, $src, kern, ImageFiltering.NoPad())
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     11.882 ms (0.00% GC)
  median time:      11.904 ms (0.00% GC)
  mean time:        11.910 ms (0.00% GC)
  maximum time:     13.976 ms (0.00% GC)
  --------------
  samples:          420
  evals/sample:     1

Theoretically, I think the following could be faster, since it computes fewer outputs than the one above (it skips the outside edge, yes?), but it falls back to the generic implementation:

src = img
dest = OffsetArray(similar(img, size(img).-2), (1, 1))
@benchmark imfilter!($dest, $src, kern, ImageFiltering.NoPad())
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     19.408 ms (0.00% GC)
  median time:      19.470 ms (0.00% GC)
  mean time:        19.501 ms (0.00% GC)
  maximum time:     21.914 ms (0.00% GC)
  --------------
  samples:          257
  evals/sample:     1
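
As a sanity check on the "skips the outside edge" reasoning, here is a minimal sketch that counts the output elements in the two cases using plain ranges only. The helper `interior` is hypothetical (it is not part of ImageFiltering); it assumes that with NoPad an output index is valid only when every kernel offset stays inside the source axes.

```julia
# Axes written as plain ranges; no packages needed.
kernel_axes = (-1:1, -1:1)        # centered 3×3 kernel
padded_src  = (0:1025, 0:1025)    # 1024×1024 image reflect-padded by 1 on each side
plain_src   = (1:1024, 1:1024)    # the unpadded image

# Hypothetical helper: output indices i for which i + k stays inside
# the source axis `src` for every kernel offset k.
interior(src, k) = (first(src) - first(k)):(last(src) - last(k))

dest_padded = map(interior, padded_src, kernel_axes)  # axes of dest in the first benchmark
dest_plain  = map(interior, plain_src, kernel_axes)   # axes of dest in the second benchmark

n_padded = prod(length, dest_padded)  # number of outputs computed from the padded source
n_plain  = prod(length, dest_plain)   # number of outputs computed from the plain Array
println((dest_padded, dest_plain, n_padded - n_plain))
```

So the plain-Array case computes 4092 fewer outputs (well under 1% of the total), which makes its being about 60% slower all the more surprising.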

Wrapping the source in an OffsetArray with zero offsets restores some of the performance and hits the specialized method, but it's still slower than the first case (about 14 ms versus 11.9 ms):

src = OffsetArray(img, (0,0))
dest = OffsetArray(similar(img, size(img).-2), (1, 1))
@benchmark imfilter!($dest, $src, kern, ImageFiltering.NoPad())
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     13.947 ms (0.00% GC)
  median time:      13.965 ms (0.00% GC)
  mean time:        13.985 ms (0.00% GC)
  maximum time:     18.823 ms (0.00% GC)
  --------------
  samples:          358
  evals/sample:     1