Open johnnychen94 opened 4 years ago
Yes, now that wrappers don't allocate we should revisit the strategies here.
For 2D windows at least, `mapwindow` can be an order of magnitude faster if, instead of taking a view into the main array, we move a block of `StaticArray`s along the rows and update it for each column using a `@generated` function that reads only one column at a time. The block can hold 2, 4, etc. `StaticArray` windows, so we read an even number each time. This way you can read less than one cache line per window and use one thread per block of rows.
I wrote this for DynamicGrids.jl, but it would be good to abstract this out into some kind of generalised stencil package some day so these algs can be more widely used: https://github.com/cesaraustralia/DynamicGrids.jl/blob/master/src/maprules.jl#L348-L370
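To make the idea concrete, here is a minimal sketch of the column-shifting window for a 3x3 sum. It is not the DynamicGrids.jl implementation: it uses plain tuples instead of `StaticArray`s, no `@generated` function, and handles one row per pass rather than a block of 2 or 4 windows. The names `readcol`, `window_sum_shift` and `window_sum_view` are made up for illustration.

```julia
# Hypothetical sketch: 3x3 window sum over the interior of a matrix.
# The window is held as three column tuples; each step to the right
# reads only one new column from the array.

# Read the 3-element column centred on row i at column j.
@inline readcol(A, i, j) = (A[i-1, j], A[i, j], A[i+1, j])

function window_sum_shift(A::AbstractMatrix{T}) where {T}
    m, n = size(A)
    out = zeros(T, m, n)               # border cells are left at zero
    (m < 3 || n < 3) && return out     # no interior cells to process
    for i in 2:m-1                     # rows are independent, so this loop could be threaded
        c1 = readcol(A, i, 1)
        c2 = readcol(A, i, 2)
        for j in 2:n-1
            c3 = readcol(A, i, j + 1)  # the only array read in this step
            out[i, j] = sum(c1) + sum(c2) + sum(c3)
            c1, c2 = c2, c3            # shift the window one column to the right
        end
    end
    return out
end

# Reference version that takes a view per window, for comparison.
function window_sum_view(A::AbstractMatrix{T}) where {T}
    m, n = size(A)
    out = zeros(T, m, n)
    for j in 2:n-1, i in 2:m-1
        out[i, j] = sum(view(A, i-1:i+1, j-1:j+1))
    end
    return out
end

A = reshape(collect(1:30), 5, 6)
println(window_sum_shift(A) == window_sum_view(A))
```

Handling a block of several rows at once, as described above, would let adjacent windows share most of each column read; the sketch leaves that out to keep the shifting step visible.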
I noticed that performance degraded a lot (it no longer ran in real time) as I increased the filter size to something like (13,13) on an image of around (500,500). I was then able to find a more suitable approach for my problem by first using `mapwindow` with a (3,3) filter, which worked. It's an excellent tool for removing noise and small blobs, though, and it would be great to have even a slightly better version of it. PS: I was using it for real-time hand gesture recognition, and it worked great with our Julian tools, though I'm still looking for optimizations :)
I've also hit the performance degradation at larger window sizes with the above code. I've dabbled with @rafaqz's approach, but found it hard to get right and performant. I would certainly be interested in a generic version.
For now, over at GeoArrayOps, I've found some nice performance for large window sizes, either because my maximum/minimum filters are separable (2D = multiple 1D operations) or by repeating a smaller window (for diamond-shaped windows).
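As an illustration of the separability point, here is a small sketch in plain Julia. It is not the GeoArrayOps code; `max1d`, `max2d_sep` and `max2d_direct` are hypothetical names, and windows are simply clipped at the image edges. For a square window of half-width `r`, the separable version inspects about `2(2r+1)` values per pixel instead of `(2r+1)^2`, so 26 instead of 169 for a (13,13) window.

```julia
# 1D sliding maximum along dimension `dim` with half-width r.
# Windows are clipped at the array edges.
function max1d(A::AbstractMatrix, r::Integer, dim::Integer)
    out = similar(A)
    n = size(A, dim)
    for I in CartesianIndices(A)
        k = I[dim]
        best = A[I]
        for t in max(1, k - r):min(n, k + r)
            J = dim == 1 ? CartesianIndex(t, I[2]) : CartesianIndex(I[1], t)
            best = max(best, A[J])
        end
        out[I] = best
    end
    return out
end

# Separable 2D maximum filter: one pass down the columns, then one along the rows.
max2d_sep(A, r) = max1d(max1d(A, r, 1), r, 2)

# Direct 2D maximum filter over the full (2r+1) x (2r+1) window, for comparison.
function max2d_direct(A::AbstractMatrix, r::Integer)
    out = similar(A)
    m, n = size(A)
    for j in 1:n, i in 1:m
        rows = max(1, i - r):min(m, i + r)
        cols = max(1, j - r):min(n, j + r)
        out[i, j] = maximum(view(A, rows, cols))
    end
    return out
end

A = [mod(7i + 3j * j, 11) for i in 1:6, j in 1:7]
println(max2d_sep(A, 2) == max2d_direct(A, 2))
```

The same two-pass structure works for minimum filters; it does not apply to filters such as the median, which are not separable.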
I'm refactoring the neighborhoods in DynamicGrids.jl into a Neighborhoods.jl package for similar purposes to yours (e.g. slope filters).
It should be a lot faster than `mapwindow` here; I would hope at least 10x. For really large window sizes it will slow down too; at some point you need an FFT.
I didn't explore it in depth, but the following hand-written version, put together in 5 minutes, is faster than what `mapwindow` provides, so I believe there is still room for performance tweaks: