JuliaImages / ImageFiltering.jl

Julia implementations of multidimensional array convolution and nonlinear stencil operations
Other
99 stars 51 forks source link

WIP: enable reusing fft plans #271

Open IanButterworth opened 5 months ago

IanButterworth commented 5 months ago

Requires

TODO:


Because they are not threadsafe FFTW FFT plans are behind a lock. They should be reusable for same-sized images and kernels, so this is WIP towards exposing a way to provide plans, so that imfilter! can be run concurrently on julia threads without lock conflict.

Note that this is using https://github.com/JuliaLang/julia/pull/52883 to count conflicts.

There's a demo temporarily in this PR.

Before it hits 34255 lock conflicts

% FFTW_NUM_THREADS=1 BLAS_NUM_THREADS=3 ../julia/julia --project -t6,1 -q
julia> include("demo.jl")
warm run of benchmark(mats): 5.483911 seconds (2.14 M allocations: 10.671 GiB, 16.62% gc time, 34255 lock conflicts)

This PR removes all but the lock conflicts from the singlular plans, which could be moved out of the for loop

% FFTW_NUM_THREADS=1 BLAS_NUM_THREADS=3 ../julia/julia --project -t6,1 -q
julia> include("demo.jl")
warm run of benchmark(mats): 2.440659 seconds (478.12 k allocations: 7.888 GiB, 21.53% gc time, 8 lock conflicts)
IanButterworth commented 5 months ago

Seeing as this seems to work, what would be a good design to pass this through the API?

Given that plans are an FFT peculiarity I wasn't sure if there was an appropriate way to pass it through the more generic imfilter!methods.

IanButterworth commented 5 months ago

Maybe this would make sense?

r = CPU1(Algorithm.FFT(img, kern))
imfilter!(r, buffer, img, kern)

So make a constructor that generates plans if args are given and stores them? https://github.com/JuliaImages/ImageFiltering.jl/blob/08e34f2b7025c51b0ba9aab9249fac11d3897fe5/src/ImageFiltering.jl#L56

IanButterworth commented 5 months ago

Ok, I think this seems reasonable now. @timholy would you mind reviewing for general design and I can add tests & docs if you agree

timholy commented 5 months ago

I confess I haven't thought about these plans in ages (the code I linked earlier was probably written in the Julia 0.1 days), but nothing bad jumps out about this. Thanks for doing it!!

IanButterworth commented 5 months ago

I'm a little stuck.

Currently running demo.jl gives

julia> include("demo.jl")
ERROR: LoadError: BoundsError: attempt to access 49×96 Matrix{ComplexF32} at index [CartesianIndex(81, 87)]
Stacktrace:
  [1] throw_boundserror(A::Matrix{ComplexF32}, I::Tuple{CartesianIndex{2}})
    @ Base ./abstractarray.jl:734
  [2] checkbounds
    @ ./abstractarray.jl:699 [inlined]
  [3] macro expansion
    @ ./multidimensional.jl:1128 [inlined]
  [4] copyto!(dest::Matrix{Float32}, Rdest::CartesianIndices{2, Tuple{…}}, src::Matrix{ComplexF32}, Rsrc::CartesianIndices{2, Tuple{…}})
    @ Base ./multidimensional.jl:1119
  [5] _imfilter_fft!(r::CPU1{…}, out::Matrix{…}, A::OffsetArrays.OffsetMatrix{…}, kernel::Tuple{…}, border::NoPad{…})
    @ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:0
  [6] imfilter!
    @ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:816 [inlined]
  [7] imfilter!
    @ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:340 [inlined]
  [8] imfilter!(r::CPU1{…}, out::Matrix{…}, img::SubArray{…}, kernel::Tuple{…}, border::Pad{…})
    @ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:327
  [9] imfilter!(r::AbstractResource, out::AbstractArray{…}, img::AbstractArray{…}, kernel::Tuple, border::Union{…}) where {…}
    @ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:217 [inlined]
 [10] test(mats::Vector{Array{Float32, 3}})
    @ Main ~/Documents/GitHub/ImageFiltering.jl/demo.jl:31

For the existing approach, irfft call takes size(B) = (49, 96) and returns a (96, 96) matrix

https://github.com/JuliaImages/ImageFiltering.jl/blob/a6a3cce826b891d979cee3668fb7177f52683e45/src/imfilter.jl#L884

For the new approach with buffered_planned_irfft RFFT.RCpair{T}(undef, size(a)) takes size(a) = (96, 96) and creates a buffer with (49, 96) which means the output of its lambda is a (49, 96) array, where the remainder of _imfilter_fft! is expecting a (96, 96) matrix. https://github.com/JuliaImages/ImageFiltering.jl/blob/a6a3cce826b891d979cee3668fb7177f52683e45/src/imfilter.jl#L851-L858

If this is clear to anyone, guidance would be appreciated. Otherwise I'll try to change RFFT.RCpair to accomodate, but I don't understand that constructor.

https://github.com/JuliaImages/ImageFiltering.jl/blob/a6a3cce826b891d979cee3668fb7177f52683e45/src/RFFT.jl#L15-L24

IanButterworth commented 5 months ago

Comparing buffered vs standard I get different values here https://github.com/JuliaImages/ImageFiltering.jl/blob/78044e7f3b85cde221fe41cd02f6a97e1af135fc/src/imfilter.jl#L875-L877

B[1:4] =  ComplexF32[3065.7727f0 + 113.279686f0im, -254.48924f0 - 85.513504f0im, -369.77383f0 - 146.0826f0im, -267.98865f0 - 201.43063f0im]
_B[1:4] = ComplexF32[4595.6045f0 + 0.0f0im,         3.6047888f0 + 15.441383f0im,  14.378552f0 - 19.552792f0im, -18.684448f0 - 19.284311f0im]

My trial and error phase of debugging isn't being very fruitful, so I think I'll wait for review. @timholy if you happen to have time

mkitti commented 5 months ago

Commeting to remind myself to take a look

IanButterworth commented 4 months ago

(I mentioned to Tim on slack already in response to the question above that this is all non-threaded)

I'm stuck.

So this sanity check evaluates true

using RFFT, FFTW
a = rand(Float64, 10, 10)
out1 = rfft(a)

buf = RFFT.RCpair{Float64}(undef, size(a))
rfft_plan = RFFT.plan_rfft!(buf)
copy!(buf, a)
out2 = complex(rfft_plan(buf))

out1 ≈ out2

however, for equal krn inputs (validated through typeof and sum equality) these lines give different results

The existing uncached plan approach

https://github.com/JuliaImages/ImageFiltering.jl/blob/a79e8078d3adc4481db3b66ce5990b390bddecc7/src/imfilter.jl#L915

┌ Info: good one
│   (rfft(krn))[1:3] =
│    3-element Vector{ComplexF64}:
│     -0.00014399948754237335 + 0.0im
│       -0.004414546165525474 + 1.4913814981424873e-17im
└       -0.017117085642739213 + 8.853611881175574e-18im

Cached plan approach

https://github.com/JuliaImages/ImageFiltering.jl/blob/a79e8078d3adc4481db3b66ce5990b390bddecc7/src/imfilter.jl#L893

┌ Info: bad one 1
│   (complex(planned_rfft2(AbstractFFTs.realfloat(krn))))[1:3] =
│    3-element Vector{ComplexF64}:
│     -0.00014399948754237335 + 0.0im
│       -0.004405094283318256 - 0.0002887251333562888im
└       -0.016970646602189325 - 0.0022342280108512223im

where planned_rfft1 is generated from https://github.com/JuliaImages/ImageFiltering.jl/blob/a79e8078d3adc4481db3b66ce5990b390bddecc7/src/imfilter.jl#L845-L852

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 15.00000% with 34 lines in your changes are missing coverage. Please review.

Project coverage is 89.78%. Comparing base (66cf9d9) to head (6ef477e).

:exclamation: Current head 6ef477e differs from pull request most recent head 9db8caf. Consider uploading reports for the commit 9db8caf to get more accurate results

Files Patch % Lines
src/imfilter.jl 10.52% 34 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #271 +/- ## ========================================== - Coverage 94.25% 89.78% -4.48% ========================================== Files 13 13 Lines 1706 1742 +36 ========================================== - Hits 1608 1564 -44 - Misses 98 178 +80 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

IanButterworth commented 3 months ago

Ok. Tests are passing here.

Note that planned_fft doesn't support non-float eltypes currently because RFFT.RCpair only supports floats (and other reasons, I guess).

However, demo.jl still fails

f1[1:4] = [0.11624783622854669, 0.15113825707056688, 0.16887924282352731, 0.20249257393260642]
f2[1:4] = [0.17744583830964283, -0.0655415564571991, -0.10802786527654895, 0.11185075216343188]
ERROR: LoadError: f1 !≈ f2