Open IanButterworth opened 5 months ago
Seeing as this seems to work, what would be a good design to pass this through the API?
Given that plans are an FFT peculiarity I wasn't sure if there was an appropriate way to pass it through the more generic imfilter!
methods.
Maybe this would make sense?
r = CPU1(Algorithm.FFT(img, kern))
imfilter!(r, buffer, img, kern)
So make a constructor that generates plans if args are given and stores them? https://github.com/JuliaImages/ImageFiltering.jl/blob/08e34f2b7025c51b0ba9aab9249fac11d3897fe5/src/ImageFiltering.jl#L56
Ok, I think this seems reasonable now. @timholy would you mind reviewing for general design and I can add tests & docs if you agree
I confess I haven't thought about these plans in ages (the code I linked earlier was probably written in the Julia 0.1 days), but nothing bad jumps out about this. Thanks for doing it!!
I'm a little stuck.
Currently running demo.jl
gives
julia> include("demo.jl")
ERROR: LoadError: BoundsError: attempt to access 49×96 Matrix{ComplexF32} at index [CartesianIndex(81, 87)]
Stacktrace:
[1] throw_boundserror(A::Matrix{ComplexF32}, I::Tuple{CartesianIndex{2}})
@ Base ./abstractarray.jl:734
[2] checkbounds
@ ./abstractarray.jl:699 [inlined]
[3] macro expansion
@ ./multidimensional.jl:1128 [inlined]
[4] copyto!(dest::Matrix{Float32}, Rdest::CartesianIndices{2, Tuple{…}}, src::Matrix{ComplexF32}, Rsrc::CartesianIndices{2, Tuple{…}})
@ Base ./multidimensional.jl:1119
[5] _imfilter_fft!(r::CPU1{…}, out::Matrix{…}, A::OffsetArrays.OffsetMatrix{…}, kernel::Tuple{…}, border::NoPad{…})
@ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:0
[6] imfilter!
@ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:816 [inlined]
[7] imfilter!
@ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:340 [inlined]
[8] imfilter!(r::CPU1{…}, out::Matrix{…}, img::SubArray{…}, kernel::Tuple{…}, border::Pad{…})
@ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:327
[9] imfilter!(r::AbstractResource, out::AbstractArray{…}, img::AbstractArray{…}, kernel::Tuple, border::Union{…}) where {…}
@ ImageFiltering ~/Documents/GitHub/ImageFiltering.jl/src/imfilter.jl:217 [inlined]
[10] test(mats::Vector{Array{Float32, 3}})
@ Main ~/Documents/GitHub/ImageFiltering.jl/demo.jl:31
For the existing approach, irfft
call takes size(B) = (49, 96)
and returns a (96, 96)
matrix
For the new approach with buffered_planned_irfft
RFFT.RCpair{T}(undef, size(a))
takes size(a) = (96, 96)
and creates a buffer with (49, 96)
which means the output of its lambda is a (49, 96)
array, where the remainder of _imfilter_fft!
is expecting a (96, 96)
matrix.
https://github.com/JuliaImages/ImageFiltering.jl/blob/a6a3cce826b891d979cee3668fb7177f52683e45/src/imfilter.jl#L851-L858
If this is clear to anyone, guidance would be appreciated. Otherwise I'll try to change RFFT.RCpair
to accomodate, but I don't understand that constructor.
Comparing buffered vs standard I get different values here https://github.com/JuliaImages/ImageFiltering.jl/blob/78044e7f3b85cde221fe41cd02f6a97e1af135fc/src/imfilter.jl#L875-L877
B[1:4] = ComplexF32[3065.7727f0 + 113.279686f0im, -254.48924f0 - 85.513504f0im, -369.77383f0 - 146.0826f0im, -267.98865f0 - 201.43063f0im]
_B[1:4] = ComplexF32[4595.6045f0 + 0.0f0im, 3.6047888f0 + 15.441383f0im, 14.378552f0 - 19.552792f0im, -18.684448f0 - 19.284311f0im]
My trial and error phase of debugging isn't being very fruitful, so I think I'll wait for review. @timholy if you happen to have time
Commeting to remind myself to take a look
(I mentioned to Tim on slack already in response to the question above that this is all non-threaded)
I'm stuck.
So this sanity check evaluates true
using RFFT, FFTW
a = rand(Float64, 10, 10)
out1 = rfft(a)
buf = RFFT.RCpair{Float64}(undef, size(a))
rfft_plan = RFFT.plan_rfft!(buf)
copy!(buf, a)
out2 = complex(rfft_plan(buf))
out1 ≈ out2
however, for equal krn
inputs (validated through typeof and sum equality) these lines give different results
┌ Info: good one
│ (rfft(krn))[1:3] =
│ 3-element Vector{ComplexF64}:
│ -0.00014399948754237335 + 0.0im
│ -0.004414546165525474 + 1.4913814981424873e-17im
└ -0.017117085642739213 + 8.853611881175574e-18im
┌ Info: bad one 1
│ (complex(planned_rfft2(AbstractFFTs.realfloat(krn))))[1:3] =
│ 3-element Vector{ComplexF64}:
│ -0.00014399948754237335 + 0.0im
│ -0.004405094283318256 - 0.0002887251333562888im
└ -0.016970646602189325 - 0.0022342280108512223im
where planned_rfft1
is generated from
https://github.com/JuliaImages/ImageFiltering.jl/blob/a79e8078d3adc4481db3b66ce5990b390bddecc7/src/imfilter.jl#L845-L852
Attention: Patch coverage is 15.00000%
with 34 lines
in your changes are missing coverage. Please review.
Project coverage is 89.78%. Comparing base (
66cf9d9
) to head (6ef477e
).:exclamation: Current head 6ef477e differs from pull request most recent head 9db8caf. Consider uploading reports for the commit 9db8caf to get more accurate results
Files | Patch % | Lines |
---|---|---|
src/imfilter.jl | 10.52% | 34 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Ok. Tests are passing here.
Note that planned_fft
doesn't support non-float eltypes currently because RFFT.RCpair
only supports floats (and other reasons, I guess).
However, demo.jl
still fails
f1[1:4] = [0.11624783622854669, 0.15113825707056688, 0.16887924282352731, 0.20249257393260642]
f2[1:4] = [0.17744583830964283, -0.0655415564571991, -0.10802786527654895, 0.11185075216343188]
ERROR: LoadError: f1 !≈ f2
Requires
TODO:
Algorithm.FFT(img, kernel, border)
?Because they are not threadsafe FFTW FFT plans are behind a lock. They should be reusable for same-sized images and kernels, so this is WIP towards exposing a way to provide plans, so that
imfilter!
can be run concurrently on julia threads without lock conflict.Note that this is using https://github.com/JuliaLang/julia/pull/52883 to count conflicts.
There's a demo temporarily in this PR.
Before it hits 34255 lock conflicts
This PR removes all but the lock conflicts from the singlular plans, which could be moved out of the for loop