JuliaImages / ImageFiltering.jl

Julia implementations of multidimensional array convolution and nonlinear stencil operations
Other
99 stars 51 forks source link

implement generic ROF model using Chambolle04 primal-dual method #233

Closed johnnychen94 closed 2 years ago

johnnychen94 commented 2 years ago

Moved from https://github.com/JuliaImages/ImageBase.jl/pull/24

The test codes for CUDA are maintained in an independent folder test/cuda with its own set of Project.toml. We don't have CI setup so I manually test it in my local machine. My plan is to set up GPU CI in Images.jl only after we finish https://github.com/JuliaImages/Images.jl/issues/898, and use a script to collect all distributed CUDA-only tests in all packages.

(ImageFiltering) pkg> activate test/cuda

julia> include("test/cuda/runtests.jl")
Test Summary:  | Pass  Total
ImageFiltering |   28     28

TODO:

codecov[bot] commented 2 years ago

Codecov Report

Merging #233 (a7fc3c6) into master (1bbf666) will increase coverage by 0.25%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
+ Coverage   91.89%   92.14%   +0.25%     
==========================================
  Files          11       12       +1     
  Lines        1603     1642      +39     
==========================================
+ Hits         1473     1513      +40     
+ Misses        130      129       -1     
Impacted Files Coverage Δ
src/ImageFiltering.jl 68.75% <ø> (ø)
src/models.jl 100.00% <100.00%> (ø)
src/border.jl 93.47% <0.00%> (+0.54%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1bbf666...a7fc3c6. Read the comment docs.

johnnychen94 commented 2 years ago

Benchmark on a9ed869:

using ImageFiltering
using ImageBase
using ImageFiltering.Models
using TestImages
using Random

using CUDA
CUDA.allowscalar(false)

# 2d Gray
img = testimage("cameraman");
img_noisy = img .+ 0.1f0*randn(MersenneTwister(0), Float32, size(img));
@btime solve_ROF_PD($img_noisy, 0.2, 30);
# after: 59.914 ms (134 allocations: 67.00 MiB)
# before: 101.396 ms (134 allocations: 134.00 MiB)

img_noisy_cu = CuArray(Float32.(img_noisy));
@btime solve_ROF_PD($img_noisy_cu, 0.2, 30); # GTX 3090
# after: 4.188 ms (8793 allocations: 722.03 KiB)
# before: 4.060 ms (8793 allocations: 754.84 KiB)

# 2d RGB
img = testimage("lighthouse");
img_noisy = img .+ colorview(RGB, ntuple(i->0.05.*randn(MersenneTwister(i), size(img)), 3)...);
@btime solve_ROF_PD($img_noisy, 0.2, 30);
# after: 312.251 ms (136 allocations: 303.00 MiB)
# before: 610.880 ms (134 allocations: 597.00 MiB)

img_noisy_cu = CuArray(float32.(img_noisy))
@btime solve_ROF_PD($img_noisy_cu, 0.2, 30); # GTX 3090
# after: 5.381 ms (8433 allocations: 699.53 KiB)
# before: 7.176 ms (8433 allocations: 729.53 KiB)

I believe the implementation can be faster on both CPU and GPU, but the higher priority is to replace Images.imROF so I don't plan to investigate the performance anymore in this PR. This is already satisfying.


FWIW, the MATLAB version of the same algorithm used in our lab is about 50ms for a 2d gray image.

johnnychen94 commented 2 years ago

@timholy Since this implementation is already much better than the old imROF, I plan to merge this in a few days and deprecate Images.imROF. I know you're also busy these days but it would be great if you can review this.

I might add more model solvers in the near future so I want to make sure that I'm adding codes to the right place.

timholy commented 2 years ago

That CPU/GPU difference is impressive. WRT the Matlab comparison, does it change if you use -singleCompThread? That might tell us where the difference lies.

timholy commented 2 years ago

But agreed, further performance optimization can happen later. Thanks so much for doing this, and congrats on such a huge improvement!!

johnnychen94 commented 2 years ago

That CPU/GPU difference is impressive. WRT the Matlab comparison, does it change if you use -singleCompThread? That might tell us where the difference lies.

I believe when https://github.com/JuliaImages/ImageBase.jl/issues/25 is done we'd just be much better.


I also introduced an in-place version of it because there are some algorithms built on top of denoisers. For instance, plug-and-play https://engineering.purdue.edu/ChanGroup/project_PnP.html

The in-place version reduces the CPU runtime from ~55ms to ~40ms so it's definitely worth doing. ~(I didn't check the GPU runtime, though).~ GPU runtime is the same.

Plan to merge tomorrow unless we get more comments in.