Closed johnnychen94 closed 2 years ago
Merging #233 (a7fc3c6) into master (1bbf666) will increase coverage by
0.25%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #233 +/- ##
==========================================
+ Coverage 91.89% 92.14% +0.25%
==========================================
Files 11 12 +1
Lines 1603 1642 +39
==========================================
+ Hits 1473 1513 +40
+ Misses 130 129 -1
Impacted Files | Coverage Δ | |
---|---|---|
src/ImageFiltering.jl | 68.75% <ø> (ø) |
|
src/models.jl | 100.00% <100.00%> (ø) |
|
src/border.jl | 93.47% <0.00%> (+0.54%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 1bbf666...a7fc3c6. Read the comment docs.
Benchmark on a9ed869:
using ImageFiltering
using ImageBase
using ImageFiltering.Models
using TestImages
using Random
using CUDA
CUDA.allowscalar(false)
# 2d Gray
img = testimage("cameraman");
img_noisy = img .+ 0.1f0*randn(MersenneTwister(0), Float32, size(img));
@btime solve_ROF_PD($img_noisy, 0.2, 30);
# after: 59.914 ms (134 allocations: 67.00 MiB)
# before: 101.396 ms (134 allocations: 134.00 MiB)
img_noisy_cu = CuArray(Float32.(img_noisy));
@btime solve_ROF_PD($img_noisy_cu, 0.2, 30); # GTX 3090
# after: 4.188 ms (8793 allocations: 722.03 KiB)
# before: 4.060 ms (8793 allocations: 754.84 KiB)
# 2d RGB
img = testimage("lighthouse");
img_noisy = img .+ colorview(RGB, ntuple(i->0.05.*randn(MersenneTwister(i), size(img)), 3)...);
@btime solve_ROF_PD($img_noisy, 0.2, 30);
# after: 312.251 ms (136 allocations: 303.00 MiB)
# before: 610.880 ms (134 allocations: 597.00 MiB)
img_noisy_cu = CuArray(float32.(img_noisy))
@btime solve_ROF_PD($img_noisy_cu, 0.2, 30); # GTX 3090
# after: 5.381 ms (8433 allocations: 699.53 KiB)
# before: 7.176 ms (8433 allocations: 729.53 KiB)
I believe the implementation can be faster on both CPU and GPU, but the higher priority is to replace Images.imROF
so I don't plan to investigate the performance anymore in this PR. This is already satisfying.
FWIW, the MATLAB version of the same algorithm used in our lab is about 50ms for a 2d gray image.
@timholy Since this implementation is already much better than the old imROF
, I plan to merge this in a few days and deprecate Images.imROF
. I know you're also busy these days but it would be great if you can review this.
I might add more model solvers in the near future so I want to make sure that I'm adding codes to the right place.
That CPU/GPU difference is impressive. WRT the Matlab comparison, does it change if you use -singleCompThread
? That might tell us where the difference lies.
But agreed, further performance optimization can happen later. Thanks so much for doing this, and congrats on such a huge improvement!!
That CPU/GPU difference is impressive. WRT the Matlab comparison, does it change if you use -singleCompThread? That might tell us where the difference lies.
I believe when https://github.com/JuliaImages/ImageBase.jl/issues/25 is done we'd just be much better.
I also introduced an in-place version of it because there are some algorithms built on top of denoisers. For instance, plug-and-play https://engineering.purdue.edu/ChanGroup/project_PnP.html
The in-place version reduces the CPU runtime from ~55ms to ~40ms so it's definitely worth doing. ~(I didn't check the GPU runtime, though).~ GPU runtime is the same.
Plan to merge tomorrow unless we get more comments in.
Moved from https://github.com/JuliaImages/ImageBase.jl/pull/24
The test codes for CUDA are maintained in an independent folder
test/cuda
with its own set ofProject.toml
. We don't have CI setup so I manually test it in my local machine. My plan is to set up GPU CI in Images.jl only after we finish https://github.com/JuliaImages/Images.jl/issues/898, and use a script to collect all distributed CUDA-only tests in all packages.TODO: