Closed: jsenn closed this pull request 3 years ago
Thank you @jsenn for the PR! I'd like to suggest a few changes before I start a review on the GitHub system. I will use your nice implementation shared in #57:
```julia
function hausdorff(A::BitArray{2}, B::BitArray{2})
    dtA = distance_transform(feature_transform(A))
    dtB = distance_transform(feature_transform(B))
    dAB = maximum(A .* dtB)
    dBA = maximum(B .* dtA)
    return max(dAB, dBA)
end
```
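For intuition, here is a tiny self-contained example of the same trick (the 5x5 inputs are illustrative values of my own; assumes ImageMorphology provides the two transforms):

```julia
using ImageMorphology: feature_transform, distance_transform

# dt[i] holds the distance from pixel i to the nearest `true` pixel of the
# mask, so masking dt by the *other* image and taking the maximum yields the
# one-way (directed) Hausdorff distance.
A = falses(5, 5); A[3, 3] = true   # single active pixel at the center
B = falses(5, 5); B[1, 1] = true   # single active pixel at a corner
dtA = distance_transform(feature_transform(A))
dtB = distance_transform(feature_transform(B))
# Symmetrize: for these two single-point images this should be the Euclidean
# distance between (1,1) and (3,3).
hd = max(maximum(A .* dtB), maximum(B .* dtA))
```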
We see that this implementation starts with a preprocessing step `feature_transform |> distance_transform`. Then the result of this preprocessing step is combined with the original inputs `A` and `B` to produce the one-way Hausdorff distances. Based on this observation, I would like to suggest the following:

- Replace the `img2pset` function by `hausdorff_transform = feature_transform |> distance_transform`.
- Replace `evaluate_pset` by the combination step above. Maybe call this function `hausdorff_evaluate(d, A, B, dtA, dtB)`.
- Make sure occurrences of `img2pset` are replaced by `hausdorff_transform`, and occurrences of `evaluate_pset` are replaced by `hausdorff_evaluate`.

The reason there is additional logic is that we can save time by executing the preprocessing step once, before the loops in `pairwise` and `colwise` for example. Does it make sense to you? Please let me know if something is not clear.
Also notice that the implementation of this new `hausdorff_evaluate` will rely on `d.inner_op` instead of the `maximum` in your code snippet above.
This PR will lead to a tremendous improvement in performance, judging from some quick tests on the REPL 🚀 :)
@juliohm I made the changes you suggested. Let me know if there's something that I missed!
@juliohm thanks for the comments. I made the changes, and the tests still pass on my machine.
That is great @jsenn , thanks for incorporating the suggestions. Thank you @johnnychen94 for the helpful comments.
I think the only thing missing now is the benchmarks. Can you share some examples with a couple of sparse and dense binary images before and after the change?
@juliohm As mentioned in the issue (#56), the original implementation runs out of memory on my machine for even a small image (640x480) with a density of 0.5. It takes the original as long to diff two 125x125 images as it takes the new one to diff two 4000x2000 images, again at 0.5 density.
I used the following test code to look at other densities:
```julia
using ImageDistances, Random, BenchmarkTools
using Statistics  # `mean` on a BenchmarkTools trial comes from Statistics

make_test_img(w, h, density) = map(x -> x > density ? 0 : 1, rand(w, h))

function do_test(w, h, density)
    imgA = make_test_img(w, h, density)
    imgB = make_test_img(w, h, density)
    stats = @benchmark hausdorff($imgA, $imgB)
    mean(stats).time / 1e6  # mean time in milliseconds
end

function do_tests(resolutions, densities)
    # warm up the function
    do_test(100, 100, 0.5)
    results = zeros(length(resolutions), length(densities))
    for res_idx in 1:length(resolutions)
        res = resolutions[res_idx]
        for dens_idx in 1:length(densities)
            dens = densities[dens_idx]
            try
                avg_time = do_test(res[1], res[2], dens)
                results[res_idx, dens_idx] = avg_time
            catch e
                if e isa OutOfMemoryError
                    results[res_idx, dens_idx] = Inf
                else
                    display(e)
                end
            end
        end
    end
    display(results)
end

resolutions = [(100, 100), (640, 480), (1028, 768), (3840, 2160)]
densities = [0.0, 0.0001, 0.001, 0.01, 0.1, 0.5, 1.0]
do_tests(resolutions, densities)
```
Below are some tables of results. The rows of the table represent image dimensions, and the columns various settings for the density, from 0 (empty image) to 1 (all active pixels).
Here's the table output from the original algorithm (times are given in milliseconds):

| | 0 | 0.0001 | 0.001 | 0.01 | 0.1 | 0.5 | 1 |
|---|---|---|---|---|---|---|---|
| 100x100 | 0.0669086 | 0.0529377 | 0.0519164 | 0.189756 | 14.9962 | 457.341 | 1.23418 |
| 640x480 | 2.09205 | 1.46759 | 2.38225 | 157.231 | 45218.6 | Inf | 28.9543 |
| 1028x768 | 5.36087 | 3.77945 | 11.7178 | 1141.55 | Inf | Inf | 66.5488 |
| 3840x2160 | 51.5246 | 44.911 | 1251.15 | Inf | Inf | Inf | 887.691 |

Inf values indicate that the program ran out of memory.
And here's the table from the new algorithm:

| | 0 | 0.0001 | 0.001 | 0.01 | 0.1 | 0.5 | 1 |
|---|---|---|---|---|---|---|---|
| 100x100 | 0.331425 | 0.327105 | 0.498366 | 0.781181 | 1.14899 | 1.55493 | 0.852574 |
| 640x480 | 10.1094 | 16.3752 | 24.7114 | 49.5895 | 43.6246 | 54.8624 | 34.5993 |
| 1028x768 | 24.6157 | 42.526 | 72.8077 | 96.6227 | 111.602 | 140.856 | 85.7321 |
| 3840x2160 | 439.393 | 984.964 | 1746.75 | 1756.3 | 1799.32 | 2081.8 | 1640.28 |
Finally, here's a table where each cell is new / old. It shows the improvement/regression of new over old (close to 0 indicates significant improvement, 1 indicates same performance, >1 indicates regression):

| | 0 | 0.0001 | 0.001 | 0.01 | 0.1 | 0.5 | 1 |
|---|---|---|---|---|---|---|---|
| 100x100 | 4.953 | 6.179 | 9.599 | 4.117 | 0.077 | 0.003 | 0.691 |
| 640x480 | 4.832 | 11.158 | 10.373 | 0.315 | 0.001 | #VALUE! | 1.195 |
| 1028x768 | 4.592 | 11.252 | 6.213 | 0.085 | #VALUE! | #VALUE! | 1.288 |
| 3840x2160 | 8.528 | 21.931 | 1.396 | #VALUE! | #VALUE! | #VALUE! | 1.848 |
Some notes:

- These benchmarks use `hausdorff`.
- If the regressions are a problem, it might be enough to just special-case images with density < 0.01, falling back on the old algorithm. Special-casing empty and full images might also be a good idea, since the distance from any empty image is a degenerate case that can be answered without running either algorithm.
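Such special-casing could be sketched like this (a hypothetical wrapper, not part of the PR; the return conventions for empty inputs are a design choice, since the Hausdorff distance to an empty point set is not well defined):

```julia
# Early exits for degenerate inputs, before any distance transform is computed.
function hausdorff_with_shortcuts(A, B)
    emptyA, emptyB = !any(A), !any(B)
    emptyA && emptyB && return 0.0    # both empty: define the distance as zero
    (emptyA || emptyB) && return Inf  # exactly one empty: no point to match against
    return hausdorff(A, B)            # general case: distance-transform algorithm
end
```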
@jsenn am I seeing something incorrectly? Why is the new algorithm consistently slower than the old? There are very few entries in the table with values smaller than 1.
I'd appreciate it if you could clarify, because this table doesn't justify the PR; it actually goes against it.
@juliohm note the columns in the table: most of the entries are for very sparse images. If even 1% of the pixels in the image are active, the new algorithm vastly outperforms the previous one (for example, it is 1,000x faster for 640x480 images with 10% of pixels active). For that matter, the distance transform algorithm is infinitely faster than the previous one for images with moderate to high densities, or for large images, because the previous algorithm runs out of memory. (EDIT: these cases where the original algorithm doesn't return a result show up in the comparison table as #VALUE! entries.)
I intentionally chose to test mostly sparse images in order to show the weaknesses of the new implementation, because the strengths were clear to me: you never have to materialize the full `D` matrix, so you avoid running out of memory.

Here are some simple improvements that will take care of some of the performance regressions:

- Check for empty and completely full images before `distance_transform` is applied. This will take care of the 0- and 1-density columns (empty and completely full images).

The only thing I'm not sure about is how to handle sparse but nonempty images. One idea would be to just add separate `hausdorff_sparse` and `modified_hausdorff_sparse` functions that use the old algorithm. Another would be to add some kind of heuristic to choose between the distance-transform algorithm and the one that compares all the points. The extremely steep cutoff in performance at 1% density suggests a way to do this: simply have a preprocessing step that calculates the images' densities, and if it's less than 1%, use the old algorithm. Otherwise, use the new one.
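That heuristic could be sketched as follows (`hausdorff_pointset` and `hausdorff_dt` are hypothetical names standing in for the old and new implementations, and the 1% threshold comes from the benchmark above):

```julia
# Fraction of active (nonzero) pixels in a binary image.
density(img) = count(!iszero, img) / length(img)

# Dispatch on sparsity: the old pairwise-points algorithm wins on very sparse
# images, while the distance-transform algorithm wins everywhere else.
function hausdorff_auto(A, B; threshold = 0.01)
    if density(A) < threshold && density(B) < threshold
        return hausdorff_pointset(A, B)  # old algorithm (compares all point pairs)
    else
        return hausdorff_dt(A, B)        # new algorithm (distance transforms)
    end
end
```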
Also, I think the absolute differences are important. For example, the original algorithm is 10x faster for 640x480 images with 0.1% density, but that represents a time savings of only about 20 milliseconds. In fact, the greatest time savings you get from using the original algorithm is about 1 second, and that's in a case that's easily optimized in the new implementation (diffing two 4K images that are all 1s).
Where the new algorithm outperforms the old, however, the time savings are often measured in seconds, minutes, and, again, infinities (of course, if you have tons of memory, those infinities may become hours).
A really great benchmark and explanation. I'm good with the benchmark result given that most "actual" images lie in the density range (0.01-1) where the new algorithm is faster.
```julia
julia> using TestImages, ImageBinarization

julia> using DataFrames  # needed below for `DataFrame` and `describe`

julia> densities = map(TestImages.remotefiles) do file
           try
               img = RGB.(testimage(file))
               out = binarize(img, Otsu())
               return sum(x -> x == 1, out) / length(out)
           catch
               return missing
           end
       end

julia> describe(DataFrame(densities=densities))
1×7 DataFrame
 Row │ variable   mean      min        median    max       nmissing  eltype
     │ Symbol     Float64   Float64    Float64   Float64   Int64     Union
─────┼───────────────────────────────────────────────────────────────────────
   1 │ densities  0.448454  0.0105944  0.500463  0.768402         1  Union{Missing, Float64}
```
`binarize(img, Otsu())` might be too brutal for real-world image processing, but it does give a hint here.

It would be nice if there were a better way to fix the performance regression in this PR. That said, I'm okay merging this first and fixing it later, since slower is definitely better than inaccessible, i.e., `OutOfMemoryError`.
This is just my personal opinion, I'll leave it to @juliohm to decide if it's mergeable.
Thank you guys for the detailed benchmark results and tests. I will just do a quick check with a set of images I used when I wrote the original version of the code, to make sure there is no major regression beyond what the test images and random images above suggest. I am also in favor of merging; this is just a quick check to see whether existing users will hit any major problems.
Oh yeah, everything is working nicely with the images locally too. Amazing work @jsenn ! 💯
Feel free to trigger a new patch release @johnnychen94 if you feel it is a good time.
@juliohm @johnnychen94 thanks for the careful reviews! @johnnychen94 I didn't know about TestImages; that would have made a more interesting benchmark than the random images (though, as you note, it wouldn't have shown the weaknesses of the new algorithm).
Previously, the Hausdorff routines used time and memory that was quadratic in the number of active pixels in the input images. This made them impractical for large or dense images (see #56).

Here, we leverage the `distance_transform` function from ImageMorphology to compute the Hausdorff distance in linear time.

Limitations

- This adds a dependency on ImageMorphology.

I'm new to Julia, so I apologize in advance for any mistakes in this PR. I'd appreciate any comments or criticisms you may have!