Closed: jsenn closed this pull request 3 years ago
Thank you @jsenn for the PR! I'd like to suggest a few changes before I start a review on the GitHub system. I will use your nice implementation shared in #57:
```julia
function hausdorff(A::BitArray{2}, B::BitArray{2})
    dtA = distance_transform(feature_transform(A))
    dtB = distance_transform(feature_transform(B))
    dAB = maximum(A .* dtB)
    dBA = maximum(B .* dtA)
    return max(dAB, dBA)
end
```
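For intuition, here is a tiny self-contained example of the same trick (the 5x5 inputs are illustrative values of my own; assumes ImageMorphology provides the two transforms):

```julia
using ImageMorphology: feature_transform, distance_transform

# dt[i] holds the distance from pixel i to the nearest `true` pixel of the
# mask, so masking dt by the *other* image and taking the maximum yields the
# one-way (directed) Hausdorff distance.
A = falses(5, 5); A[3, 3] = true   # single active pixel at the center
B = falses(5, 5); B[1, 1] = true   # single active pixel at a corner
dtA = distance_transform(feature_transform(A))
dtB = distance_transform(feature_transform(B))
# Symmetrize: for these two single-point images this should be the Euclidean
# distance between (1,1) and (3,3).
hd = max(maximum(A .* dtB), maximum(B .* dtA))
```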
We see that this implementation starts with a preprocessing step `feature_transform |> distance_transform`. Then the result of this preprocessing step is combined with the original inputs `A` and `B` to produce the one-way Hausdorff distances. Based on this observation, I would like to suggest the following:

- Replace the `img2pset` function by `hausdorff_transform = feature_transform |> distance_transform`.
- Replace `evaluate_pset` by the combination step above. Maybe call this function `hausdorff_evaluate(d, A, B, dtA, dtB)`.
- Make sure occurrences of `img2pset` are replaced by `hausdorff_transform`, and occurrences of `evaluate_pset` are replaced by `hausdorff_evaluate`.

The reason there is additional logic is that we can save time by executing the preprocessing step once, before the loops in `pairwise` and `colwise` for example. Does it make sense to you? Please let me know if something is not clear.
Also notice that the implementation of this new `hausdorff_evaluate` will rely on `d.inner_op` instead of the `maximum` in your code snippet above.
This PR will lead to a tremendous improvement in performance, judging from some quick tests on the REPL 🚀 :)
@juliohm I made the changes you suggested. Let me know if there's something that I missed!
@juliohm thanks for the comments. I made the changes, and the tests still pass on my machine.
That is great @jsenn , thanks for incorporating the suggestions. Thank you @johnnychen94 for the helpful comments.
I think the only thing missing now is the benchmarks. Can you share some examples with a couple of sparse and dense binary images before and after the change?
@juliohm As mentioned in the issue (#56), the original implementation runs out of memory on my machine for even a small image (640x480) with a density of 0.5. It takes the original as long to diff two 125x125 images as it takes the new one to diff two 4000x2000 images, again at 0.5 density.
I used the following test code to look at other densities:
```julia
using ImageDistances, Random, BenchmarkTools
using Statistics  # `mean` on a BenchmarkTools trial comes from Statistics

make_test_img(w, h, density) = map(x -> x > density ? 0 : 1, rand(w, h))

function do_test(w, h, density)
    imgA = make_test_img(w, h, density)
    imgB = make_test_img(w, h, density)
    stats = @benchmark hausdorff($imgA, $imgB)
    mean(stats).time / 1e6  # mean time in milliseconds
end

function do_tests(resolutions, densities)
    # warm up the function
    do_test(100, 100, 0.5)
    results = zeros(length(resolutions), length(densities))
    for res_idx in 1:length(resolutions)
        res = resolutions[res_idx]
        for dens_idx in 1:length(densities)
            dens = densities[dens_idx]
            try
                avg_time = do_test(res[1], res[2], dens)
                results[res_idx, dens_idx] = avg_time
            catch e
                if e isa OutOfMemoryError
                    results[res_idx, dens_idx] = Inf
                else
                    display(e)
                end
            end
        end
    end
    display(results)
end

resolutions = [(100, 100), (640, 480), (1028, 768), (3840, 2160)]
densities = [0.0, 0.0001, 0.001, 0.01, 0.1, 0.5, 1.0]
do_tests(resolutions, densities)
```
Below are some tables of results. The rows of the table represent image dimensions, and the columns various settings for the density, from 0 (empty image) to 1 (all active pixels).
Here's the table output from the original algorithm (times are given in milliseconds):

| | 0 | 0.0001 | 0.001 | 0.01 | 0.1 | 0.5 | 1 |
|---|---|---|---|---|---|---|---|
| 100x100 | 0.0669086 | 0.0529377 | 0.0519164 | 0.189756 | 14.9962 | 457.341 | 1.23418 |
| 640x480 | 2.09205 | 1.46759 | 2.38225 | 157.231 | 45218.6 | Inf | 28.9543 |
| 1028x768 | 5.36087 | 3.77945 | 11.7178 | 1141.55 | Inf | Inf | 66.5488 |
| 3840x2160 | 51.5246 | 44.911 | 1251.15 | Inf | Inf | Inf | 887.691 |

Inf values indicate that the program ran out of memory.
And here's the table from the new algorithm:

| | 0 | 0.0001 | 0.001 | 0.01 | 0.1 | 0.5 | 1 |
|---|---|---|---|---|---|---|---|
| 100x100 | 0.331425 | 0.327105 | 0.498366 | 0.781181 | 1.14899 | 1.55493 | 0.852574 |
| 640x480 | 10.1094 | 16.3752 | 24.7114 | 49.5895 | 43.6246 | 54.8624 | 34.5993 |
| 1028x768 | 24.6157 | 42.526 | 72.8077 | 96.6227 | 111.602 | 140.856 | 85.7321 |
| 3840x2160 | 439.393 | 984.964 | 1746.75 | 1756.3 | 1799.32 | 2081.8 | 1640.28 |
Finally, here's a table where each cell is new / old. It shows the improvement/regression of new over old (close to 0 indicates significant improvement, 1 indicates same performance, >1 indicates regression):

| | 0 | 0.0001 | 0.001 | 0.01 | 0.1 | 0.5 | 1 |
|---|---|---|---|---|---|---|---|
| 100x100 | 4.953 | 6.179 | 9.599 | 4.117 | 0.077 | 0.003 | 0.691 |
| 640x480 | 4.832 | 11.158 | 10.373 | 0.315 | 0.001 | #VALUE! | 1.195 |
| 1028x768 | 4.592 | 11.252 | 6.213 | 0.085 | #VALUE! | #VALUE! | 1.288 |
| 3840x2160 | 8.528 | 21.931 | 1.396 | #VALUE! | #VALUE! | #VALUE! | 1.848 |
Some notes:

- These benchmarks use `hausdorff`.
- If the regressions are a problem, it might be enough to just special-case images with density < 0.01, falling back on the old algorithm. Special-casing empty and full images might also be a good idea, since the distance from any empty image is a degenerate case that can be answered without running either algorithm.
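Such special-casing could be sketched like this (a hypothetical wrapper, not part of the PR; the return conventions for empty inputs are a design choice, since the Hausdorff distance to an empty point set is not well defined):

```julia
# Early exits for degenerate inputs, before any distance transform is computed.
function hausdorff_with_shortcuts(A, B)
    emptyA, emptyB = !any(A), !any(B)
    emptyA && emptyB && return 0.0    # both empty: define the distance as zero
    (emptyA || emptyB) && return Inf  # exactly one empty: no point to match against
    return hausdorff(A, B)            # general case: distance-transform algorithm
end
```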
@jsenn am I seeing something incorrectly? Why is the new algorithm consistently slower than the old? There are very few entries in the table with values smaller than 1.
I'd appreciate it if you could clarify, because this table doesn't justify the PR; it actually goes against it.
@juliohm note the columns in the table: most of the entries are for very sparse images. If even 1% of the pixels in the image are active, the new algorithm vastly outperforms the previous one (for example, it is 1,000x faster for 640x480 images with 10% of pixels active). For that matter, the distance transform algorithm is infinitely faster than the previous one for images with moderate to high densities, or for large images, because the previous algorithm runs out of memory. (EDIT: these cases where the original algorithm doesn't return a result show up in the comparison table as #VALUE! entries.)
I intentionally chose to test mostly sparse images in order to show the weaknesses of the new implementation, because the strengths were clear to me: you never have to materialize the full `D` matrix, so you avoid running out of memory.

Here are some simple improvements that will take care of some of the performance regressions:

- Check for empty and completely full images before `distance_transform` is applied. This will take care of the 0- and 1-density columns (empty and completely full images).

The only thing I'm not sure about is how to handle sparse but nonempty images. One idea would be to just add separate `hausdorff_sparse` and `modified_hausdorff_sparse` functions that use the old algorithm. Another would be to add some kind of heuristic to choose between the distance-transform algorithm and the one that compares all the points. The extremely steep cutoff in performance at 1% density suggests a way to do this: simply have a preprocessing step that calculates the images' densities, and if it's less than 1%, use the old algorithm. Otherwise, use the new one.
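That heuristic could be sketched as follows (`hausdorff_pointset` and `hausdorff_dt` are hypothetical names standing in for the old and new implementations, and the 1% threshold comes from the benchmark above):

```julia
# Fraction of active (nonzero) pixels in a binary image.
density(img) = count(!iszero, img) / length(img)

# Dispatch on sparsity: the old pairwise-points algorithm wins on very sparse
# images, while the distance-transform algorithm wins everywhere else.
function hausdorff_auto(A, B; threshold = 0.01)
    if density(A) < threshold && density(B) < threshold
        return hausdorff_pointset(A, B)  # old algorithm (compares all point pairs)
    else
        return hausdorff_dt(A, B)        # new algorithm (distance transforms)
    end
end
```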
Also, I think the absolute differences are important. For example, the original algorithm is 10x faster for 640x480 images with 0.1% density, but that represents a time savings of only about 20 milliseconds. In fact, the greatest time savings you get from using the original algorithm is about 1 second, and that's in a case that's easily optimized in the new implementation (diffing two 4K images that are all 1s).
Where the new algorithm outperforms the old, however, the time savings are often measured in seconds, minutes, and, again, infinities (of course, if you have tons of memory, those infinities may become hours).
A really great benchmark and explanation. I'm good with the benchmark result given that most "actual" images lie in the density range (0.01-1) where the new algorithm is faster.
```julia
julia> using TestImages, ImageBinarization

julia> using DataFrames  # needed below for `DataFrame` and `describe`

julia> densities = map(TestImages.remotefiles) do file
           try
               img = RGB.(testimage(file))
               out = binarize(img, Otsu())
               return sum(x -> x == 1, out) / length(out)
           catch
               return missing
           end
       end

julia> describe(DataFrame(densities=densities))
1×7 DataFrame
 Row │ variable   mean      min        median    max       nmissing  eltype
     │ Symbol     Float64   Float64    Float64   Float64   Int64     Union
─────┼───────────────────────────────────────────────────────────────────────
   1 │ densities  0.448454  0.0105944  0.500463  0.768402         1  Union{Missing, Float64}
```
`binarize(img, Otsu())` might be too brutal for real-world image processing, but it does give a hint here.

It would be nice if there were a better way to fix the performance regression in this PR. That said, I'm okay merging this first and fixing it later, since slower is definitely better than inaccessible, i.e., `OutOfMemoryError`.
This is just my personal opinion, I'll leave it to @juliohm to decide if it's mergeable.
Thank you guys for the detailed benchmark results and tests. I will just do a quick check with a set of images I used when I wrote the original version of the code, to make sure there is no major regression beyond what the test images and random images above suggest. I am also in favor of merging; this is just a quick check to see whether existing users will hit any major problems.
Oh yeah, everything is working nicely with the images locally too. Amazing work @jsenn ! 💯
Feel free to trigger a new patch release @johnnychen94 if you feel it is a good time.
@juliohm @johnnychen94 thanks for the careful reviews! @johnnychen94 I didn't know about TestImages; that would have made a more interesting benchmark than the random images (though, as you note, it wouldn't have shown the weaknesses of the new algorithm).
Previously, the Hausdorff routines used time and memory that was quadratic in the number of active pixels in the input images. This made them impractical for large or dense images (see #56).

Here, we leverage the `distance_transform` function from ImageMorphology to compute the Hausdorff distance in linear time.

Limitations

- This adds a dependency on ImageMorphology.

I'm new to Julia, so I apologize in advance for any mistakes in this PR. I'd appreciate any comments or criticisms you may have!