JuliaStats / Distances.jl

A Julia package for evaluating distances (metrics) between vectors.
Other
419 stars 96 forks source link

3-argument dot method #173

Open devmotion opened 3 years ago

devmotion commented 3 years ago

https://github.com/JuliaStats/Distances.jl/blob/44036b573ec85287022f4368c4e1e279698bd031/src/mahalanobis.jl#L23 and probably also dot_percol could benefit from the 3-argument method dot(x, A, y). However, since this version was only introduced in Julia 1.4, Compat >= 3.2.0 would be required to support earlier Julia versions.

Of course, this would increase loading times on Julia < 1.4 (by hiding the import behind a @static if VERSION < v"1.4.0-DEV.92" check it should be possible to avoid this with recent Julia versions, I assume). As an example, on my computer (with Julia 1.5 though) I get on master

julia> @time using Distances
  0.040426 seconds (32.08 k allocations: 2.444 MiB)

whereas with an additional using Compat I get

julia> @time using Distances
  0.055971 seconds (63.66 k allocations: 4.458 MiB)

Would it still be OK to add Compat as a dependency?

Alternatively, one could maybe define and use internally

function dot3args(x, A, y)
    @static if VERSION < v"1.4.0-DEV.92"
        dot(x, A * y)
    else
        dot(x, A, y)
    end
end

which would just fall back to the current implementation for older Julia versions (in contrast to the implementation in https://github.com/JuliaLang/Compat.jl/blob/773db6a10c299a9af68eae5bf0f238117517bd1b/src/Compat.jl#L127-L144 that avoids the intermediate creation of an array).

dkarrasch commented 3 years ago

IIRC, the 3-arg dot is slower than using BLAS twice (which I just confirmed with a quick benchmark for Symmetric matrices). Its main purpose is to avoid intermediate allocations. Or do you have concrete benchmarks that show the opposite?

devmotion commented 3 years ago

I get quite different benchmark results, for the example in Distances, the 3-argument version seems to be faster (and, of course, avoid the allocation):

julia> using LinearAlgebra, BenchmarkTools

julia> z = rand(5);

julia> Q = rand(5, 5);

julia> f(z, Q) = dot(z, Q * z)
f (generic function with 1 method)

julia> g(z, Q) = dot(z, Q, z)
g (generic function with 1 method)

julia> @btime f($z, $Q);
  90.175 ns (1 allocation: 128 bytes)

julia> @btime g($z, $Q);
  28.354 ns (0 allocations: 0 bytes)

julia> z = rand(100);

julia> Q = rand(100, 100);

julia> @btime f($z, $Q);
  4.842 μs (1 allocation: 896 bytes)

julia> @btime g($z, $Q);
  1.670 μs (0 allocations: 0 bytes)
dkarrasch commented 3 years ago

Oh, I was benchmarking large matrices (dimension several hundreds). I wasn't sure what the typical use case for Mahalanobis is.

devmotion commented 3 years ago

Yes, I can confirm that at some point the current implementation becomes faster. Would it make sense to add an empirical cutoff in the dot3args function suggested above?

devmotion commented 3 years ago

Although it sounds difficult to come up with good threshold values for arbitrary vector and matrix (not possible currently, but I guess the type constraints of SqMahalanobis could be relaxed?) types.