Open Moelf opened 6 months ago
does this translate to a speedup in the search function as well? Some things I've done which speedup the hamming_distance function don't seem to make a difference when doing the actual search
julia> q1 = SVector{8, UInt64}(rand(UInt64,8));
julia> q2 = SVector{8, UInt64}(rand(UInt64,8));
julia> @be hamming_distance($q1, $q2)
Benchmark: 5613 samples with 5887 evaluations
min 2.484 ns
median 2.605 ns
mean 2.841 ns
max 7.899 ns
julia> @code_llvm hamming_distance(q1, q2)
; Function Signature: hamming_distance(StaticArraysCore.SArray{Tuple{8}, UInt64, 1, 8}, StaticArraysCore.SArray{Tuple{8}, UInt64, 1, 8})
; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:28 within `hamming_distance`
define i64 @julia_hamming_distance_12257(ptr nocapture noundef nonnull readonly align 8 dereferenceable(64) %"x1::SArray", ptr nocapture noundef nonnull readonly align 8 dereferenceable(64) %"x2::SArray") #0 {
top:
; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:33 within `hamming_distance`
; ┌ @ simdloop.jl:77 within `macro expansion` @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:34
; │┌ @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:14 within `hamming_distance`
; ││┌ @ int.jl:373 within `xor`
%0 = load <8 x i64>, ptr %"x1::SArray", align 8
%1 = load <8 x i64>, ptr %"x2::SArray", align 8
%2 = xor <8 x i64> %1, %0
; ││└
; ││┌ @ int.jl:415 within `count_ones`
%3 = call <8 x i64> @llvm.ctpop.v8i64(<8 x i64> %2)
; │└└
; │┌ @ int.jl:87 within `+`
%4 = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> %3)
; └└
; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:36 within `hamming_distance`
ret i64 %4
}
It's not going to get better than this.
even if each comparison was 1ns the best over 1M rows would be 1ms, which it's already very close to
julia> @be k_closest($X1, $q1, 100)
Benchmark: 28 samples with 1 evaluation
min 3.207 ms (5 allocs: 3.281 KiB)
median 3.433 ms (5 allocs: 3.281 KiB)
mean 3.562 ms (5 allocs: 3.281 KiB)
max 4.749 ms (5 allocs: 3.281 KiB)
julia> a = rand(UInt64, 8)
8-element Vector{UInt64}:
0xb08159b400b994e8
0x719aafe1251df126
0x4be73d16ff742a65
0x9af6c257b0e17a28
0x5c8128573f810275
0x6cb73fed5ee64e79
0x57b7c96fb9093e66
0x55d6863d8cd7de69
julia> @be k_closest($X1, $a, 100)
Benchmark: 21 samples with 1 evaluation
min 4.425 ms (5 allocs: 3.281 KiB)
median 4.511 ms (5 allocs: 3.281 KiB)
mean 4.639 ms (5 allocs: 3.281 KiB)
max 5.490 ms (5 allocs: 3.281 KiB)
the best combo seems to be a StaticArray query vector and a matrix as the DB
julia> @be k_closest_parallel($X1, $q1, 100)
Benchmark: 83 samples with 1 evaluation
min 1.099 ms (58 allocs: 23.906 KiB)
median 1.117 ms (58 allocs: 23.906 KiB)
mean 1.180 ms (58 allocs: 23.906 KiB)
max 2.570 ms (58 allocs: 23.906 KiB)
btw, there are a lot of unnecessary annotations, for example,
count_ones
always returnInt
no matter what already:julia> Base.url(@which count_ones(0xf2)) "https://github.com/JuliaLang/julia/tree/0b4590a5507d3f3046e5bafc007cacbbfc9b310b/base/int.jl#L415"