JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.77k stars 5.49k forks source link

Making `rand` for a `Base.KeySet` faster #51605

Closed Tortar closed 1 year ago

Tortar commented 1 year ago

Consider this simple benchmark

using BenchmarkTools, Random

d = Dict(x => x for x in 1:1000);
rng = Xoshiro(42)

@benchmark rand($rng, keys($d))

on my computer under 1.9.3, it returns

julia> @benchmark rand($rng, keys($d))
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  510.000 ns …  43.100 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.530 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.565 μs ± 562.739 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                   ▁▂▄▂▅▇▇▇█▇▅█▇▅▄▅ ▁▁
  ▁▁▁▁▁▁▁▂▂▂▃▄▄▃▆▆▇███████████████████▇▆▅▄▃▃▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▄
  510 ns           Histogram: frequency by time         2.86 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

It sounded a bit slow (and more the number of elements gets bigger more it is so) then I tried

function Base.rand(rng::AbstractRNG, keys_d::Base.KeySet) 
    return Base.rand(rng, keys_d.dict).first
end
@benchmark rand($rng, keys($d))

which gave me

julia> @benchmark rand($rng, keys($d))
BenchmarkTools.Trial: 10000 samples with 994 evaluations.
 Range (min … max):  34.004 ns … 113.380 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     36.217 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   36.632 ns ±   3.040 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▃▂█▃▂▂
  ▂▂▂▃▅▅██████▅▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂ ▃
  34 ns           Histogram: frequency by time         48.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Can't something similar be used instead?

In general I think that the current method iterates over the KeySet for finding a random value, while I think that the new one requires almost just a constant cost

oscardssmith commented 1 year ago

Specifically, what needs to be done here is to add few lines around Random/src/generation.jl:440 to add sampling for KeySet (and potentially ValueIterator)

Tortar commented 1 year ago

working on it! :-)