Closed pitdicker closed 6 years ago
Please don't look at it yet, I did things stupidly...
I have replaced the modulus in `RangeInt` with a widening multiply, see https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/. The post does not describe how to turn this into a uniform distribution, but comparing the low word of the product to `zone` does the trick.
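For reference, the idea can be sketched like this (a simplified `u32`-only version, not the actual PR code; the names `widening_mul_reduce` and `sample_in_range` are made up for illustration):

```rust
/// Multiply-based range reduction (after Lemire): the high 32 bits of the
/// 64-bit product `x * range` fall in [0, range). Returns (result, low word).
fn widening_mul_reduce(x: u32, range: u32) -> (u32, u32) {
    let wide = (x as u64) * (range as u64);
    ((wide >> 32) as u32, wide as u32)
}

/// Make the distribution uniform by rejection: low words below `zone` belong
/// to over-represented results. `zone` is `(2^32 - range) % range`, which is
/// the same as `range.wrapping_neg() % range`.
fn sample_in_range(range: u32, mut next_u32: impl FnMut() -> u32) -> u32 {
    let zone = range.wrapping_neg() % range;
    loop {
        let (result, low) = widening_mul_reduce(next_u32(), range);
        if low >= zone {
            return result;
        }
    }
}

fn main() {
    // The largest input maps to the top of the range.
    assert_eq!(widening_mul_reduce(u32::MAX, 10).0, 9);
    // A fixed "RNG" just demonstrates the call shape.
    assert!(sample_in_range(10, || 12345) < 10);
}
```

Unlike the modulus approach, the rejection test reuses the low word of a multiplication that was needed anyway, so the uniformity check is essentially free.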
Benchmarks (see the diff columns for the results with the time Xorshift takes removed):
x86:
| | baseline | old | new | old (diff) | new (diff) | percent |
|---|---|---|---|---|---|---|
| i8 | 1466 | 5602 | 4224 | 4136 | 2758 | 66.7% |
| i16 | 1471 | 5834 | 4062 | 4363 | 2591 | 59.4% |
| i32 | 1471 | 6201 | 4709 | 4730 | 3238 | 68.5% |
| i64 | 4751 | 12112 | 10562 | 7361 | 5811 | 78.9% |
| i64 (i128_support) | 4751 | 12112 | 8530 | 7361 | 3779 | 51.3% |
x86_64:
| | baseline | old | new | old (diff) | new (diff) | percent |
|---|---|---|---|---|---|---|
| i8 | 1380 | 4864 | 2683 | 3484 | 1303 | 37.4% |
| i16 | 1379 | 4862 | 2862 | 3483 | 1483 | 42.6% |
| i32 | 1381 | 5416 | 3390 | 4035 | 2009 | 49.8% |
| i64 | 2650 | 11225 | 5895 | 8575 | 3245 | 37.8% |
| i64 (i128_support) | 2650 | 11225 | 3226 | 8575 | 576 | 6.7% |
Like a division, the widening multiply depends more on the most significant bits of the random value than on the least significant ones.
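A small illustration of that property (hypothetical helper name, reduction into `[0, 100)`): flipping the lowest bit of the input usually leaves the result unchanged, while flipping the highest bit always changes it.

```rust
/// High word of the 64-bit product: maps a u32 into [0, range).
fn reduce(x: u32, range: u32) -> u32 {
    (((x as u64) * (range as u64)) >> 32) as u32
}

fn main() {
    let x = 0x1234_5678u32;
    // Low bit flipped: same result, so weak low bits barely matter.
    assert_eq!(reduce(x, 100), reduce(x ^ 1, 100));
    // High bit flipped: different result.
    assert_ne!(reduce(x, 100), reduce(x ^ 0x8000_0000, 100));
}
```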
I have changed the tests for `WeightedChoice`, because they assumed a modulus is used for the range reduction.
Great work, but I'm still wondering: do we want this? Is it a question between using an RNG like Xorshift with these changes or using another RNG which doesn't appear to have these weaknesses in the low bits? You should be able to answer this question better than anyone based on the research you've been doing lately, I guess.
Even when we pick an RNG that is good and does not need those changes, there would still be RNGs in other crates that would benefit from them.
The change to the ziggurat layer makes it a little bit slower, 3~4%. I wonder if some indexing tricks can recover that... For bools I have just added a benchmark; it improves from 1,405 ns/iter to 1,030 ns/iter, 27% faster. And the range code is now about twice as fast.
So two of the three are definitely faster, and it helps with weaker RNGs. Seems win-win to me :smile:
The numbers you posted are ns/iter? Ah, ok. 😆 👍
Can you also benchmark constructing and sampling from a range? Single-use ranges are quite common, e.g. the code in the `seq` module uses this heavily.
Results:
Before:
```
test gen_range_i16 ... bench: 8,537 ns/iter (+/- 583) = 234 MB/s
test gen_range_i32 ... bench: 9,105 ns/iter (+/- 54) = 439 MB/s
test gen_range_i64 ... bench: 17,442 ns/iter (+/- 410) = 458 MB/s
test gen_range_i8 ... bench: 8,521 ns/iter (+/- 146) = 117 MB/s
```
After:
```
test gen_range_i16 ... bench: 6,996 ns/iter (+/- 206) = 285 MB/s
test gen_range_i32 ... bench: 6,835 ns/iter (+/- 16) = 585 MB/s
test gen_range_i64 ... bench: 13,603 ns/iter (+/- 189) = 588 MB/s
test gen_range_i8 ... bench: 6,563 ns/iter (+/- 9) = 152 MB/s
```
So there is some improvement, but they are both terrible... Maybe it is worth thinking about a single-use range?
`gen_range` is a single-use range so it doesn't need API adjustment. But is any optimisation possible anyway?
I think it is possible to get within 70% of the normal version. If we don't search for the optimal zone with a modulus but pick one with bitshifts, that should be a win. On average 75% of the RNG's results should still be acceptable. And with everything in one function it can maybe optimise a bit better.
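The bitshift idea might look roughly like this (an illustrative sketch only, not the PR code; `sample_single` here is a made-up free function, and it assumes `high > low` and a range of at most 2^31):

```rust
/// Single-use range sampling without a modulus: mask the sample down to the
/// smallest power of two covering the range and reject out-of-range values.
/// Acceptance is between 50% and 100% (~75% on average), but constructing
/// the "zone" costs only a couple of bit operations instead of a division.
fn sample_single(low: u32, high: u32, mut next_u32: impl FnMut() -> u32) -> u32 {
    let range = high - low; // half-open [low, high)
    // Mask covering the smallest power of two >= range.
    let mask = range.next_power_of_two() - 1;
    loop {
        let x = next_u32() & mask;
        if x < range {
            return low + x;
        }
    }
}

fn main() {
    // range = 10, mask = 15; the fixed "RNG" value 5 passes the check.
    assert_eq!(sample_single(10, 20, || 5), 15);
}
```

The trade-off versus the precomputed-zone version is deliberate: a single-use range pays the zone setup cost only once, so cheap setup plus a slightly higher rejection rate can come out ahead.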
Okay, shall I go ahead and merge?
It looks like the CI failure is something unrelated I messed up (seeding-minimal branch I think).
I can't figure out how to shuffle the traits to get something like `sample_single` in `RangeImpl`. Can you help me with it? https://gist.github.com/pitdicker/ab086693964a71c031d63b34ccfb8dba
Feel free to merge this though.
I think you want `X::T`:
```rust
pub fn sample_single<X: SampleRange, R: Rng + ?Sized>(low: X, high: X, rng: &mut R) -> X {
    X::T::sample_single(low, high, rng)
}
```
Thank you, that could have taken me hours!
No problem! It was a bit of a weird error, and the sort of thing that makes you wonder why rustc didn't figure it out.
This implements https://github.com/dhardy/rand/issues/52#issuecomment-345014749.
Many small RNGs are of lower quality in their least significant bits, for example LCGs with a power-of-two modulus, MCGs, Xorshift+ and Xoroshiro+.
If it costs us nothing to avoid the least significant bits, it seems sensible to me to do so, even if none of these RNGs becomes part of rand itself.