dop closed this issue 4 years ago
Oof. Does this only happen for the mersenne twister, or are other algorithms similarly affected?
Could not run middle-square, but the rest are affected. Details here https://gist.github.com/dop/83b359b8a22cfb374b62d74ddea6b366.
I managed to improve the distribution (for MT32 at least) by rewriting the random-int
method taking notes from https://www.pcg-random.org/posts/bounded-rands.html ("Bitmask with Rejection (Unbiased) — Apple's Method")
It basically loops until a candidate random integer fits in the range.
(defmethod random-int ((generator generator) (from integer) (to integer))
  (declare (optimize speed))
  (let* ((range (- to from))
         (bits (integer-length range)))
    (declare (type (integer 0) range)
             (type fixnum bits))
    (+ from
       (loop for candidate = (random-bytes generator bits)
             when (<= candidate range)
               return candidate))))
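For anyone who wants to try the idea without loading the library, here is a minimal standalone sketch of the same loop. The names random-bits, bounded-random and histogram are made up for this example; random-bits is only a stand-in for the library's random-bytes and uses CL:RANDOM as its bit source, so it says nothing about how any particular generator behaves.

```lisp
;; Hypothetical stand-in for the library's RANDOM-BYTES: uses CL:RANDOM
;; as the bit source instead of a random-state generator.
(defun random-bits (bits)
  "Return a uniformly distributed integer in [0, 2^BITS)."
  (random (ash 1 bits)))

(defun bounded-random (from to)
  "Return a uniformly distributed integer in [FROM, TO], inclusive."
  (let* ((range (- to from))
         (bits (integer-length range)))
    (+ from
       ;; Draw just enough bits to cover RANGE and redraw whenever the
       ;; candidate overshoots, so every accepted value is equally likely.
       (loop for candidate = (random-bits bits)
             when (<= candidate range)
               return candidate))))

(defun histogram (from to samples)
  "Count how often each value in [FROM, TO] comes up in SAMPLES draws."
  (let ((counts (make-array (1+ (- to from)) :initial-element 0)))
    (dotimes (i samples counts)
      (incf (aref counts (- (bounded-random from to) from))))))
```

Calling (histogram 0 2 30000) should give three counts that each sit close to 10000; the exact numbers vary from run to run.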
Are you interested in such changes as a PR? If so, I could do a bit more testing for the other methods.
oh definitely, by all means!
I'm sorry I don't have the time to investigate this myself at this time, but I'll be very happy to review PRs!
Hey @dop, I went ahead and submitted your code in a pull request. I hope that's okay. I think the simple method should be fine. Even in the 0..2 case, we expect to do only ~1.33 rolls.
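To spell out where the ~1.33 comes from: the loop draws integer-length(range) bits, so there are 2^bits possible candidates and range + 1 of them are accepted. The number of draws is then geometric with mean 2^bits / (range + 1). A small sketch, where expected-rolls is a hypothetical helper written only for this illustration:

```lisp
(defun expected-rolls (from to)
  "Expected number of draws before a candidate lands in [FROM, TO]."
  (let* ((range (- to from))
         (bits (integer-length range)))
    ;; 2^BITS candidates are possible and RANGE + 1 of them are accepted,
    ;; so the mean of the geometric distribution is 2^BITS / (RANGE + 1).
    (/ (ash 1 bits) (1+ range))))

(expected-rolls 0 2) ; => 4/3
(expected-rolls 0 4) ; => 8/5
(expected-rolls 0 5) ; => 4/3
(expected-rolls 0 3) ; => 1
```

The worst case is a range that is an exact power of two, where just over half of the candidates are accepted, so the expected number of draws always stays below 2.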
Sure. I tried to come up with a test case, but couldn't understand how to apply https://en.wikipedia.org/wiki/Chi-squared_test and then just put it aside...
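For what it's worth, a rough sketch of how the chi-squared statistic could be applied to a table of per-value counts. The function name chi-squared is hypothetical, and the sample counts below are made up for illustration; they are not taken from the gists in this thread.

```lisp
(defun chi-squared (counts)
  "Pearson's chi-squared statistic for COUNTS against a uniform distribution."
  (let* ((total (reduce #'+ counts))
         ;; Under uniformity every bucket expects the same share of draws.
         (expected (/ total (length counts))))
    ;; Sum of (observed - expected)^2 / expected over all buckets.
    (reduce #'+ counts
            :key (lambda (observed)
                   (/ (expt (- observed expected) 2) expected)))))

(chi-squared #(1000 1000 1000)) ; => 0
(chi-squared #(1500 750 750))   ; => 375
```

With k buckets there are k - 1 degrees of freedom. For three buckets (the 0..2 case) the 5% critical value is about 5.99, so a statistic well above that suggests the counts are not uniform. Since the test is statistical, a correct generator will still exceed the threshold in roughly 5% of runs, which matters if this goes into an automated test suite.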
I did a bunch of random sampling stuff for uni a little over half a year ago but seem to have forgotten all of it. There's probably a method that allows mapping the sample to the correct integer range without using rejection sampling as we do here, but I'm too dumb to do that, so this'll be fine for now.
I was snooping around the CPython source code for unrelated reasons, seems like they also use the resampling approach:
https://github.com/python/cpython/blob/0dd98c2d00a75efbec19c2ed942923981bc06683/Lib/random.py#L245
Just mentioning it for curiosity's sake. Cheers guys 🙂
I checked only random-int and it seems that it's biased. This script produces distributions of numbers for a few small ranges: https://gist.github.com/dop/6198feaab4a3d85e2117ddcbb5d7961e
Results of random-state:random-int are non-uniform for 0..2, 0..4, and 0..5: