Shinmera / random-state

A collection of portable random generators for Common Lisp
http://shinmera.github.io/random-state/
zlib License
27 stars 9 forks source link

Non-uniform distribution of random-int #5

Closed dop closed 4 years ago

dop commented 4 years ago

I checked only random-int and it seems that it's biased.

$ sbcl --version
SBCL 2.0.2

This script produces distributions of numbers for few small ranges: https://gist.github.com/dop/6198feaab4a3d85e2117ddcbb5d7961e

Results of random-state:random-int are non-uniform for 0..2, 0..4, and 0..5:

Distribution for range 0..1
  0 0.499041
  1 0.500959
Distribution for range 0..2
  0 0.498538
  1 0.249887
  2 0.251575
Distribution for range 0..3
  0 0.249189
  1 0.249743
  2 0.249687
  3 0.251381
Distribution for range 0..4
  0 0.249224
  1 0.124319
  2 0.377002
  3 0.12452
  4 0.124935
Distribution for range 0..5
  0 0.124396
  1 0.249832
  2 0.251413
  3 0.124688
  4 0.249671
  5 0.0
Shinmera commented 4 years ago

Oof. Does this only happen for the mersenne twister, or are other algorithms similarly affected?

dop commented 4 years ago

Could not run middle-square, but the rest are affected. Details here https://gist.github.com/dop/83b359b8a22cfb374b62d74ddea6b366.

dop commented 4 years ago

I managed to improve distribution (for MT32 at least) by rewriting random-int method taking notes from https://www.pcg-random.org/posts/bounded-rands.html ("Bitmask with Rejection (Unbiased) — Apple's Method")

It basically loops until candidate random int fits in range.

(defmethod random-int ((generator generator) (from integer) (to integer))
  (declare (optimize speed))
  (let* ((range (- to from))
         (bits (integer-length range)))
    (declare (type (integer 0) range)
             (type fixnum bits))
    (+ from
       (loop for candidate = (random-bytes generator bits)
             when (<= candidate range)
             return candidate))))

Are you interested in such changes as PR? If so, I could do a bit more testing for other methods.

Shinmera commented 4 years ago

oh definitely, by all means!

I'm sorry I don't have the time to investigate this myself at this time, but I'll be very happy to review PRs!

Kevinpgalligan commented 4 years ago

Hey @dop, I went ahead and submitted your code in a pull request. I hope that's okay. I think the simple method should be fine. Even in the 0..2 case, we expect to do only ~1.33 rolls.

dop commented 4 years ago

Sure. I tried to come up with test case, but couldn't understand how to apply https://en.wikipedia.org/wiki/Chi-squared_test and then just put it aside...

Shinmera commented 4 years ago

I did a bunch of random sampling stuff for uni a little over half a year ago but seem to have forgotten all of it. There's probably a method that allows mapping the sample to the correct integer range without using rejection sampling as we do here, but I'm too dumb to do that, so this'll be fine for now.

Kevinpgalligan commented 4 years ago

I was snooping around the CPython source code for unrelated reasons, seems like they also use the resampling approach:

https://github.com/python/cpython/blob/0dd98c2d00a75efbec19c2ed942923981bc06683/Lib/random.py#L245

Just mentioning it for curiosity's sake. Cheers guys 🙂