Please help me to understand PCG32/Bounded()

longborough commented 7 years ago

Thank you very much for this implementation of PCG. This isn't really an issue, but a question. I'd like to understand what appears to be excessive complexity in Bounded():

func (p *PCG32) Bounded(bound uint32) uint32 {
    if bound == 0 {
        return 0
    }
    threshold := -bound % bound
    for {
        r := p.Random()
        if r >= threshold {
            return r % bound
        }
    }
}

Since we're dealing with (uint32)s, what would be wrong with this, simpler, version?:

func (p *PCG32) Bounded(bound uint32) uint32 {
    if bound == 0 {
        return 0
    }
    r := p.Random()
    return r % bound
}

Thanks again

MichaelTJones commented 7 years ago

Yes, it is rather complicated. There are two reasons for this, a) the natural cost of a subtle issue which is necessary; and b) coding style which might be made better.

The subtle issue concerns the general case where the range of the generator's values (2^64) and the desired bound. Imagine a simpler case, a 3-bit range and a bound of 5 (8 not being a multiple of 5). Here is how it works:

rng rng%5 0 0 1 1 2 2 3 3 4 4 5 0 6 1 7 2

This means that the probabilities of 0..4 are...

0 2 in 8 1 2 in 8 2 2 in 8 3 1 in 8 4 1 in 8

...which is probably a surprise to many and would certainly be unwelcome to whatever algorithm and statistical analysis was expecting "random" numbers. This is true, but since the typical ranges of PRNGs are 2^32 or 2^64 and the typical bounds are hundreds, thousands, or millions, the ratio between what should be uniform and what is biased is much less erroneous in ratio even though it is still wrong and biased to smaller values.

With a 4-bit range the probabilities of 0..4 are...

0 4 in 16 1 3 in 16 2 3 in 16 3 3 in 16 4 3 in 16

With a 5-bit range it would result in

0 7 in 32 1 7 in 32 2 6 in 32 3 6 in 32 4 6 in 32

With a 6-bit range it would result in

0 13 in 32 1 13 in 32 2 13 in 32 3 13 in 32 4 12 in 32

The way to avoid this subtle problem is to ensure that the random value that is used in value%bound is in fact from a range that is a multiple of bound. Here is idea:

{ // determine max which is the greatest multiple of bound <= maxPossiblePRNGValue maxMultiple := bound*(maxPossiblePRNGValue/bound)

// get the next PRNG value v := NextPRNGValue()

// in the unlikely case that we've got one of the last few PRNG values in the range // maxMultiple... maxPossiblePRNGValue then just iterate again to find a new one for v >= maxMultiple { v = NextPRNGValue() }

// choose a uniformly weighted value in 0..bound-1 return v%bound }

Perhaps this is a subtle issue, and for many people the wrongness probably makes no difference. For a careful implementation, and a careful programmer though, this is critical. It is a point that Knuth makes clear in The Art of Computer Programming, which for me, makes this part of the conmplexity inarguable.

Now issue b, the question of how to implement this, centers on the clarity of the code on the one hand, and the desire to avoid the divide and the multiply in "bound*(maxPossiblePRNGValue/bound)" for the sake of performance. The code you found is the tricky but fast way, which goes from "small up to full range" rather than "zero up to reduced range." This part of it is mostly about personal taste and I'm biased towards speed as is the clever Melissa E. O'Neill, inventor of PCG.

Michael

On Tue, Sep 5, 2017 at 3:50 AM, Brent Longborough notifications@github.com wrote:

Thank you very much for this implementation of PCG. This isn't really an issue, but a question. I'd like to understand what appears to be excessive complexity in Bounded():

func (p *PCG32) Bounded(bound uint32) uint32 { if bound == 0 { return 0 } threshold := -bound % bound for { r := p.Random() if r >= threshold { return r % bound } } }

Since we're dealing with (uint32)s, what would be wrong with this, simpler, version?:

func (p *PCG32) Bounded(bound uint32) uint32 { if bound == 0 { return 0 } r := p.Random() return r % bound }

Thanks again

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/MichaelTJones/pcg/issues/1, or mute the thread https://github.com/notifications/unsubscribe-auth/AHgypTIN_-lRAOeIZ2hGNndsqBr-7xu3ks5sfSd3gaJpZM4PMxVh .

-- Michael T. Jones michael.jones@gmail.com

longborough commented 7 years ago

Michael, thank you very much for a clear explanation of a very subtle, but important, point. Although I "knew of the range x bound bias in the back of my mind", your amazing optimization led me down a different path, and managed to hide from me exactly what you were about. Congratulations, and best wishes, Brent

longborough commented 7 years ago

I've just rolled my own example with uint8 to see exactly how that works. In fifty years(! yes) of programming, I don't think I've seen more than about two or three pieces of code as wonderful as that.

MichaelTJones commented 7 years ago

Oh, not my invention. like most good ideas, it is something valuable passed from a former student to the next one. :-) Enjoy!

On Wed, Sep 6, 2017 at 4:13 AM, Brent Longborough notifications@github.com wrote:

I've just rolled my own example with uint8 to see exactly how that works. In fifty years(! yes) of programming, I don't think I've seen more than about two or three pieces of code as wonderful as that.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/MichaelTJones/pcg/issues/1#issuecomment-327452071, or mute the thread https://github.com/notifications/unsubscribe-auth/AHgypdX9ZRbO1q2zOiTBNSiP-EW-8dgGks5sfn5BgaJpZM4PMxVh .

-- Michael T. Jones michael.jones@gmail.com

MichaelTJones / pcg

Please help me to understand PCG32/Bounded() #1