golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.42k stars 17.71k forks source link

math/rand: documentation on rng.go is lacking important context and information #36133

Open Plazmaz opened 4 years ago

Plazmaz commented 4 years ago

Does this issue reproduce with the latest release?

Yes

What did you expect to see?

Details about the algorithm used, links to source material, context about why this algorithm was used and how it differs from other methods.

What did you see instead?

https://github.com/golang/go/blob/c2edcf4b1253fdebc13df8a25979904c3ef01c66/src/math/rand/rng.go#L7-L12 A single vague comment listing two names, without the title of sources used, the name of the algorithm used, or any details whatsoever. The names alone don't seem to be sufficient for finding the source material (at least based on a good amount of googling) It seems like I'm not the only person having this issue: https://www.seehuhn.de/blog/134.html

Could provide some more context on what algorithm is being used, why it was chosen, and how it differs from something like mersenne twister? I think it would be valuable knowledge, and a good addition to that documentation.

robpike commented 4 years ago

See https://github.com/golang/go/issues/21835. Perhaps this package should suggest using x/exp/rand.

zephyrtronium commented 4 years ago

For whoever might update the documentation: the default Source is a lagged Fibonacci generator with j=273, k=607.

To answer how it differs from Mersenne Twister, LFG is essentially a simple LFSR over elements in GF(2^64), which as #21835 mentions has a number of empirical and theoretical problems; MT is a linear recurrence over a polynomial in GF(2^19937) in its usual implementation with another linear transform applied to its outputs, giving it strong theoretical performance. (I can offer more details on the differences if desired later, but I have a final to take now.)

While I agree that math/rand probably ought to recommend a different generator for many applications, I find it odd for a standard package to recommend something in x/exp. How close is x/exp/rand to being promoted to x/math/rand or similar?

robpike commented 4 years ago

The proposal has not even been accepted yet. If it does get accepted, it will still be a while, as the process described in the proposal makes clear.

dmitshur commented 4 years ago

/cc @griesemer @rsc @josharian per owners.

Plazmaz commented 4 years ago

So I've done quite a bit of digging since originally opening this issue. Something I can't find an answer to is why the LCG used to seed the initial algorithm uses Schrage's method instead of 64-bit arithmetic. I'm wondering if that was an intentional decision or simply an artifact of the original systems this implementation was written for. It seems like using 32-bit logic has a few downsides:

  1. Seeds are truncated to 32 bits for use in the LCG (I am not positive that's why this was done, but it seems to be the case)
  2. It is computationally less efficient to use two single-width multiplications than one double-width multiplication, and 64-bit operations are used quite often elsewhere in this code.

If someone has more context or info on the decision behind that, that would be great. It's possible I would need to ask one of the original authors (Ken Thompson seems like the person who'd know, just looking at this comment https://groups.google.com/g/golang-nuts/c/RZ1G3_cxMcM/m/_7J7tnHhsU4J).

randall77 commented 4 years ago

Seeds are truncated to 32 bits for use in the LCG (I am not positive that's why this was done, but it seems to be the case)

They are not quite truncated, as they are moduloed (is that a word?) with an odd number first. But yes, half the bits of entropy are thrown away.

Just a guess, but seedrand does division/modulo on int32s, and maybe that was considered expensive if changed to int64 on 32-bit machines. Maybe that was a big deal ~10 years ago, but I don't think that's a big problem nowadays.

Plazmaz commented 4 years ago

I mean, it's being explicitly cast to an int32. The effect is either truncation of the higher bits or modulus, they're the same thing in this context. Regardless, the rest of this code is littered with 64 bit logic. To seed the generator, for each value in the state vector, three values from seedrand are used to generate an int64, which is then XOR'd with the 64-bit value in the state vector, so I think there'd still be plenty of problems on a 32-bit machine. Anyways, I am genuinely curious as to the intent behind this, and I have my own theories for sure, but if someone is actually aware of why this was done, that would be helpful. Otherwise I will see if I can get into contact with people involved in the original implementation and find out!