bryc / code

Other
300 stars 24 forks source link

Mulberry32 no longer recommended by its author #17

Closed shamrin closed 8 months ago

shamrin commented 1 year ago

@tommyettinger wrote in November 2022:

Revisiting this 5 years later, this is not great. Mulberry32 isn't equidistributed, so it doesn't produce all possible 32-bit values, and actually can't produce about 1/3 of all possible uint32_t numbers.

https://gist.github.com/tommyettinger/46a874533244883189143505d203312c?permalink_comment_id=4365431#gistcomment-4365431

tommyettinger commented 1 year ago

Yep! There are probably some reasons to choose Mulberry32. It does enough wrangling over its brief period to pass PractRand, so it at least appears to statistical tests like the 2/3 of all possible values it can produce are distributed fairly. That is, for the 16 GB it can output without repeating. I would personally much rather be able to output all uint32_t than only some, and the approach I surmised at the linked Gist will do that. It probably won't pass 16GB of PractRand, but I feel like Mulberry32 only passed through a combination of lucky constants and "tricking" the tests into thinking it was a type of generator with a longer period. The old standard rule of thumb was that a typical RNG was only good for as many outputs as the square root of its period length, which is just 65536 outputs for Mulberry32 and the unnamed one in the Gist. But, 64-bit analogues of the Gist algorithm easily pass much more than sqrt(pow(2, 64)) == pow(2, 32) outputs, and there's no mention in Java's closely related SplittableRandom about it having problems after 32GB of output (I certainly haven't found any). The old standard really is quite old, and mostly applies to generators like simple LFSRs or LCGs, where I think it remains true. It's possible that your use case will be fine with Mulberry32; it's also possible that an even simpler generator may suffice, or that you might at some point want something with a larger state size and period size.

For state, I really like using an additive counter by an odd increment because it gives constant-time skipping over any distance possible, and guarantees that all states can be reached. The problem with Mulberry32 is after that; the output section works like exponentiation and isn't reversible, so many states produce the same output as at least one other state.

A pattern I've liked more recently is to take a generator like Mulberry32 that can't (as it is currently written) produce all possible values, and make that generator one of many, many possible streams in what's effectively a family of generators. There are ways to guarantee that either

The problem there is that it requires at least 32 bits more state, and Mulberry32 is meant to use very little.

If you want me to ramble some more about this, feel free to @ me again! This is definitely my favorite field of CS.

shamrin commented 1 year ago

@tommyettinger Thank you for your response, it was great! I've created this ticket from the following perspective:

  1. I needed a simple, seedable and repeatable PRNG in my JavaScript project. I could not use Math.random(), because it fails last two properties.
  2. I've found nice PRNGs.md guide in the repository, which has mulberry32 as one of its recommendations.
  3. (Free) ChatGPT 3.5 also pointed me to mulberry32. (Because of its 2021 time cut-off, ChatGPT does not know about your November 2022 comment in the gist.)

For the task at hand I've decided to use LCG, either by adding d3-random to my project or by copy-and-pasting the implementation directly. I'm aware LCG has some downsides, but they are not relevant to my use case.

However, it would have been cool if there was another equally simple algorithm that does not have LCG downsides, and has good quality research behind it, and well-written and well-tested JavaScript implementation.

@tommyettinger, from your experience, what could be that algorithm?

tommyettinger commented 1 year ago

I'm not really a JS dev mainly, and I usually prefer developing random number code with 64-bit types because they're the norm everywhere but embedded systems, JS, and some GPU applications. Using 32-bit math is a good challenge, though!

I'm a big fan of the SplitMix family, which is well-studied in its 64-bit form, https://gee.cs.oswego.edu/dl/papers/oopsla14.pdf , because Java introduced it rather suddenly in Java 8 as SplittableRandom's biggest part. Searching the literature will probably find its successor, LXM, which has a much larger state size and is probably useless for your case (it's also quite slow even in Java 17, where it was introduced). You can make SplitMix more robust if necessary by choosing a stronger unary hash for its output. It's hard to go wrong with any unary hash found as part of the HashProspector search run by skeeto, https://github.com/skeeto/hash-prospector . The one in the Gist earlier is closely related to the original SplitMix32 (I'm not even sure if the original paper gave a 32-bit version, but SplitMix32-titled generators popped up all over), though it uses one of the stronger unary hashes discovered in the last few years using better searches for multipliers. The choice of the large gibberish constant increment added to the state is something that gets argued about a lot for the wrong reasons. Long story short, if you have an irrational-like sequence of bits in an odd-number increment, you're fine. If you don't want to select one, 0x9E3779B9 is among the best and most widely-used 32-bit increments (it's related to the golden ratio, and a 64-bit version is called GOLDEN_GAMMA sometimes).

The only real known issues with SplitMix generators, other than the questionable issue "produces each possible output exactly once over its period," apply to a particular variant used by Java 8 (SpittableRandom) and have to do with the increment added to the state every generation, called the gamma. Certain gamma values, like 0xAAAAAAAB for SplitMix32, are really bad for... really, almost any purpose, but they really harm SplitMix if used there. SplittableRandom can choose a gamma automatically, and no one has a particularly good way of checking if the gamma is good or bad quickly. This issue can be circumvented entirely by always using any good-enough gamma, like (uint32_t)(0x100000000ULL * SOME_IRRATIONAL_CONSTANT) | 1U, where SOME_IRRATIONAL_CONSTANT could be pi, e, the square root of a prime, etc. It just has to be an odd number with random-enough bits in the more-significant places, and the SplitMix algorithm works fine for really most very large odd numbers as the gamma.

SplitMix allows skipping in O(1) time and is trivial to generate in backwards order, which can be useful (just run the same unary hash on the state's value, then subtract the increment rather than add it from the old state).

As for other options...

LCGs are fine if you only care about the most significant bits, although Marsaglia's Theorem may be a problem for some multipliers -- that is a known issue with MCGs and I think also LCGs where if you graph the outputs in some dimension D of hypercube, like by generating x, y, z, x, y, z, x, y, z in 3D... all outputs will fall on parallel dimension-(D-1) hyperplanes, with large gaps between each plane where no outputs are produced. LCGs allow skipping, but not in O(1) time. They can be run in reverse order by using the multiplicative inverse of the LCG multiplier mod 2 to the 32, and there's probably an online calculator for that.

Galois and Fibonacci LFSRs are probably not fantastic choices on their own. Other LFSR types are newer, like various Xorshift generators, and generally better. These don't allow skipping by arbitrary amounts at all, and can be run in reverse in what could be an annoyingly complex way depending on the generator.

You can also mix these in all sorts of ways; PCG-Random is based on an LCG for its state change (instead of just an increment), but then runs a unary hash on that like SplitMix does (it isn't quite as strong of a unary hash, but doesn't need to be). PCG-Random usually operates only on the upper half of the LCG bits, so it would need a 64-bit multiplication and I am guessing that's off-limits.

I think that if you are mostly using the most-significant bits and don't need skipping by arbitrary distances, an LCG will be fine. If you use the least-significant bits, those are not random in LCGs (they alternate even and odd results, for instance), but are random in the SplitMix family, and could be made random from an LCG by doing something like what PCG-Random does. They could probably be made random enough by doing something like this, if you encounter problems:

state = state * mul + inc; // LCG state change.
return state ^ state >>> 13 ^ state >>> 21; // Mixes more-random upper bits into less-random lower bits.

The shift amounts are guesses at what might work well; other than that you should probably avoid multiples of 8, I don't have many ideas for what would work any better or worse.

OK, I have to write a bunch of code in the next 3 hours. Best of luck!

bryc commented 1 year ago

Updated the page to reflect the concerns, thanks! :)

I've personally found LCGs to be a a poor match for JavaScript, because you almost always are going to need a 64-bit multiplication result to perform a modulo on, so you can't really take advantage of Math.imul which is faster in JS. And it's easy to screw up if the parameters aren't well-chosen. It doesn't help that the most common one is the Park-Miller MCG which forgoes the counter, and has quite bad statistics. Maybe a decent optimized LCG can be constructed, but I haven't bothered to look.

IMO, jsf32 and sfc32 are probably the best options for JavaScript, as they optimize well in JS engines and pass randomness tests well. And the state size or number of operations is somewhat irrelevant since the JS interpreter is going to be the bottleneck either way. splitmix32 is great though, you just don't necessarily get much performance or memory savings in JS from it.

tommyettinger commented 1 year ago

Oh yeah, if you have the state size for SFC, it's a great design! I haven't studied JSF32 very thoroughly, but O'Neill has, and she didn't find any issues with it. A nice trait of SFC is that with its simple counter, you can guarantee leaps of at least a minimum distance just by adding that desired minimum to the counter (as long as it fits in a uint32_t). SplitMix can jump an exact, arbitrary distance forwards or backwards, and JSF I don't think can jump or leap at all. Personally I think chaotic RNGs with a counter are an excellent middle ground between features and speed; some of the fastest RNGs I've made that pass tests are chaotic, and about half have counters. That is, fastest on a desktop JVM, where most of my focus is; I'm certain JS will be different a lot of the time.