avrae / d20

A fast, powerful, and extensible dice engine for D&D, d20 systems, and any other system that needs dice!
MIT License

[Enhancement] Random.py Package Improvement #7

Open Evitcani opened 3 years ago

Evitcani commented 3 years ago

Background

A friend and I were looking into the avrae/d20 package and noticed some concerning code flow in how Random is used. The module in use (random.py) implements the old version of the MT algorithm, which tends towards statistically lower outcomes compared to the 2002 or newer versions. It's also hit or miss given the lack of seeding, which means we'd expect the randomness to effectively be subsets of older rolls (deterministic for the purpose of statistical modeling). It's also one of the slowest random algorithms.

There are lots of alternatives that are easy to use and provide better randomness and quicker throughput.

Suggested Improvement

Upgrade to a new random package to be used on lines 405 and 407 of expression.py. A package in the PCG family would likely be optimal for Avrae's usage.

Quote from the page on comparison to other algorithms:

PCG was developed by applying TestU01 to reduced-size variants,[7] and determining the minimum number of internal state bits required to pass BigCrush. BigCrush examines enough data to detect a period of 2^35, so even an ideal generator requires 36 bits of state to pass it. Some very poor generators can pass if given a large enough state;[8] passing despite a small state is a measure of an algorithm's quality, and shows how large a safety margin exists between that lower limit and the state size used in practical applications.

PCG-RXS-M-XS (with 32-bit output) passes BigCrush with 36 bits of state (the minimum possible), PCG-XSH-RR (pcg32() above) requires 39, and PCG-XSH-RS (pcg32_fast() above) requires 49 bits of state. For comparison, xorshift*, one of the best of the alternatives, requires 40 bits of state,[5]: 19  and Mersenne twister fails despite 19937 bits of state.[9]

This would avoid the known issue of slow escape from many-zero internal states in pre-2002 MT implementations, which random.py uses. The best outcome for Avrae would be faster generation.
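
To make the suggestion concrete, here is a hedged sketch of what rolling with a PCG-family generator could look like. numpy's default_rng() is backed by PCG64; the roll_die function is purely illustrative and is not part of d20:

```python
# Illustrative only: rolling with a PCG-family generator.
# numpy's default_rng() is backed by PCG64.
from numpy.random import default_rng

_rng = default_rng()

def roll_die(size: int) -> int:
    """Uniform integer in [1, size], e.g. roll_die(20) for a d20."""
    # integers() samples from the half-open interval [low, high)
    return int(_rng.integers(1, size + 1))

rolls = [roll_die(20) for _ in range(1_000)]
assert all(1 <= r <= 20 for r in rolls)
```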

zhudotexe commented 3 years ago

I did some digging and it appears that the implementation of MT used by CPython's random is indeed MT2002 (as referenced at https://github.com/python/cpython/blob/main/Modules/_randommodule.c#L5). I'll look into this more later in the week.

posita commented 2 years ago

Apologies for butting in, but these seemed relevant:

Evitcani commented 2 years ago

Apologies for butting in, but these seemed relevant:

Good point on the PCG generators. I only pointed it out as it seemed the best suited (there are other, older generators that could be considered in its stead).

Evitcani commented 2 years ago

Just read your edit with this secondary link. It appears NumPy adopted a PCG algorithm in 2019 after a long discussion. The Python group seems hesitant due to its novelty (it is only a 6-7 year old algorithm, if I'm reading correctly). They know the Twister has problems, for which there are solutions.

A xorshift generator might be a better fit since its failure modes are known and well tested. It has some issues that make it difficult to achieve a long period without careful initialization, but it is notably faster than the Twister, which would deliver the hoped-for speedup.

posita commented 2 years ago

This isn't my show, but one possibility might be to opportunistically defer to a package like numpy, meaning don't make it an install requirement, but use it if present. (This is the approach I'm currently considering for dyce.)
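
The "opportunistic dependency" idea can be sketched in a few lines; prefer numpy's PCG64 when numpy is installed, and fall back to the stdlib Mersenne Twister otherwise. The name randrange_impl here is an illustrative placeholder, not an existing d20 API:

```python
# Sketch: prefer numpy's PCG64 if available, else the stdlib MT.
import random

try:
    from numpy.random import default_rng

    _gen = default_rng()  # PCG64-backed Generator

    def randrange_impl(stop: int) -> int:
        # integers() with one argument samples uniformly from [0, stop)
        return int(_gen.integers(stop))
except ImportError:
    def randrange_impl(stop: int) -> int:
        return random.randrange(stop)
```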

zhudotexe commented 2 years ago

I think that makes a lot of sense; we could expose numpy as an extra dependency (i.e. installed via d20[pcg] or similar) and keep the choice transparent to the user (in addition to docs, of course).

Evitcani commented 2 years ago

That sounds like a great plan.

We could also consider the Random.org API library as an option for people who want to pay for near-true RNG, in addition to the NumPy option. I wouldn't mind making a fork for it, though it would require async HTTP requests with caching. That gets a bit nasty and would hurt performance.

zhudotexe commented 2 years ago

Haha! I think that's a little out of scope to bundle with d20 (especially since the package itself tries to make few assumptions about the environment it's running in re. sync/async), but if the plan is to make opportunistic usage of alternate RNG providers (e.g. numpy), it'll likely expose a randrange_impl function that can be easily monkey-patched to provide this functionality. I'll keep that in mind during implementation.
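
The monkey-patching idea can be illustrated with a minimal sketch. The names randrange_impl and roll_d are hypothetical stand-ins for whatever d20 eventually exposes; in real use a consumer would reassign the attribute on d20's module rather than a local global:

```python
# Hedged sketch of a monkey-patchable RNG hook.
import random

def randrange_impl(stop: int) -> int:
    return random.randrange(stop)

def roll_d(size: int) -> int:
    # Looks up randrange_impl at call time, so reassigning the module
    # global swaps the RNG provider for all subsequent rolls.
    return randrange_impl(size) + 1

def always_max(stop: int) -> int:
    return stop - 1  # deterministic stand-in, e.g. a Random.org client

randrange_impl = always_max
assert roll_d(20) == 20
```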

zhudotexe commented 2 years ago

I've done a little bit more reading into the articles/issues linked above and some articles linked to by them.

Reading list (in the order I read them):

Personal Thoughts

I can't speak for the quality of PCG vs. XorShift, and this topic seems to be of some debate in the mathematical community. What does seem to be agreed upon, at least, is that both of these are "better" than Mersenne Twister.

With regards to speed, the table I mentioned in the second linked article seems to display these results:

| PRNG | Footprint (bits) | BigCrush Failures | HWD Failures | ns/64b |
| --- | --- | --- | --- | --- |
| xoroshiro128++ | 128 | -- | -- | 0.95 |
| PCG64DXSM | 128 | -- | -- | 1.44 |
| MT19937 | 20032 | LinearComp | -- | 2.19 |

This table suggests a 1.52x speedup in PCG64 over MT, and 2.31x when using xoroshiro128++ (other variants excluded for brevity). However, in the context of d20, the actual time it takes to generate a random number is effectively negligible compared to the time spent in other code (i.e. parsing/arithmetic evaluation/function call overhead/Python overhead). For example, the 2.19ns/64b presented in the above table is less than 7% of the 32.7ns it takes to generate a random number in Python (timed with $ python -m timeit -s "from random import random" "random()" on an i9 @ 2.3GHz), and less than 0.01% of the time d20.roll("1d20") takes (from the performance listed in the readme). When considered alongside Amdahl's Law, I believe that the speed gains from the choice of RNG are not a major factor here.
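
The measurement methodology can be reproduced programmatically; absolute numbers will differ by machine, and the argument rests on the ratio, not the exact figures:

```python
# Re-running the timing measurement from the comment above.
import timeit

n = 100_000
per_call = timeit.timeit(
    "random()", setup="from random import random", number=n
) / n
# ~32.7 ns/call was measured on an i9 @ 2.3 GHz; compared with the cost
# of d20.roll("1d20"), this is negligible, so per Amdahl's Law a faster
# raw generator barely moves the total.
print(f"{per_call * 1e9:.1f} ns per random() call")
```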

The most recent version of NumPy includes implementations of the Mersenne Twister, PCG64 (and its DXSM variant), Philox, and SFC64 (the latter two of which I have done no reading into, so I can't comment on them). Bringing this back to what it means for d20: I believe that if we implement some method for the user to supply the RNG to use (i.e. an interface for exposing a randrange implementation) and default to a sane implementation (MT for a thin installation, PCG64 if numpy is installed), that leaves any library consumer in a good place.

Evitcani commented 2 years ago

Great and thoughtful research; it was quite fascinating to read. I agree with the conclusions you've drawn for moving forward with d20. It would provide a great interface for the user.

My suggestion would be to include the links provided here in the external or code documentation, so users can reach a thoughtful conclusion at their own pace.

posita commented 2 years ago

I took a stab at implementing a random.Random with numpy.random primitives underneath. (I'm actually kind of surprised someone else hasn't done this. If they have, I was unable to find it. There are probably improvements over my own approach. If you think of any and feel like sharing, please let me know.)

Code is here. Tests are here.

In my case, I merely import the resulting RNG and use it like any other random.Random implementation.

zhudotexe commented 2 years ago

That looks pretty clean! I think it might be worth implementing getrandbits() in your subclass as well, since NumPy's BitGenerator can certainly supply it, and the Random class takes advantage of it when available to allow arbitrarily large randrange arguments (and, comparing the implementations of _randbelow_with_getrandbits vs _randbelow_without_getrandbits, I think it may be slightly faster in discrete cases).

Do you know if subclassing random.Random causes CPython to initialize the MT even if you don't use it? My intuition says that since you aren't calling super().__init__() it won't, but I'm not super familiar with the low-level workings of classes that eventually inherit from C extensions. (Also, random.Random.__init__() doesn't even call the super method from the C class it inherits from...)

Edit: It looks like the implementation of BitGenerator.random_raw differs slightly from _random.getrandbits, and I bet implementing Python logic to try and replicate this C logic would actually be slower than the implementation of _randbelow_without_getrandbits 😛.

Edit 2: Or maybe not! I bet you could replicate SystemRandom.getrandbits using Generator.bytes.

posita commented 2 years ago

Do you know if subclassing random.Random causes CPython to initialize the MT even if you don't use it? My intuition says that since you aren't calling super().__init__() it won't, but I'm not super familiar with the low-level workings of classes that eventually inherit from C extensions. (Also, random.Random.__init__() doesn't even call the super method from the C class it inherits from...)

I didn't take any time to look into initialization. 😞 I honestly hadn't thought about its implications. 😬 Not calling super().__init__() was probably an oversight on my part. Are you worried about performance on generator creation?

Edit: It looks like the implementation of BitGenerator.random_raw differs slightly from _random.getrandbits, and I bet implementing Python logic to try and replicate this C logic would actually be slower than the implementation of _randbelow_without_getrandbits 😛.

Edit 2: Or maybe not! I bet you could replicate SystemRandom.getrandbits using Generator.bytes.

Yeah, I think replacing _urandom(numbytes) in that implementation with numpy.random.Generator.bytes seems pretty straightforward:

    class NumpyRandom(Random):
        # …
        def getrandbits(self, k: int) -> int:
            # Adapted from random.SystemRandom. See
            # <https://github.com/python/cpython/blob/8c21941ddafdf4925170f9cea22e2382dd3b0184/Lib/random.py#L800-L806>.
            if k < 0:
                raise ValueError("number of bits must be non-negative")

            numbytes = (k + 7) // 8  # bits / 8 and rounded up
            x = int.from_bytes(self.randbytes(numbytes), "big")

            return x >> (numbytes * 8 - k)  # trim excess bits

        def randbytes(self, n: int) -> bytes:
            return self._g.bytes(n)
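
A standalone version of that adaptation can be checked directly, with numpy's Generator.bytes supplying the raw bytes in place of os.urandom (this substitutes a plain Generator for the class's _g attribute, seeded for reproducibility):

```python
# Standalone check of the SystemRandom.getrandbits adaptation above.
from numpy.random import Generator, PCG64

_g = Generator(PCG64(42))  # fixed seed for reproducibility

def getrandbits(k: int) -> int:
    if k < 0:
        raise ValueError("number of bits must be non-negative")
    numbytes = (k + 7) // 8                       # bits / 8, rounded up
    x = int.from_bytes(_g.bytes(numbytes), "big")
    return x >> (numbytes * 8 - k)                # trim excess bits

assert all(0 <= getrandbits(5) < 2**5 for _ in range(1_000))
assert getrandbits(0) == 0
```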

zhudotexe commented 2 years ago

Are you worried about performance on generator creation?

Curiosity more than anything to be honest; since the MT is implemented in C it's pretty lightweight (relatively) anyway. Looking at the C code, it seems like it will allocate the space for the MT state (https://github.com/python/cpython/blob/main/Modules/_randommodule.c#L103) but never initialize it (which would normally be done here, which is only called by seed, which is overridden).

posita commented 2 years ago

Are you worried about performance on generator creation?

Curiosity more than anything to be honest; since the MT is implemented in C it's pretty lightweight (relatively) anyway. Looking at the C code, it seems like it will allocate the space for the MT state (https://github.com/python/cpython/blob/main/Modules/_randommodule.c#L103) but never initialize it (which would normally be done here, which is only called by seed, which is overridden).

Also, the parent random.Random constructor itself calls self.seed, so I'd have to rethink my own __init__ method. This or similar gymnastics just seem icky:

    class NumpyRandom(Random):
        # …
        def __init__(self, bit_generator: BitGenerator):
            self._g = Generator(bit_generator)
            super().__init__()
            self._g = Generator(bit_generator)
        # …

(FWIW, I'm not married to the implementation. I threw it together as a proof-of-concept over an hour or two without thinking too much about it.)

zhudotexe commented 2 years ago

Maybe something like this?

    class NumpyRandom(random.Random):
        _gen: Generator

        def __init__(self, bitgen_type: Type[_BitGenT], x: _SeedT = None):
            self._bitgen_type = bitgen_type
            super().__init__(x)

        # ...

        def seed(self, a: _SeedT = None, version: int = 2):
            # ...
            self._gen = Generator(self._bitgen_type(a))

    # ...

    if hasattr(numpy.random, "PCG64DXSM"):  # available in numpy 1.21 and up
        random_impl = NumpyRandom(numpy.random.PCG64DXSM)

posita commented 2 years ago

I may be coming around to your proposal. I started to explore something like this:

    import random
    from numpy.random import default_rng

    # Same stuff that numpy.random.default_rng takes as its seed argument
    _NumpySeed = Union[None, int, Sequence[int], BitGenerator, Generator]

    class NumpyRandom(random.Random):
        _gen: Generator

        def __init__(self, numpy_seed: _NumpySeed):
            super().__init__(numpy_seed)

        def seed(
            self,
            a: _NumpySeed,  # type: ignore
            version: int = 0,  # ignored
        ) -> None:
            self._gen = default_rng(a)
        # …

So basically, one could do:

    from numpy.random import PCG64DXSM

    anything_that_default_rng_would_take = PCG64DXSM(7)
    rng = NumpyRandom(anything_that_default_rng_would_take)
    # …

All good, right? Except:

    # …
    anything_that_default_rng_would_take = [1, 2]
    default_rng(anything_that_default_rng_would_take)  # this works
    NumpyRandom(PCG64DXSM([1, 2]))  # this works
    NumpyRandom(default_rng(anything_that_default_rng_would_take))  # this works
    NumpyRandom(anything_that_default_rng_would_take)  # WTF?

Results in:

    Traceback (most recent call last):
      File "/…….py", line …, in <module>
        NumpyRandom(anything_that_default_rng_would_take)  # WTF?
    TypeError: unhashable type: 'list'

🤦‍♂️

So there's obviously some magic fighting against Random living up to its own marketing as a derivable interface. To be fair, your proposal effectively side-steps this issue because the first argument to its __init__ is a class, which is always hashable. I hate that one has to engage in such shenanigans, though….

zhudotexe commented 2 years ago

Interesting! I think the implementation of NumpyRandom(anything_that_default_rng_would_take) creates a foot-gun, though - since this passes it directly to NumpyRandom.seed to initialize the backing generator, if a user with no understanding of this interaction calls rng.seed(123), it's possible that they could be changing the implementation of the BitGenerator from an explicitly defined one to whatever happens to be exposed by default_rng.

If you wanted to avoid passing the class as an argument, one could store the bitgen class as a classvar:

    class NumpyRandom(random.Random, abc.ABC):
        _gen: Generator
        _bitgen_type: Type[BitGenerator]  # to be implemented by subclasses

        def __init__(self, x: _SeedT = None):
            super().__init__(x)

        # ...

    class PCG64DXSMRandom(NumpyRandom):
        _bitgen_type = numpy.random.PCG64DXSM

    # ...

# ...

But I'm uncertain that doing that really adds much value; I think it just boils down to code style preferences at that point.
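
For reference, a runnable reduction of the classvar approach looks like this. It is simplified (only random() is overridden, getstate/setstate and friends are omitted), and PCG64 is used rather than the DXSM variant to avoid requiring numpy >= 1.21:

```python
# Runnable sketch of the classvar approach from the comment above.
import abc
import random
from typing import Type

import numpy.random

class NumpyRandom(random.Random, abc.ABC):
    _bitgen_type: Type[numpy.random.BitGenerator]  # set by subclasses

    def seed(self, a=None, version: int = 2) -> None:
        # random.Random.__init__ calls self.seed(x), so this wires up
        # the numpy Generator at construction time.
        self._gen = numpy.random.Generator(self._bitgen_type(a))

    def random(self) -> float:
        return self._gen.random()

class PCG64Random(NumpyRandom):
    _bitgen_type = numpy.random.PCG64

rng = PCG64Random(7)
assert 0.0 <= rng.random() < 1.0
assert PCG64Random(7).random() == PCG64Random(7).random()  # same seed, same stream
```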

posita commented 2 years ago

You're right, of course. There's another foot gun, in that NumPy "seeds" can be stateful:

>>> from numpy.random import default_rng, PCG64DXSM
>>> seed = PCG64DXSM()
>>> generator = default_rng(seed)
>>> state = seed.state
>>> generator.random()
0.8390240576385848
>>> seed.state == state
False
>>> seed.state = state
>>> generator.random()
0.8390240576385848

So if you keep a reference to the BitGenerator outside of the NumpyRandom object, you can manipulate how the latter behaves (and use of the latter will change the state of the thing referenced). That and the case you point out make for some sharp edges that don't justify their presence.

I like the subclass thing. I took a stab (and solved the non-hashable seed problem) here.