axmolengine / axmol

Axmol Engine – A Multi-platform Engine for Desktop, XBOX (UWP) and Mobile games. (A fork of Cocos2d-x-4.0)
https://axmol.dev
MIT License
868 stars 195 forks source link

FastRNG seems to have problematic implementation #2039

Closed smilediver closed 2 months ago

smilediver commented 2 months ago

FastRNG seems to have problematic implementation. For example, take the range() function:

https://github.com/axmolengine/axmol/blob/602a2cab8fd45144f85b329a6f54429e3aa98a2e/core/math/FastRNG.h#L71-L75

It's unclear what the range is supposed to be. Is it [min, max] or [min, max)? I would expect it to be [min, max), but it seems it's [min, max]. If you call range(0, 1), then you get rng() / RNG_RAND_MAX, so almost all random values will be 0, but if you're (un)lucky and it generates RNG_RAND_MAX, then it will be 1. If I understand correctly, even if interval is supposed to be [min, max], then the max value has a very low chance to be generated. Also, if the interval is [INT_MIN, INT_MAX], then it lacks the precision.

I think this class needs to be redesigned. The interval should be specified as [min, max), so you could safely do stuff like array[FastRNG::range(0, array.size)] or array[FastRNG::max(array.size)] . And the math should be carefully rewritten to use doubles. But this needs to be discussed to understand what the original intervals were supposed to be, and if there would be potential breaking changes.

Edit: also the rng() implementation itself is very sketchy, I can't find any docs or other implementations about this particular generator.

rh101 commented 2 months ago

Is it [min, max] or [min, max)?

Just out of curiosity, what notation is that?

smilediver commented 2 months ago

It's a standard mathematical notation where [] means inclusive bounds and () means exclusive. So for example [min, max) means min <= x < max. https://en.wikipedia.org/wiki/Interval_(mathematics)#Including_or_excluding_endpoints

halx99 commented 2 months ago

FastRNG seems to have problematic implementation. For example, take the range() function:

https://github.com/axmolengine/axmol/blob/602a2cab8fd45144f85b329a6f54429e3aa98a2e/core/math/FastRNG.h#L71-L75

It's unclear what the range is supposed to be. Is it [min, max] or [min, max)? I would expect it to be [min, max), but it seems it's [min, max]. If you call range(0, 1), then you get rng() / RNG_RAND_MAX, so almost all random values will be 0, but if you're (un)lucky and it generates RNG_RAND_MAX, then it will be 1. If I understand correctly, even if interval is supposed to be [min, max], then the max value has a very low chance to be generated. Also, if the interval is [INT_MIN, INT_MAX], then it lacks the precision.

I think this class needs to be redesigned. The interval should be specified as [min, max), so you could safely do stuff like array[FastRNG::range(0, array.size)] or array[FastRNG::max(array.size)] . And the math should be carefully rewritten to use doubles. But this needs to be discussed to understand what the original intervals were supposed to be, and if there would be potential breaking changes.

Edit: also the rng() implementation itself is very sketchy, I can't find any docs or other implementations about this particular generator.

mayby @DelinWorks can anwser

DelinWorks commented 2 months ago

@smilediver I just ported over code from kiss rng algorithm, I didn't actually implement integer random generation correctly because I was focusing on floats for the particle system.

the nature of how it's generated makes it have no bias in floating numbers, ints will I guess you do have bias, a good old round function will solve, it is mainly used for axmol's particle system

what is your use case for this? why not use the mt19937 random engine? I think it's wrapped in RandomHelper in axmol

what do you mean by rng() function is sketchy? this is just a procedural random number generator designed to be fast not to be relied on for cryptographic things. look here https://en.wikipedia.org/wiki/KISS_(algorithm) and here https://github.com/cgwrench/librandom/blob/master/src/kiss.c

 int32_t range(int32_t min, int32_t max) 
 { 
     return static_cast<int32_t>(min + rng() / (RNG_RAND_MAX / (max - min))); 
 } 

I guess here you don't have to cast because it will just strip the decimal points effectively flooring it so you get the min value 99% chance of the time, std::round should've been used, as for the exclusive or inclusiveness RNG_RAND_MAX - 1 should suffice, I'll see what I can do

smilediver commented 2 months ago

I don't have a use case, I was simply fixing warnings and noticed that implementation (at least the int part) has some serious flaws. It's a public API, so people will use it and I'm just worried about the results.

About the rng(), I know the basic idea behind KISS, but I couldn't find any information about this particular implementation and what its properties are. That's why I called it sketchy. I mean, it may very well be a good function, but couldn't find any info to back it up.

I've been looking into good random functions. xoshiro256** seems like a good choice, and .Net and Lua uses it. But I'm not convinced we need 256 bit state, so maybe xoroshiro128** is a better alternative. Here's the creator's page with info about the algorithms: https://prng.di.unimi.it/.

DelinWorks commented 2 months ago

I can't think of a point to why replace kiss rng with xoshiro256** or xoroshiro128**, I personally think the extra bits are not important for any random number generation use case, 32 bits is more than enough, and still if you try to generate a random number from 0.0 to 1.0 you'll still get high precision even with doubles when using 32 bit PRNGs (1 divided by 4294967295 to be exact), point is that extra bits waste cpu cycles and memory space which for a particle system it's gonna be wild

if you need to generate a uint64_t you can still use a 32 bit PRNG and generate two uint32_t and use bit shifting to get a random uint64_t, it's more efficient this way.

uint64_t high = static_cast<uint64_t>(rand()) << 32;
uint64_t low = static_cast<uint64_t>(rand());
return high | low;

then you can add your own min and max by adding min and dividing by max - min

uint64_t random;
random = min + random / (UINT64_MAX / (max - min)));

same for uint128_t but more steps

smilediver commented 2 months ago

My initial reasons for suggesting changing the generator were mainly these:

So maybe the last argument could be ignored if the purpose of FastRNG is to be just fast and good enough, then we can drop the idea of using 64 bit generator and stay with 32. Then xoshiro128** could be good enough, which is also used in .Net on 32 bit platforms.

But it probably makes sense to do a benchmark for several implementations to make sure that it's really the fastest one.

DelinWorks commented 2 months ago

I originally meant that 128 and 256 bit PRNGs are overkill. sorry I didn't clarify that 32 along with 64 PRNGs are fine too, there is a helpful kiss 64 bit implementation that I'll copy, also there is a problem that I couldn't solve, when rng is called it returns uint32_t that ranges from 0 to uint32_max, this rng function has a very extremely tiny chance of returning 0, so when you directly map its value to the min and max of the function the min will never be given. it's like (min, max) which I couldn't solve. Just thought if you have any idea you could help me with this issue. also I remembered thats why I originally thought of using static_cast instead of round so I can guarantee min is returned. I'll see what I can do

DelinWorks commented 2 months ago

also the xoshiro256 does have a 64bit version I'll use to see if it solves the 0 uint32_max bias issue

smilediver commented 2 months ago

I think there was some miscommunication between us regarding bits. Both xoshiro256** and xoroshiro128** are 64 bit number generators. 256 and 128 in their names represent the state size, not the bits generated. FastRNG has 224 bits of state (_seed should be removed).

Regarding the biases, xoshiro256**, xoroshiro128** and xoshiro128** have periods of 2 ^ state_bits - 1 and generate all numbers n times and zero n - 1 times. But since n is huge it's practically irrelevant, so there should be no problems generating all numbers with equal probability.

DelinWorks commented 2 months ago

oh now I understand thanks for clarification, perhaps you can update the code yourself since you seem to understand the concept more. if not it's okay I will

smilediver commented 2 months ago

Ok, I'll have a look at it.

DelinWorks commented 2 months ago

@smilediver look at this

struct FastRNG // xoroshiro64**
{
    const uint32_t RNG_RAND_MAX = std::numeric_limits<uint32_t>::max();
    uint32_t s[2];

    // uses SplitMax64 that xoshiro docs recommend
    uint64_t get_seed(uint64_t x) {
        uint64_t z = (x += 0x9e3779b97f4a7c15);
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
        z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
        return z ^ (z >> 31);
    }

    FastRNG() {
        seed_rng(static_cast<uint64_t>(time(NULL)));
    }

    // initialize this object with a uint64_t seed
    void seed_rng(uint64_t seed)
    {
        uint64_t sd = get_seed(seed);
        s[0] = sd;
        s[1] = sd >> 32;
    }

    static inline uint32_t rotl(const uint32_t x, int k) {
        return (x << k) | (x >> (32 - k));
    }

    // returns a random uint32_t value
    uint32_t rng()
    {
        const uint32_t s0 = s[0];
        uint32_t s1 = s[1];
        const uint32_t result = rotl(s0 * 0x9E3779BB, 5) * 5;

        s1 ^= s0;
        s[0] = rotl(s0, 26) ^ s1 ^ (s1 << 9); // a, b
        s[1] = rotl(s1, 13); // c

        return result;
    }

    // returns a random double from min inclusive to max inclusive [MIN, MAX]
    double range(double min = -1.0F, double max = 1.0F)
    {
        double normal = static_cast<double>(rng()) / RNG_RAND_MAX;
        return min + normal * (max - min);
    }

    // returns a random signed integer from min inclusive to max exclusive [MIN, MAX)
    int32_t rangei(int32_t min, int32_t max)
    {
        double normal = rng() / (double(RNG_RAND_MAX) + 1);
        return static_cast<int32_t>(normal * (max - min) / 1.0 + min);
    }

    // returns a random unsigned integer from min inclusive to max exclusive [MIN, MAX)
    uint32_t rangeu(uint32_t min, uint32_t max)
    {
        double normal = rng() / (double(RNG_RAND_MAX) + 1);
        return static_cast<uint32_t>(normal * (max - min) / 1.0 + min);
    }

    double double01() { return range(0, 1); }
    float float01() { return range(0, 1); }
};

I have used xoroshiro64**, I know there is xoshiro128++ but the first uses a much simpler method with 64 bit state, I could have used 64 bit generators but exclusiveness for integers wouldn't work since you have to do (double(UINT64_MAX) + 1) which won't fit in a double and won't guarantee exclusiveness again, that's why I reside with 32 bit generator, or maybe we can subtract 1 from max to guarantee exclusiveness instead of adding 1 to UINT32_MAX, this is just theoretical any ideas would be helpful!

NOTE: I'm not sure if this is the correct way to apply SM64 to the state of xoroshiro64** but I guess this works

DelinWorks commented 2 months ago

xoroshiro64** is faster nice!

benchmark xoroshiro64 took:   106 millis, 106456 micros
benchmark kiss took:          127 millis, 127905 micros
smilediver commented 2 months ago

I'm also currently writing some benchmarking code and actually it looks like all of these are faster.

smilediver commented 2 months ago

Did some benchmarking and these are the results. Tests had 3 different implementations for FastRNG like random functions: FastRNGOriginal, which is the original unmodified FastRNG, d functions based on double and i based on ints and floats. Then d and i implementations were used with different RNGs.

Results have two tables with the same data represented in percentages. In first table data is relative to FastRNGOriginal (first row is 100, the rest is relative to first row). Second table has everything relative to FastRNGOriginal::rng() function (everything is relative to first row first column value).

Rng column represents raw rng() call, other columns represents functions like RanI is range(), MaxU is maxu(), etc.

Code is here if you want to check it out: https://gist.github.com/smilediver/b27343bde1aecbc9e957f1c41df067d4

Galaxy A6, 32 bit Android device

Results:
  Name                  Bits   Rng  RanI  RanU  MaxI  MaxU  RanF  MaxF   F01  Bool
  FastRNGOriginal         32   100   100   100   100   100   100   100   100   100
  d FastRNG               32    91    46    45    46    42   123   125   119    91
  i FastRNG               32    93    28    28    29    27    85    79    79    92
  d Xoshiro256**          64   149    51    51    46    42   126   126   127   121
  i Xoshiro256**          64   149    38    38    38    35   115   120   145   121
  d Xoroshiro128**        64   121    43    43    44    41   120   119   118   109
  i Xoroshiro128**        64   121    29    29    26    24   111   112   127   109
  d Xoshiro128**          32    74    39    39    39    36   104   104   103    75
  i Xoshiro128**          32    74    25    24    24    22    66    63    63    75
  d Xoroshiro64**         32    65    38    38    37    34   102    99    99    63
  i Xoroshiro64**         32    65    24    24    25    23    69    65    59    64
Results:
  Name                  Bits   Rng  RanI  RanU  MaxI  MaxU  RanF  MaxF   F01  Bool
  FastRNGOriginal         32   100   534   535   513   544   214   196   171   102
  d FastRNG               32    91   246   246   236   233   264   246   205    93
  i FastRNG               32    93   152   152   152   149   183   155   137    94
  d Xoshiro256**          64   149   277   277   236   233   270   249   218   124
  i Xoshiro256**          64   149   205   205   196   193   249   236   249   124
  d Xoroshiro128**        64   121   233   233   230   227   258   233   202   112
  i Xoroshiro128**        64   121   155   155   137   135   239   221   219   112
  d Xoshiro128**          32    74   211   211   202   199   224   205   177    77
  i Xoshiro128**          32    74   134   133   124   121   143   124   109    77
  d Xoroshiro64**         32    65   205   205   193   189   221   196   171    65
  i Xoroshiro64**         32    65   130   130   130   127   149   127   102    66

Galaxy S10e, 64 bit Android device

Results:
  Name                  Bits   Rng  RanI  RanU  MaxI  MaxU  RanF  MaxF   F01  Bool
  FastRNGOriginal         32   100   100   100   100   100   100   100   100   100
  d FastRNG               32    98    58    51    97    39    86    78    85    86
  i FastRNG               32    90    50    49   105    42   111    93   105    89
  d Xoshiro256**          64    75    43    44    83    32    61    61    67    71
  i Xoshiro256**          64    77    41    42    85    34    66    62    67    72
  d Xoroshiro128**        64    84    48    47    92    37    81    75    82    82
  i Xoroshiro128**        64    88    48    47    87    36    78    73    78    80
  d Xoshiro128**          32    75    43    42    78    31    64    60    66    71
  i Xoshiro128**          32    72    39    38    73    29    63    60    65    71
  d Xoroshiro64**         32    84    47    46    84    34    79    70    84    82
  i Xoroshiro64**         32    88    47    46    89    36    76    72    79    81
Results:
  Name                  Bits   Rng  RanI  RanU  MaxI  MaxU  RanF  MaxF   F01  Bool
  FastRNGOriginal         32   100   186   191    98   241   109   118   108   103
  d FastRNG               32    98   109    98    95    95    94    92    92    89
  i FastRNG               32    90    93    93   104   102   122   110   114    93
  d Xoshiro256**          64    75    80    85    81    77    67    73    73    74
  i Xoshiro256**          64    77    78    80    84    82    72    74    73    74
  d Xoroshiro128**        64    84    91    91    90    89    89    89    89    85
  i Xoroshiro128**        64    88    90    90    86    87    85    87    85    83
  d Xoshiro128**          32    75    81    80    77    76    70    72    72    74
  i Xoshiro128**          32    72    73    72    72    71    69    71    71    73
  d Xoroshiro64**         32    84    88    88    83    84    87    84    91    85
  i Xoroshiro64**         32    88    89    88    88    87    83    86    86    84

MacBook Pro M1

Results:
  Name                  Bits   Rng  RanI  RanU  MaxI  MaxU  RanF  MaxF   F01  Bool
  FastRNGOriginal         32   100   100   100   100   100   100   100   100   100
  d FastRNG               32   100    63    62    64    64    99    99    99    99
  i FastRNG               32    99    66    64    66    66    99    99    99    98
  d Xoshiro256**          64    87    57    56    55    54    87    87    83    82
  i Xoshiro256**          64    87    56    55    58    57    86    85    84    82
  d Xoroshiro128**        64   104    61    61    67    65    96   101   101    91
  i Xoroshiro128**        64   104    68    66    70    69    95   100   102    90
  d Xoshiro128**          32    93    57    56    55    54    87    88    84    82
  i Xoshiro128**          32    93    56    55    60    59    85    85    83    82
  d Xoroshiro64**         32    96    61    60    63    62    96    96    98    87
  i Xoroshiro64**         32    96    64    62    62    62    94    98    95    87
Results:
  Name                  Bits   Rng  RanI  RanU  MaxI  MaxU  RanF  MaxF   F01  Bool
  FastRNGOriginal         32   100   160   164   156   159   100   100    99   101
  d FastRNG               32   100   102   103   101   102   100    99    99   100
  i FastRNG               32    99   106   106   104   105   100    99    99    99
  d Xoshiro256**          64    87    91    93    87    87    87    88    83    83
  i Xoshiro256**          64    87    91    91    91    91    86    85    83    83
  d Xoroshiro128**        64   104    99   101   105   104    97   101   101    92
  i Xoroshiro128**        64   104   109   110   110   110    96   101   102    91
  d Xoshiro128**          32    93    92    93    87    87    87    88    84    83
  i Xoshiro128**          32    93    89    91    94    94    86    85    83    83
  d Xoroshiro64**         32    96    99    99    99   100    96    96    98    88
  i Xoroshiro64**         32    96   103   103    98    99    94    99    95    88

Looks like Xoshiro128** is the best choice for both 32 and 64 bit Arm CPUs.

DelinWorks commented 2 months ago
AMD Ryzen 7 5800x, 64bit

Results:
  Name                  Bits   Rng  RanI  RanU  MaxI  MaxU  RanF  MaxF   F01  Bool
  FastRNGOriginal         32   100   100   100   100   100   100   100   100   100
  d FastRNG               32   109    97    90    92    88    96    95    95   101
  i FastRNG               32   110    93    90    93    86    99    99   100   101
  d Xoshiro256**          64    83    69    67    72    65    77    76    76    76
  i Xoshiro256**          64    84    70    69    72    66    77    75    76    76
  d Xoroshiro128**        64    84    71    68    72    65    70    72    69    79
  i Xoroshiro128**        64    83    91    81    87    79    75    69    72    79
  d Xoshiro128**          32    86    73    72    73    67    66    77    77    80
  i Xoshiro128**          32    86    71    69    74    67    76    77    77    80
  d Xoroshiro64**         32    84    72    71    73    66    79    76    77    78
  i Xoroshiro64**         32    83    83    82    88    81    76    77    76    79

Results:
  Name                  Bits   Rng  RanI  RanU  MaxI  MaxU  RanF  MaxF   F01  Bool
  FastRNGOriginal         32   100   118   121   114   125   107   107   106   104
  d FastRNG               32   109   114   109   105   110   104   101   101   105
  i FastRNG               32   110   110   110   106   108   107   107   107   105
  d Xoshiro256**          64    83    82    82    82    82    83    81    81    80
  i Xoshiro256**          64    84    83    84    82    83    83    81    81    79
  d Xoroshiro128**        64    84    84    83    82    82    75    77    74    82
  i Xoroshiro128**        64    83   108    99    99    99    81    74    77    83
  d Xoshiro128**          32    86    86    88    84    85    71    83    82    83
  i Xoshiro128**          32    86    84    84    84    84    81    82    82    84
  d Xoroshiro64**         32    84    85    87    83    83    85    82    82    81
  i Xoroshiro64**         32    83    98    99   101   102    82    83    81    82

Xoshiro128** is indeed the best if I have to pick for both x86 and arm

DelinWorks commented 2 months ago

since we are gonna reside on using a 32bit generator we should use

double normal = rng() / (static_cast<double>(RNG_RAND_MAX) + 1);
return static_cast<uint32_t>(normal * (max - min) / 1.0 + min);

instead of

return min + static_cast<int32_t>(nextMax(max - min));

because I noticed the nextMax algorithm you had seems to miss up the state order which can be bad for stuff that rely on seeding like generating procedural worlds.

nextMax: 369914868
normal: 369757792

these are two different numbers caused by additional calls to rng()

smilediver commented 2 months ago

You don't really care how many rng() calls are made internally as long as the results of RNG with the same seed are repeatable. So I don't see the problem here, unless I'm not understanding something.

DelinWorks commented 2 months ago

what does it bring to the table through? does it just only guarantee exclusiveness? I think the code that I suggested does the job with less complixcity. if there is a reason we should it keep it then that's ok

smilediver commented 2 months ago

Its purpose is to make the resulting distribution uniform, so if you're casting a dice with max(6), then you get equal probabilities for each dice's face.

DelinWorks commented 2 months ago

that's very useful! do you think it should have its own function? like range_uniform?

smilediver commented 2 months ago

Probably not. I think in almost all cases you want a uniform distribution, and if not, then you're probably doing something advanced and will have your own rand function anyway.

DelinWorks commented 2 months ago

can you explain this function?

return static_cast<float>(rng() >> 8) * 0x1.0p-24f;

why not use normalization? and float01 returns 0.0 to 2.0 instead of 0.0 to 1.0 because nextFloat does too, do I just divide by 2?

smilediver commented 2 months ago

It's faster. I'm getting values from 0 to 1.

DelinWorks commented 2 months ago

oh sorry my bad.. I replaced 0x1.0p-24f with FLT_EPSILON, turns out that value is half epsilon

paulocoutinhox commented 2 months ago
#ifndef _FAST_RNG_H__
#define _FAST_RNG_H__

#include <ctime>
#include <cstdint>
#include <limits>

/** a fast more effective seeded random number generator struct, made by kiss rng.
 * it uses a simple algorithm to improve the speed of generating random numbers with a decent quality,
 * use this if you're planning to generate large amounts of random numbers in a single frame.
 *
 * @since axmol-1.0.0b8
 */
struct FastRNG
{
    static constexpr uint32_t RNG_RAND_MAX = std::numeric_limits<uint32_t>::max();
    uint32_t _x     = 1;
    uint32_t _y     = 2;
    uint32_t _z     = 4;
    uint32_t _w     = 8;
    uint32_t _carry = 0;
    uint32_t _k     = 0;
    uint32_t _m     = 0;
    uint64_t _seed  = 0;

    // default constructor seeds with current time
    FastRNG() { seed_rng(static_cast<uint32_t>(time(NULL))); }

    // constructor that seeds with a specified value
    FastRNG(uint64_t seed) { seed_rng(seed); }

    // initialize this object with seed
    void seed_rng(uint64_t seed)
    {
        uint32_t seed1 = static_cast<uint32_t>(seed);
        uint32_t seed2 = static_cast<uint32_t>(seed >> 32);

        _seed  = seed;
        _x     = seed1 | 1;
        _y     = seed2 | 2;
        _z     = seed1 | 4;
        _w     = seed2 | 8;
        _carry = 0;
    }

    // returns a random uint32_t value
    uint32_t rng()
    {
        _x = _x * 69069 + 12345;
        _y ^= _y << 13;
        _y ^= _y >> 17;
        _y ^= _y << 5;
        _k     = (_z >> 2) + (_w >> 3) + (_carry >> 2);
        _m     = _w + _w + _z + _carry;
        _z     = _w;
        _w     = _m;
        _carry = _k >> 30;
        return _x + _y + _w;
    }

    // template function to generate random numbers in a given range
    template <typename T>
    T range(T min, T max)
    {
        return static_cast<T>(min + (rng() % (max - min + 1)));
    }

    // template function to generate random numbers from 0 to max
    template <typename T>
    T max(T max)
    {
        return static_cast<T>(rng() % (max + 1));
    }

    // returns a random float from min to max
    float rangef(float min = -1.0f, float max = 1.0f)
    {
        return min + static_cast<float>(rng()) / (RNG_RAND_MAX / (max - min));
    }

    // returns a random float from 0.0 to max
    float maxf(float max) { return static_cast<float>(rng()) / (RNG_RAND_MAX / max); }

    // returns a random float from 0.0 to 1.0
    float float01() { return static_cast<float>(rng()) / RNG_RAND_MAX; }

    // returns either false or true randomly
    bool bool01() { return static_cast<bool>(rng() & 1); }
};

#endif // _FAST_RNG_H__

or

#ifndef _FAST_RNG_H__
#define _FAST_RNG_H__

#include <ctime>
#include <cstdint>
#include <limits>

/** a fast more effective seeded random number generator struct, made by kiss rng.
 * it uses a simple algorithm to improve the speed of generating random numbers with a decent quality,
 * use this if you're planning to generate large amounts of random numbers in a single frame.
 *
 * @since axmol-1.0.0b8
 */
struct FastRNG
{
    static constexpr uint32_t RNG_RAND_MAX = std::numeric_limits<uint32_t>::max();
    uint32_t _x     = 1;
    uint32_t _y     = 2;
    uint32_t _z     = 4;
    uint32_t _w     = 8;
    uint32_t _carry = 0;

    // default constructor seeds with current time
    FastRNG() { seed_rng(static_cast<uint32_t>(time(NULL))); }

    // constructor that seeds with a specified value
    FastRNG(uint64_t seed) { seed_rng(seed); }

    // initialize this object with seed
    void seed_rng(uint64_t seed)
    {
        uint32_t seed1 = static_cast<uint32_t>(seed);
        uint32_t seed2 = static_cast<uint32_t>(seed >> 32);

        _x     = seed1 | 1;
        _y     = seed2 | 2;
        _z     = seed1 | 4;
        _w     = seed2 | 8;
        _carry = 0;
    }

    // returns a random uint32_t value
    uint32_t rng()
    {
        // update _x using congruential generator
        _x = _x * 69069 + 12345;

        // update _y using xorshift
        _y ^= (_y << 13);
        _y ^= (_y >> 17);
        _y ^= (_y << 5);

        // update _z and _w using mwc
        uint32_t t = (_z << 26) + (_z >> 6);
        _z = _w;
        _w = t + _carry;
        _carry = (t + _carry) < t;

        // combine the results
        return _x + _y + _w;
    }

    // template function to generate random numbers in a given range
    template <typename T>
    T range(T min, T max)
    {
        return static_cast<T>(min + (rng() % (max - min + 1)));
    }

    // template function to generate random numbers from 0 to max
    template <typename T>
    T max(T max)
    {
        return static_cast<T>(rng() % (max + 1));
    }

    // returns a random float from min to max
    float rangef(float min = -1.0f, float max = 1.0f)
    {
        return min + static_cast<float>(rng()) / (RNG_RAND_MAX / (max - min));
    }

    // returns a random float from 0.0 to max
    float maxf(float max) { return static_cast<float>(rng()) / (RNG_RAND_MAX / max); }

    // returns a random float from 0.0 to 1.0
    float float01() { return static_cast<float>(rng()) / RNG_RAND_MAX; }

    // returns either false or true randomly
    bool bool01() { return static_cast<bool>(rng() & 1); }
};

#endif // _FAST_RNG_H__
DelinWorks commented 2 months ago

like thjs?

struct FastRNG
{
    uint32_t s[4];

    uint64_t nextSeed(uint64_t& state) {
        uint64_t z = (state += 0x9e3779b97f4a7c15);
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
        z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
        return z ^ (z >> 31);
    }

    FastRNG()
    {
        seed(static_cast<uint64_t>(rand()) << 32 | rand());
    }

    void seed(uint64_t seed)
    {
        uint64_t state = seed;
        uint64_t states[2];
        memset(states, 0, 16);
        states[0] = nextSeed(state);
        states[1] = nextSeed(state);
        memcpy(s, states, 16);
    }

    static inline uint32_t rotL(const uint32_t x, int k)
    {
        return (x << k) | (x >> (32 - k));
    }

    uint32_t next()
    {
        const uint32_t result = rotL(s[1] * 5, 7) * 9;

        const uint32_t t = s[1] << 9;

        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];

        s[2] ^= t;

        s[3] = rotL(s[3], 11);

        return result;
    }

    template <typename T>
    T nextReal()
    {
        if (std::is_same<T, float>::value)
            return static_cast<T>(next() >> 8) * 0x1.0p-24f;
        else if (std::is_same<T, double>::value)
            return static_cast<T>((static_cast<uint64_t>(next()) << 32 | next()) >> 11) * 0x1.0p-53;
        return 0; // possibly assert?
    }

    template <typename T>
    T nextReal(T min, T max)
    {
        return static_cast<T>(min + nextReal<T>() * (max - min));
    }

    template <typename T>
    T nextInt(T min, T max)
    {
        return min + static_cast<T>(nextMax(static_cast<uint32_t>(max - min)));
    }

    uint32_t nextMax(uint32_t max) {
        uint64_t multiresult = static_cast<uint64_t>(next() & 0xFFFF'FFFF) * max;
        uint32_t leftover = static_cast<uint32_t>(multiresult);
        if (leftover < max) {
            uint32_t threshold = (0 - max) % max;
            while (leftover < threshold) {
                multiresult = static_cast<uint64_t>(next() & 0xFFFF'FFFF) * max;
                leftover = static_cast<uint32_t>(multiresult);
            }
        }
        return multiresult >> 32;
    }
};
smilediver commented 2 months ago

I'm not sure FastRNG interface should be changed, that would be a breaking change. Some other observations:

DelinWorks commented 2 months ago

I added the functions on top of the existing ones for compatibility and removed the useless & bit wise, as for time(NULL) seeding with time gives you a random unique state each second and from my experience if you spawn multiple particles on the same second they look exactly the same which looks not so random. calling rand() twice to form 64-bit seed is more randomised and of course you'd need to do srand(time(NULL)) first