[API Proposal]: Random.Next<T>

stephentoub commented 2 years ago

Background and motivation

Now that we have the generic math interfaces, we should be able to write a method for generating a random value that can work with any T where T implements the appropriate interface.

API Proposal

namespace System;

public class Random
{
    public T Next<T>() where T : INumber<T>;
    public T Next<T>(T maxValue) where T : INumber<T>;
    public T Next<T>(T minValue, T maxValue) where T : INumber<T>;
}

API Usage

Random r = new Random();
Int128 value = r.Next<Int128>();

Alternative Designs

We might need two sets of methods, one for integral values and one for floating-point values. In that case, INumber<T> isn't the right interface; each set will use its own that appropriate.
The range used for NextDouble is 0.0 to 1.0; for a floating-point API, we'd presumably want to map to the same range?

Risks

No response

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

Issue Details

### Background and motivation Now that we have the generic math interfaces, we should be able to write a method for generating a random value that can work with any T where T implements the appropriate interface. ### API Proposal ```csharp namespace System; public class Random { public T Next() where T : INumber; public T Next(T maxValue) where T : INumber; public T Next(T minValue, T maxValue) where T : INumber; } ``` ### API Usage ```csharp Random r = new Random(); Int128 value = r.Next(); ``` ### Alternative Designs - We might need two sets of methods, one for integral values and one for floating-point values. In that case, `INumber` isn't the right interface; each set will use its own that appropriate. - The range used for NextDouble is 0.0 to 1.0; for a floating-point API, we'd presumably want to map to the same range? ### Risks _No response_

Author:	stephentoub
Assignees:	-
Labels:	`api-suggestion`, `area-System.Numerics`
Milestone:	8.0.0

danmoseley commented 2 years ago

The range used for NextDouble is 0.0 to 1.0; for a floating-point API, we'd presumably want to map to the same range?

Same for NextSingle() so I think so.

It might be interesting to have a quick look at what random methods other stacks expose.

DrkWzrd commented 2 years ago

The range used for NextDouble is 0.0 to 1.0; for a floating-point API, we'd presumably want to map to the same range?

Same for NextSingle() so I think so.

It might be interesting to have a quick look at what random methods other stacks expose.

Humbly I leave here my (not well written) random utilities I use when I need to seed to test something. As example of some useful methods or generics could be integrated

public static class RandomExtensions
{
    public static bool CoinToss(this Random rng)
    {
        return CoinToss(rng, .5);
    }

    public static bool CoinToss(this Random rng, double zeroWeight)
    {
        return rng.NextDouble() >= zeroWeight;
    }

    public static byte NextByte(this Random rng)
    {
        return (byte)rng.Next(byte.MinValue, byte.MaxValue + 1);
    }

    public static byte NextByte(this Random rng, byte min, byte max)
    {
        return (byte)rng.Next(min, max);
    }

    public static sbyte NextSbyte(this Random rng)
    {
        return (sbyte)rng.Next(sbyte.MinValue, sbyte.MaxValue + 1);
    }

    public static sbyte NextSbyte(this Random rng, sbyte min, sbyte max)
    {
        return (sbyte)rng.Next(min, max);
    }

    public static DateTime NextDateTime(this Random rng)
    {
        return NextDateTime(rng, DateTime.MinValue, DateTime.MaxValue);
    }

    public static DateTime NextDateTime(this Random rng, DateTime maxValue)
    {
        return NextDateTime(rng, DateTime.MinValue, maxValue);
    }

    public static DateTime NextDateTime(this Random rng, DateTime minValue, DateTime maxValue)
    {
        if (minValue >= maxValue) throw new ArgumentOutOfRangeException(nameof(minValue));
        var nextLong = rng.NextLong(minValue.Ticks, maxValue.Ticks);
        return new DateTime(nextLong);
    }

    public static DateOnly NextDate(this Random rng)
    {
        return NextDate(rng, DateOnly.MinValue, DateOnly.MaxValue);
    }

    public static DateOnly NextDate(this Random rng, DateOnly maxValue)
    {
        return NextDate(rng, DateOnly.MinValue, maxValue);
    }

    public static DateOnly NextDate(this Random rng, DateOnly minValue, DateOnly maxValue)
    {
        if (minValue >= maxValue) throw new ArgumentOutOfRangeException(nameof(minValue));
        var nextDay = rng.Next(minValue.DayNumber, maxValue.DayNumber);
        return DateOnly.FromDayNumber(nextDay);
    }

    public static TimeOnly NextTime(this Random rng)
    {
        return NextTime(rng, TimeOnly.MinValue, TimeOnly.MaxValue);
    }

    public static TimeOnly NextTime(this Random rng, TimeOnly maxValue)
    {
        return NextTime(rng, TimeOnly.MinValue, maxValue);
    }

    public static TimeOnly NextTime(this Random rng, TimeOnly minValue, TimeOnly maxValue)
    {
        if (minValue >= maxValue) throw new ArgumentOutOfRangeException(nameof(minValue));
        var nextLong = rng.NextLong(minValue.Ticks, maxValue.Ticks);
        return new TimeOnly(nextLong);
    }

    public static double NextDouble(this Random rng, double minValue, double maxValue)
    {
        return minValue + rng.NextDouble() * (maxValue - minValue);
    }

    public static long NextLong(this Random rng)
    {
        return NextLong(rng, 0, long.MaxValue);
    }

    public static long NextLong(this Random rng, long maxValue)
    {
        return NextLong(rng, 0, maxValue);
    }

    public static long NextLong(this Random rng, long minValue, long maxValue)
    {
        if (minValue >= maxValue)
            throw new ArgumentOutOfRangeException(nameof(minValue));

        //Working with ulong so that modulo works correctly with values > long.MaxValue
        var uRange = (ulong)(maxValue - minValue);

        //Prevent a modolo bias; see https://stackoverflow.com/a/10984975/238419
        //for more information.
        //In the worst case, the expected number of calls is 2 (though usually it's
        //much closer to 1) so this loop doesn't really hurt performance at all.
        ulong ulongRand;
        var buf = (stackalloc byte[8]);
        do
        {
            rng.NextBytes(buf);
            ulongRand = (ulong)BitConverter.ToInt64(buf);
        } while (ulongRand > ulong.MaxValue - ((ulong.MaxValue % uRange) + 1) % uRange);

        return (long)(ulongRand % uRange) + minValue;
    }

    public static char NextChar(this Random rng)
    {
        return NextChar(rng, char.MinValue, char.MaxValue);
    }

    public static char NextChar(this Random rng, char maxValue)
    {
        return NextChar(rng, char.MinValue, maxValue);
    }

    public static char NextChar(this Random rng, char minValue, char maxValue)
    {
        if (minValue >= maxValue) 
            throw new ArgumentOutOfRangeException(nameof(minValue));

        int min = minValue;
        int max = maxValue;
        return (char)rng.Next(min, max);
    }

    public static string NextString(this Random rng, int length)
    {
        return NextString(rng, length, char.MinValue, char.MaxValue);
    }

    public static string NextString(this Random rng, int length, char maxValue)
    {
        return NextString(rng, length, char.MinValue, maxValue);
    }

    public static string NextString(this Random rng, int length, char minValue, char maxValue)
    {
        if (length < 0) 
            throw new ArgumentOutOfRangeException(nameof(length));

        var strSpan = (stackalloc char[length]);

        for (int i = 0; i < length; i++)
        {
            strSpan[i] = NextChar(rng, minValue, maxValue);
        }
        return strSpan.ToString();
    }
}

hopperpl commented 2 years ago

is maxValue inclusive or exclusive? minValue respectively.

stephentoub commented 2 years ago

In the existing APIs, minValue is inclusive and maxValue is exclusive. This is using the same naming as the existing overloads and would have the same semantics as the existing overloads.

hopperpl commented 2 years ago

then it should be noted that way...

because Next<T>(T.MinValue, T.MaxValue) will not be the same as Next<T>()
also Next<T>(0, T.MaxValue + 1) will not be possible (assuming T is a signed variant)... if the goal is to have any value of the positive range (of a signed T variant)

I'm not against it, but it has to be clearly stated, I fell into that pitfall many times. The documentation mentions it for the current int implementation in a small remark. For example Next<byte>() can never return 255.

I would on a personal note as this is a new implementation without backward compatibility ...

have Next<T>() return the full range of T
have Next<T>(T minInclusive, T maxInclusive) return a complete range -- or maxExclusive but have the parameter properly named -- problem with exclusive is still no full positive range possible
and when T is a floating point number default to T.Zero and T.One (inclusive)

if default Next<float>() would not include 1 then the random value is not usable for the scenarios I use random float numbers.

Oh, I forgot to mention, it is easy to turn inclusive to exclusive by subtracting 1 or Epsilon, but it is not possible to do the opposite, as T.MaxValue is the maximum value and nothing can be bigger.

Clockwork-Muse commented 2 years ago

For example Next<byte>() can never return 255.

Which can cause slight bias to results downstream, as I brought up in #41457 .

if default Next<float>() would not include 1 then the random value is not usable for the scenarios I use random float numbers.

.... what are these scenarios?

hopperpl commented 2 years ago

.... what are these scenarios?

raytracing, (audio) signal processing, field density, math, geometry, anything that is too complex to completely calculate... you use a randomization of a normalized set (so 0 to 1) and then calculate until you have enough result values... as the calculation will theoretically go on forever... if the 1 is missing as a random value, one "edge" will be "not sharp"

If you have a large input set, then it doesn't make a difference as by rounding you hit the edge, but sometimes you have a 3 dimensional set or a 4 dimensional set with a very "thin" dimension, for example depth or color/hue (4th dimension). Or you have a reduced data set and the last thing you want is to add a special randomizer for such special cases.

Normalization is usually done with a range of 0 to 1 inclusive. Or -1 to +1 inclusive. The left x-coordinate is 0, the right x-coordinate is 1. Or you have a line from A to B and use a random number to find any point between A and B. But you want A and B both included. A line segment between 2 points includes both points. Normalized, to find A you use 0 and to find B you use 1. In float-color (0,0,0) is Black and (1,1,1) is White, (0.5,0.5,0.5) is Gray etc. The 1 is always part of the set.

aloraman commented 2 years ago

Thoughts on API shape

I don't think having the same random API for all numeric types is a good idea. Integer numbers and floating-point numbers have different behaviors and expectations. Single generic API for all of them would have very weird behavior. The very core of RNG is producing sequence of uniformly distributed byte values, other methods produce values of specific types from these byte sequences. With that in mind we can look at differences in numeric types:

Binary integers (from System.Byte to System.Int64, signed and unsigned) have values that are uniformly distributed. Every interval also has uniformly distributed values. So, producing a binary integer from byte sequence is almost as easy as filling the memory span with these bytes (there are tricks to support Next(min, max), more on that later). When users call Random.Next<TBinaryInteger>(min, max), they expect to receive uniformly distributed values.
IEEE754 floating-point numbers (System.Half, System.Single, System.Double) have values that are not uniformly distributed. Some intervals have uniformly distributed numbers, some haven't - it depends on whether only mantissa is changing, or there's a change in value of exponent as well. If we simply fill a memory span with random bytes we won't get a uniform distribution, we'll get a distribution with logarithmic density, which behaves very counter-intuitively. There is a way to produce uniformly distributed values in the interval [0.0; 1.0) utilizing dyadic rationals and the specific size of mantissa, which is used in current implementation. Maybe we can define Next(max) and Next(min, max) for floating point numbers from Next(0, 1) via linear transformation, but I'm not sure if that's a good idea - precision loss can skew the distribution.
Non-IEEE754 floating-point numbers (System.Decimal) is its own can of worms, I can't say a lot about it, maybe have specific API for decimals? With a way to pass a specific required precision.
Other new numeric types - Here be dragons! What should Random.Next<BigInteger>() produce? Should Random.Next<Complex>() produce values from full Complex plane or from unit circle?

Proposed API shape

public class Random
{
    public T Next<T>() where T : IBinaryInteger<T>; // produces values from [T.Zero;T.MaxValue)
    public T Next<T>(T maxValue) where T : IBinaryInteger<T>; // produces values from [T.Zero;maxValue)
    public T Next<T>(T minValue, T maxValue) where T : IBinaryInteger<T>; // produces values from [minValue;maxValue)
    public T NextFloat<T>() where T:IBinaryFloatingPointIeee754<T> // produces values from [0.0;1.0), can't have the same name :(
}

Basically, Next<T> for binary integers should produce values in the same range as existing methods, i.e., Random.Next<System.Int32>() should behave the same way Random.Next() behaves now.

Other questions

To implement Next(min, max) for binary integers current implementation uses the range hit test, utilizing Log2Ceiling. Do we have Log2Ceiling for every binary integer type? Or Log2 + PopCount?
To produce [0.0; 1.0) uniform distribution current implementation uses dyadic rationals, utilizing type-specific bit-twiddling. Do we have required constants to implement this bit-twiddling in a generic way?

P.S. Maybe it is useful to provide raw span filling method in API, e.g.:

public class Random
{
    public void Fill<T>(ref T storage) where T: unmanaged;
}

koszeggy commented 1 year ago

~Before adding newer and newer members to Random please consider the API suggestion I proposed in #60549.~

~ℹ️ Background Info: The problem is that since C# 2.0 adding new virtual members to Random is a breaking change for derived Random classes because until the creator of the derived class can release a new version of their library the non-overridden new methods will fall back to the legacy Random behavior.~

~Breaking Issue Example:~

~The last good example is the introduction of NextInt64 and NextSingle methods. I have a SecureRandom class (among others). When .NET 6 was released with the new NextInt64 overloads, and before I could release a new version with the new overloads (or even after, if someone targeted .NET 6 without upgrading my libraries), SecureRandom.NextInt64 failed to return cryptographically secure results.~

~Even worse (though the issue stands also without this addendum), I actually have a lot of extension methods, for Random, including some NextInt64 overloads that returned secure results before .NET 6 but when Random.NextInt64 has been introduced, it was not the case anymore until I upgraded my libraries.~

Possible Solutions:

A. Instead of new Random members please introduce new extension methods that rely on the already existing members of any Random implementation (eg. NextBytes). Just like @DrkWzrd, I also offer my RandomExtensions for any possible help/inspiration. Here is a living online example as well.

B. As I proposed in #60549, make it possible for a derived implementation to call a protected base constructor that uses an implementation where all virtual members rely on the Sample method as it was in C# 1.x. This not just prevents unnecessary seed initialization in base but also makes the possible new members forward compatible. The non-overridden members will maybe just be a little less optimized until the library is upgraded ~but will not fall back to the legacy implementation~. I actually even prepared a pull request, in which I followed @GrabYourPitchforks' API review comments, but then unfortunately @stephentoub thought this API was not necessary after all. But now that we face new Random methods again and the same issues can be expected I kindly ask you to reconsider this statement.

Disclaimer:

This API proposal does not indicate the new Next<T> methods as virtual ones. If they are really non-virtual, then maybe my concerns are not that relevant in this particular case. Still, it would be really helpful to consider my proposal as it addresses the unnecessary allocations issue as well.

Correction: I remembered incorrectly: The non-derived compatible implementation of the new NextInt64 method falled back to use multiple Next caused only a performance issue for my SecureRandom class but did not affect security. Meaning, my proposed API is no longer needed to implemented before this proposal. It would be only a solution for the unnecessary allocations and slow instantiation, which an independent issue from the new virtual methods. I apologize for creating an unnecessary noise in this issue.

stephentoub commented 1 year ago

because until the creator of the derived class can release a new version of their library the non-overridden new method will fall back to the legacy Random behavior.

Any new virtuals that are added delegate to an existing abstract/virtual on the type. In the case of the NextInt64 you mentioned, for example, it would delegate to the existing virtual Next(int), calling a derived type's implementation if it existed. So I'm not sure what you mean about it falling back to legacy behavior.

tannergooding commented 1 year ago

We have also not historically considered adding new virtual members to be a breaking change. That is exactly how we have to version types (otherwise we'd end up with RandomEx, RandomV5, etc) and we have nearly always reserved the right to extend our types with additional functionality where appropriate.

koszeggy commented 1 year ago

In the case of the NextInt64 you mentioned, for example, it would delegate to the existing virtual Next(int), calling a derived type's implementation if it existed.

Oops, I remembered incorrectly, I apologize for that part. I just remembered that it ends up calling Net5Compatible.NextUInt64. As it routes back to Next the insecure part is not true, sorry for not double checking my memories.

But it still caused two issues:

Initialized the Marvin seed unnecessarily
Replaced my alternative implementation with four Next calls, which is suboptimal

While 2. was necessary due to the nature of method resolution (so the extension method was no longer picked) my proposed special protected constructor could partly solve this half of the problem as well (using one Sample call instead of four Nexts). And it would also fix 1, which is still not solved.

I hope you can accept my thoughts with these corrections now.

stephentoub commented 1 year ago

But it still caused two issues: 1) Initialized the Marvin seed unnecessarily 2) Replaced my alternative implementation with four Next calls, which is suboptimal

For (1), what "marvin seed" are you referring to? The addition of these virtuals doesn't introduce any new seed. There's the existing concern you previously raised about the base ctor's initializing the state array and which we debated at length on a different issue. The addition of these virtuals neither fixes that nor makes that any worse.

For (2), the nature of these being virtual is what lets you address that. When you update your implementation to override the new virtuals, you get to supply a more optimal implementation for your derived type, and until then, it's a functional implementation.

koszeggy commented 1 year ago

what "marvin seed" are you referring to?

I mean this initialization. Sorry again for being inaccurate. This issue still stands for all derived types and is independent from overriding virtual members.

For (2), the nature of these being virtual is what lets you address that.

Yes and I agreed on that. I just mentioned that maybe it could be a good option to rely on Sample until I have a chance to override the new members. But this part is less important, it's just a (possibly) positive side effect of the proposed solution for being able to avoid the seed initialization.

koszeggy commented 1 year ago

As I was mistaken with the security aspect, of course it's not necessary anymore to implement my proposal before adding new virtual members. I added a correction to my original comment. Sorry for the unwanted noise in this issue. I really feel embarrassed.

stephentoub commented 1 year ago

Thanks.

Neme12 commented 1 year ago

How would this actually be implemented? This seems impossible in the general case. Even if we have separate methods for IBinaryInteger and IFloatingPointIeee754, how would e.g. the integer one be implemented? There's no way to get the size of the T given an IBinaryInteger (and it can even represent a variable-sized integer, like BigInteger). There would have to be a new IFixedLengthBinaryInteger interface that would expose a static Size property, in order for the random generator to know how many bytes to generate.

Neme12 commented 1 year ago

Unless the plan is to only support a fixed number of specific types, as Vector<T> does. But then why have the method be generic at all instead of having overloads/method groups for each type?

stephentoub commented 1 year ago

There's no way to get the size of the T given an IBinaryInteger

The T can be constrained to both IBinaryInteger and IMinMaxValue, in which case you have both GetByteCount and MaxValue.

Neme12 commented 1 year ago

I see, thanks. What about the floating point case?

Neme12 commented 1 year ago

I think it might be a good idea to prototype this to make sure that it really is feasible.

stephentoub commented 1 year ago

I think it might be a good idea to prototype this to make sure that it really is feasible.

Uncompiled, untested, certainly possibility for optimization, and purely for demonstrative purposes:

static T GetNext<T>(T maxValue) where T : IBinaryInteger<T>
{
    int maxBytes = (int)Math.Ceiling(maxValue.GetShortestBitLength() / 8.0);
    Span<byte> span = maxBytes <= 256 ? stackalloc byte[maxBytes] : new byte[maxBytes];

    while (true)
    {
        Random.Shared.NextBytes(span);
        T newValue = T.ReadLittleEndian(span, true);
        if (newValue < maxValue)
        {
            return newValue;
        }
    }
}

dotnet / runtime