Open stephentoub opened 2 years ago
Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.
| Author: | stephentoub |
|---|---|
| Assignees: | - |
| Labels: | `api-suggestion`, `area-System.Numerics` |
| Milestone: | 8.0.0 |
The range used for NextDouble is 0.0 to 1.0; for a floating-point API, we'd presumably want to map to the same range?
Same for NextSingle() so I think so.
It might be interesting to have a quick look at what random methods other stacks expose.
Humbly, I leave here my (not well-written) random utilities that I use when I need seeded values for testing, as an example of some useful methods or generics that could be integrated:
```csharp
public static class RandomExtensions
{
    public static bool CoinToss(this Random rng)
    {
        return CoinToss(rng, .5);
    }

    public static bool CoinToss(this Random rng, double zeroWeight)
    {
        return rng.NextDouble() >= zeroWeight;
    }

    public static byte NextByte(this Random rng)
    {
        return (byte)rng.Next(byte.MinValue, byte.MaxValue + 1);
    }

    public static byte NextByte(this Random rng, byte min, byte max)
    {
        return (byte)rng.Next(min, max);
    }

    public static sbyte NextSbyte(this Random rng)
    {
        return (sbyte)rng.Next(sbyte.MinValue, sbyte.MaxValue + 1);
    }

    public static sbyte NextSbyte(this Random rng, sbyte min, sbyte max)
    {
        return (sbyte)rng.Next(min, max);
    }

    public static DateTime NextDateTime(this Random rng)
    {
        return NextDateTime(rng, DateTime.MinValue, DateTime.MaxValue);
    }

    public static DateTime NextDateTime(this Random rng, DateTime maxValue)
    {
        return NextDateTime(rng, DateTime.MinValue, maxValue);
    }

    public static DateTime NextDateTime(this Random rng, DateTime minValue, DateTime maxValue)
    {
        if (minValue >= maxValue) throw new ArgumentOutOfRangeException(nameof(minValue));
        var nextLong = rng.NextLong(minValue.Ticks, maxValue.Ticks);
        return new DateTime(nextLong);
    }

    public static DateOnly NextDate(this Random rng)
    {
        return NextDate(rng, DateOnly.MinValue, DateOnly.MaxValue);
    }

    public static DateOnly NextDate(this Random rng, DateOnly maxValue)
    {
        return NextDate(rng, DateOnly.MinValue, maxValue);
    }

    public static DateOnly NextDate(this Random rng, DateOnly minValue, DateOnly maxValue)
    {
        if (minValue >= maxValue) throw new ArgumentOutOfRangeException(nameof(minValue));
        var nextDay = rng.Next(minValue.DayNumber, maxValue.DayNumber);
        return DateOnly.FromDayNumber(nextDay);
    }

    public static TimeOnly NextTime(this Random rng)
    {
        return NextTime(rng, TimeOnly.MinValue, TimeOnly.MaxValue);
    }

    public static TimeOnly NextTime(this Random rng, TimeOnly maxValue)
    {
        return NextTime(rng, TimeOnly.MinValue, maxValue);
    }

    public static TimeOnly NextTime(this Random rng, TimeOnly minValue, TimeOnly maxValue)
    {
        if (minValue >= maxValue) throw new ArgumentOutOfRangeException(nameof(minValue));
        var nextLong = rng.NextLong(minValue.Ticks, maxValue.Ticks);
        return new TimeOnly(nextLong);
    }

    public static double NextDouble(this Random rng, double minValue, double maxValue)
    {
        return minValue + rng.NextDouble() * (maxValue - minValue);
    }

    public static long NextLong(this Random rng)
    {
        return NextLong(rng, 0, long.MaxValue);
    }

    public static long NextLong(this Random rng, long maxValue)
    {
        return NextLong(rng, 0, maxValue);
    }

    public static long NextLong(this Random rng, long minValue, long maxValue)
    {
        if (minValue >= maxValue)
            throw new ArgumentOutOfRangeException(nameof(minValue));

        // Working with ulong so that modulo works correctly with values > long.MaxValue
        var uRange = (ulong)(maxValue - minValue);

        // Prevent a modulo bias; see https://stackoverflow.com/a/10984975/238419
        // for more information.
        // In the worst case, the expected number of calls is 2 (though usually it's
        // much closer to 1) so this loop doesn't really hurt performance at all.
        ulong ulongRand;
        var buf = (stackalloc byte[8]);
        do
        {
            rng.NextBytes(buf);
            ulongRand = (ulong)BitConverter.ToInt64(buf);
        } while (ulongRand > ulong.MaxValue - ((ulong.MaxValue % uRange) + 1) % uRange);

        return (long)(ulongRand % uRange) + minValue;
    }

    public static char NextChar(this Random rng)
    {
        return NextChar(rng, char.MinValue, char.MaxValue);
    }

    public static char NextChar(this Random rng, char maxValue)
    {
        return NextChar(rng, char.MinValue, maxValue);
    }

    public static char NextChar(this Random rng, char minValue, char maxValue)
    {
        if (minValue >= maxValue)
            throw new ArgumentOutOfRangeException(nameof(minValue));
        int min = minValue;
        int max = maxValue;
        return (char)rng.Next(min, max);
    }

    public static string NextString(this Random rng, int length)
    {
        return NextString(rng, length, char.MinValue, char.MaxValue);
    }

    public static string NextString(this Random rng, int length, char maxValue)
    {
        return NextString(rng, length, char.MinValue, maxValue);
    }

    public static string NextString(this Random rng, int length, char minValue, char maxValue)
    {
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length));
        var strSpan = (stackalloc char[length]);
        for (int i = 0; i < length; i++)
        {
            strSpan[i] = NextChar(rng, minValue, maxValue);
        }
        return strSpan.ToString();
    }
}
```
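For context, here's what using these extensions looks like (a quick illustrative snippet, assuming the class above compiles as posted):

```csharp
var rng = new Random(42); // fixed seed for reproducible test data

bool heads   = rng.CoinToss();                        // fair coin flip
byte level   = rng.NextByte();                        // full 0..255 range
DateTime dob = rng.NextDateTime(new DateTime(1950, 1, 1), new DateTime(2005, 1, 1));
string token = rng.NextString(8, 'a', 'z');           // 8 chars in ['a', 'z')
```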
Is maxValue inclusive or exclusive? And minValue, respectively?
In the existing APIs, minValue is inclusive and maxValue is exclusive. This is using the same naming as the existing overloads and would have the same semantics as the existing overloads.
Then it should be noted that way...

- `Next<T>(T.MinValue, T.MaxValue)` will not be the same as `Next<T>()`
- `Next<T>(0, T.MaxValue + 1)` will not be possible (assuming T is a signed variant)...

If the goal is to have any value of the positive range (of a signed T variant), I'm not against it, but it has to be clearly stated; I fell into that pitfall many times. The documentation mentions it for the current int implementation in a small remark. For example, `Next<byte>()` can never return 255.
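A small illustration of that pitfall using the existing int-based API (nothing new here, just the current behavior):

```csharp
var rng = Random.Shared;

// maxValue is exclusive today, so this can never yield byte.MaxValue (255):
byte never255 = (byte)rng.Next(byte.MinValue, byte.MaxValue);

// Covering the full byte range requires pushing the bound past MaxValue:
byte fullRange = (byte)rng.Next(byte.MinValue, byte.MaxValue + 1);

// That workaround doesn't exist at the top of the type's range:
// rng.Next(0, int.MaxValue + 1)  // won't compile: int.MaxValue + 1 overflows
```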
On a personal note, as this is a new implementation without backward compatibility, I would suggest:

- `Next<T>()` returns the full range of T
- `Next<T>(T minInclusive, T maxInclusive)` returns a complete range -- or maxExclusive, but have the parameter properly named -- the problem with exclusive is that the full positive range is still not possible

If the default `Next<float>()` would not include 1, then the random value is not usable for the scenarios in which I use random float numbers.
Oh, I forgot to mention: it is easy to turn inclusive into exclusive by subtracting 1 or Epsilon, but it is not possible to do the opposite, as T.MaxValue is the maximum value and nothing can be bigger. For example, `Next<byte>()` can never return 255.
Which can cause slight bias to results downstream, as I brought up in #41457.
if default `Next<float>()` would not include 1 then the random value is not usable for the scenarios I use random float numbers.
.... what are these scenarios?
.... what are these scenarios?
Raytracing, (audio) signal processing, field density, math, geometry, anything that is too complex to completely calculate... you use a randomization of a normalized set (so 0 to 1) and then calculate until you have enough result values, as the calculation would theoretically go on forever... if the 1 is missing as a random value, one "edge" will be "not sharp".
If you have a large input set, then it doesn't make a difference as by rounding you hit the edge, but sometimes you have a 3 dimensional set or a 4 dimensional set with a very "thin" dimension, for example depth or color/hue (4th dimension). Or you have a reduced data set and the last thing you want is to add a special randomizer for such special cases.
Normalization is usually done with a range of 0 to 1 inclusive. Or -1 to +1 inclusive. The left x-coordinate is 0, the right x-coordinate is 1. Or you have a line from A to B and use a random number to find any point between A and B. But you want A and B both included. A line segment between 2 points includes both points. Normalized, to find A you use 0 and to find B you use 1. In float-color (0,0,0) is Black and (1,1,1) is White, (0.5,0.5,0.5) is Gray etc. The 1 is always part of the set.
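As a concrete sketch of that argument (illustrative only, using today's `NextDouble`): interpolating between two endpoints only ever reaches the second endpoint exactly when the random parameter can be exactly 1.

```csharp
// Linear interpolation between a and b driven by a normalized parameter t.
static double Lerp(double a, double b, double t) => a + (b - a) * t;

double t = Random.Shared.NextDouble(); // [0.0, 1.0) -- 1.0 itself is excluded
double point = Lerp(2.0, 5.0, t);      // therefore always strictly less than 5.0

// Only an inclusive [0.0, 1.0] sample would allow 'point' to hit 5.0 exactly.
```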
I don't think having the same random API for all numeric types is a good idea. Integer numbers and floating-point numbers have different behaviors and expectations. Single generic API for all of them would have very weird behavior. The very core of RNG is producing sequence of uniformly distributed byte values, other methods produce values of specific types from these byte sequences. With that in mind we can look at differences in numeric types:
- Binary integers (`System.Byte` to `System.Int64`, signed and unsigned) have values that are uniformly distributed. Every interval also has uniformly distributed values. So, producing a binary integer from a byte sequence is almost as easy as filling the memory span with those bytes (there are tricks to support `Next(min, max)`, more on that later). When users call `Random.Next<TBinaryInteger>(min, max)`, they expect to receive uniformly distributed values.
- Binary floating-point numbers (`System.Half`, `System.Single`, `System.Double`) have values that are not uniformly distributed. Some intervals have uniformly distributed numbers, some haven't -- it depends on whether only the mantissa is changing, or whether there's a change in the value of the exponent as well. If we simply fill a memory span with random bytes we won't get a uniform distribution; we'll get a distribution with logarithmic density, which behaves very counter-intuitively. There is a way to produce uniformly distributed values in the interval `[0.0; 1.0)` utilizing dyadic rationals and the specific size of the mantissa, which is used in the current implementation. Maybe we can define `Next(max)` and `Next(min, max)` for floating-point numbers from `Next(0, 1)` via linear transformation, but I'm not sure if that's a good idea -- precision loss can skew the distribution.
- Decimal numbers (`System.Decimal`) are their own can of worms; I can't say a lot about it. Maybe have a specific API for decimals, with a way to pass a specific required precision?
- What should `Random.Next<BigInteger>()` produce? Should `Random.Next<Complex>()` produce values from the full complex plane or from the unit circle?

```csharp
public class Random
{
    public T Next<T>() where T : IBinaryInteger<T>;                       // produces values from [T.Zero;T.MaxValue)
    public T Next<T>(T maxValue) where T : IBinaryInteger<T>;             // produces values from [T.Zero;maxValue)
    public T Next<T>(T minValue, T maxValue) where T : IBinaryInteger<T>; // produces values from [minValue;maxValue)
    public T NextFloat<T>() where T : IBinaryFloatingPointIeee754<T>;     // produces values from [0.0;1.0), can't have the same name :(
}
```
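For reference, a minimal sketch of the dyadic-rational technique mentioned above, written for `System.Double` (the helper name is made up; the 11/53 shift constants are specific to double's 53 bits of significand precision, which is why doing this generically needs per-type constants):

```csharp
// Build a uniform double in [0.0, 1.0) from 64 random bits: keep the top 53 bits
// and scale by 2^-53, so every result is a dyadic rational k / 2^53.
static double NextDoubleFromBits(Random rng)
{
    Span<byte> bytes = stackalloc byte[8];
    rng.NextBytes(bytes);
    ulong bits = BitConverter.ToUInt64(bytes);
    return (bits >> 11) * (1.0 / (1ul << 53));
}
```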
Basically, `Next<T>` for binary integers should produce values in the same range as the existing methods, i.e., `Random.Next<System.Int32>()` should behave the same way `Random.Next()` behaves now.

For `Next(min, max)` for binary integers, the current implementation uses the range hit test, utilizing `Log2Ceiling`. Do we have `Log2Ceiling` for every binary integer type? Or `Log2` + `PopCount`?

For the `[0.0; 1.0)` uniform distribution, the current implementation uses dyadic rationals, utilizing type-specific bit-twiddling. Do we have the required constants to implement this bit-twiddling in a generic way?

P.S. Maybe it is useful to provide a raw span-filling method in the API, e.g.:
```csharp
public class Random
{
    public void Fill<T>(ref T storage) where T : unmanaged;
}
```
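A minimal sketch of how such a `Fill` could be built on the existing `NextBytes(Span<byte>)` (the extension shown here is hypothetical, not proposed API shape):

```csharp
using System.Runtime.InteropServices;

public static class RandomFillExtensions
{
    // Reinterpret the storage as raw bytes and let the existing NextBytes fill it.
    public static void Fill<T>(this Random rng, ref T storage) where T : unmanaged
    {
        Span<T> single = MemoryMarshal.CreateSpan(ref storage, 1);
        rng.NextBytes(MemoryMarshal.AsBytes(single));
    }
}
```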
~Before adding newer and newer members to `Random`, please consider the API suggestion I proposed in #60549.~

~ℹ️ Background Info: The problem is that since C# 2.0, adding new virtual members to `Random` is a breaking change for derived `Random` classes, because until the creator of the derived class can release a new version of their library, the non-overridden new methods will fall back to the legacy `Random` behavior.~

~Breaking Issue Example:~

~The last good example is the introduction of the `NextInt64` and `NextSingle` methods. I have a SecureRandom class (among others). When .NET 6 was released with the new `NextInt64` overloads, and before I could release a new version with the new overloads (or even after, if someone targeted .NET 6 without upgrading my libraries), `SecureRandom.NextInt64` failed to return cryptographically secure results.~

~Even worse (though the issue stands also without this addendum), I actually have a lot of extension methods for `Random`, including some `NextInt64` overloads that returned secure results before .NET 6, but when `Random.NextInt64` was introduced, it was not the case anymore until I upgraded my libraries.~
Possible Solutions:

A. Instead of new `Random` members, please introduce new extension methods that rely on the already existing members of any `Random` implementation (e.g. `NextBytes`). Just like @DrkWzrd, I also offer my RandomExtensions for any possible help/inspiration. Here is a living online example as well. (A minimal sketch of this approach is shown right after point B below.)

B. As I proposed in #60549, make it possible for a derived implementation to call a protected base constructor that uses an implementation where all virtual members rely on the `Sample` method, as it was in C# 1.x. This not only prevents unnecessary seed initialization in the base but also makes the possible new members forward compatible. The non-overridden members will maybe just be a little less optimized until the library is upgraded ~but will not fall back to the legacy implementation~. I actually even prepared a pull request, in which I followed @GrabYourPitchforks' API review comments, but then unfortunately @stephentoub thought this API was not necessary after all. But now that we face new `Random` methods again and the same issues can be expected, I kindly ask you to reconsider this statement.
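The sketch referenced under option A above, for illustration (names are made up; the point is that everything routes through the already-virtual `NextBytes`, so a derived `Random` keeps its own randomness source):

```csharp
public static class RandomCompatExtensions
{
    // Built solely on the existing virtual NextBytes(Span<byte>), so a derived
    // type (e.g. a cryptographically secure Random) doesn't have to override
    // anything new for this to respect its randomness source.
    public static long NextInt64ViaBytes(this Random rng)
    {
        Span<byte> buffer = stackalloc byte[sizeof(long)];
        rng.NextBytes(buffer);
        return BitConverter.ToInt64(buffer) & long.MaxValue; // non-negative result
    }
}
```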
Disclaimer:
This API proposal does not indicate the new `Next<T>` methods as virtual ones. If they are really non-virtual, then maybe my concerns are not that relevant in this particular case. Still, it would be really helpful to consider my proposal, as it addresses the unnecessary-allocations issue as well.

Correction: I remembered incorrectly: the non-derived-compatible implementation of the new `NextInt64` method fell back to using multiple `Next` calls, which caused only a performance issue for my SecureRandom class but did not affect security. Meaning, my proposed API no longer needs to be implemented before this proposal. It would only be a solution for the unnecessary allocations and slow instantiation, which is an independent issue from the new virtual methods. I apologize for creating unnecessary noise in this issue.
because until the creator of the derived class can release a new version of their library the non-overridden new method will fall back to the legacy Random behavior.
Any new virtuals that are added delegate to an existing abstract/virtual on the type. In the case of the NextInt64 you mentioned, for example, it would delegate to the existing virtual Next(int), calling a derived type's implementation if it existed. So I'm not sure what you mean about it falling back to legacy behavior.
We have also not historically considered adding new virtual members to be a breaking change. That is exactly how we have to version types (otherwise we'd end up with `RandomEx`, `RandomV5`, etc.) and we have nearly always reserved the right to extend our types with additional functionality where appropriate.
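For illustration only (this is not the actual BCL source), the delegation idea looks roughly like this: a newly added 64-bit member can be composed from the pre-existing virtual `Next(int)`, so a derived type's override of `Next(int)` still drives the output until that type also overrides the new member.

```csharp
// Three bounded Next(int) calls supply 22 + 22 + 19 = 63 uniform bits,
// yielding a non-negative long without touching any new virtuals.
static long NextInt64FromNext(Random rng)
{
    return ((long)rng.Next(1 << 22) << 41)
         | ((long)rng.Next(1 << 22) << 19)
         |  (long)rng.Next(1 << 19);
}
```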
In the case of the NextInt64 you mentioned, for example, it would delegate to the existing virtual Next(int), calling a derived type's implementation if it existed.
Oops, I remembered incorrectly, I apologize for that part. I just remembered that it ends up calling `Net5Compatible.NextUInt64`. As it routes back to `Next`, the insecure part is not true; sorry for not double-checking my memories.
But it still caused two issues:

1. Initialized the Marvin seed unnecessarily
2. Replaced my alternative implementation with four `Next` calls, which is suboptimal

While 2. was necessary due to the nature of method resolution (so the extension method was no longer picked), my proposed special protected constructor could partly solve this half of the problem as well (using one `Sample` call instead of four `Next`s). And it would also fix 1, which is still not solved.
I hope you can accept my thoughts with these corrections now.
But it still caused two issues: 1) Initialized the Marvin seed unnecessarily 2) Replaced my alternative implementation with four Next calls, which is suboptimal
For (1), what "marvin seed" are you referring to? The addition of these virtuals doesn't introduce any new seed. There's the existing concern you previously raised about the base ctor's initializing the state array and which we debated at length on a different issue. The addition of these virtuals neither fixes that nor makes that any worse.
For (2), the nature of these being virtual is what lets you address that. When you update your implementation to override the new virtuals, you get to supply a more optimal implementation for your derived type, and until then, it's a functional implementation.
what "marvin seed" are you referring to?
I mean this initialization. Sorry again for being inaccurate. This issue still stands for all derived types and is independent from overriding virtual members.
For (2), the nature of these being virtual is what lets you address that.
Yes, and I agreed on that. I just mentioned that maybe it could be a good option to rely on `Sample` until I have a chance to override the new members. But this part is less important; it's just a (possibly) positive side effect of the proposed solution for being able to avoid the seed initialization.
As I was mistaken with the security aspect, of course it's not necessary anymore to implement my proposal before adding new virtual members. I added a correction to my original comment. Sorry for the unwanted noise in this issue. I really feel embarrassed.
Thanks.
How would this actually be implemented? This seems impossible in the general case. Even if we have separate methods for `IBinaryInteger` and `IFloatingPointIeee754`, how would e.g. the integer one be implemented? There's no way to get the size of the T given an `IBinaryInteger` (and it can even represent a variable-sized integer, like `BigInteger`). There would have to be a new `IFixedLengthBinaryInteger` interface that would expose a static `Size` property, in order for the random generator to know how many bytes to generate.

Unless the plan is to only support a fixed number of specific types, as `Vector<T>` does. But then why have the method be generic at all instead of having overloads/method groups for each type?
There's no way to get the size of the T given an IBinaryInteger
The T can be constrained to both IBinaryInteger and IMinMaxValue, in which case you have both GetByteCount and MaxValue.
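A tiny sketch of what that pair of constraints buys you (hypothetical helper, just to show the shape):

```csharp
using System.Numerics;

// With IMinMaxValue<T> providing MaxValue and IBinaryInteger<T> providing
// GetByteCount(), the number of random bytes needed for T is computable.
static int RandomBytesNeeded<T>() where T : IBinaryInteger<T>, IMinMaxValue<T>
    => T.MaxValue.GetByteCount();

// e.g. RandomBytesNeeded<byte>() == 1, RandomBytesNeeded<ulong>() == 8
```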
I see, thanks. What about the floating point case?
I think it might be a good idea to prototype this to make sure that it really is feasible.
I think it might be a good idea to prototype this to make sure that it really is feasible.
Uncompiled, untested, certainly with room for optimization, and purely for demonstrative purposes:
```csharp
static T GetNext<T>(T maxValue) where T : IBinaryInteger<T>
{
    int maxBytes = (int)Math.Ceiling(maxValue.GetShortestBitLength() / 8.0);
    Span<byte> span = maxBytes <= 256 ? stackalloc byte[maxBytes] : new byte[maxBytes];

    while (true)
    {
        Random.Shared.NextBytes(span);
        T newValue = T.ReadLittleEndian(span, true);

        if (newValue < maxValue)
        {
            return newValue;
        }
    }
}
```
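Hypothetical usage of the sketch above; note that for small bounds the rejection loop can spin quite a few times, since it draws whole bytes (e.g. `maxValue = 6` accepts only 6 of the 256 possible single-byte reads):

```csharp
int roll = GetNext(6) + 1; // simulated die roll in 1..6
```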
Background and motivation
Now that we have the generic math interfaces, we should be able to write a method for generating a random value that can work with any T where T implements the appropriate interface.
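For illustration, generic math already makes a rough version of this expressible today (this is just a sketch built on `NextDouble`, not the proposed design; as noted elsewhere in this thread, scaling a double like this loses precision and can skew the distribution for wide integer ranges):

```csharp
using System.Numerics;

// Naive generic "next value in [minInclusive, maxExclusive)" built on NextDouble.
static T NextRough<T>(Random rng, T minInclusive, T maxExclusive) where T : INumber<T>
{
    double t = rng.NextDouble(); // [0.0, 1.0)
    double min = double.CreateChecked(minInclusive);
    double max = double.CreateChecked(maxExclusive);
    return T.CreateTruncating(min + t * (max - min));
}
```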
API Proposal
API Usage
Alternative Designs
`INumber<T>` isn't the right interface; each set will use its own that's appropriate.
Risks
No response