Open ladeak opened 9 months ago
Tagging subscribers to this area: @dotnet/area-system-linq See info in area-owners.md if you want to be subscribed.
Author: | ladeak |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Linq`, `untriaged` |
Milestone: | - |
cc @tannergooding
@eiriktsarpalis this seems reasonable overall to me.
There's potentially a desire to support more than just IBinaryInteger
and to support any INumber
. That would allow you to do effectively T current = start; while (current < end) { yield return current++; }
. However, there are some edge cases to consider for float
(current++
may yield back current
for inf
, nan
, or large finite values) but the relevant APIs to handle such cases do exist on the INumber
type.
It may even be desirable to allow users to specify a custom step (that is, give a range from start
to end
but in increments of 2
or 4
instead of 1
; but that might also be an API with a different name.
This could also be a source-breaking change, right? e.g. today if I have:
IEnumerable<int> e = Enumerable.Range((byte)0, (byte)5);
and then we add the proposed API, that will fail to compile because it won't be able to cast the resulting IEnumerable<byte>
to IEnumerable<int>
.
Yes, but that is also a simple fix and the compiler is (by default) already raising a diagnostic saying the cast is redundant.
and the compiler is (by default) already raising a diagnostic saying the cast is redundant.
Not if it were instead:
byte start = 0, count = 5;
IEnumerable<int> e = Enumerable.Range(start, count);
Not saying it's a deal-breaker, simply pointing out that this has more source-breaking consequences than come with typical static API additions.
.... that case is likely too simple, anyhow, since the Range
is almost certainly being immediately transformed into something else. Whether that ends up breaking is more complicated - eg, using ToDictionary(...)
and passing the results to some function is likely breaking.
Ironically the more complicated cases that add the range value to a pre-defined domain type are likely less breaking, because auto upcasting would occur at that point....
Let me update the proposal with the suggestions:
Including all numbers (I think for floating point numbers, it would make sense to provide the increment) as @tannergooding suggest:
public static class Enumerable
{
public static IEnumerable<T> Range<T>(T start, T count, T increment) where T : INumber<T>
}
I believe, this API would not be a breaking change anymore. I will also capture the concern of breaking change by @stephentoub in the proposal.
Seems reasonable, we can try reviewing it.
On second thought, I find it kind of odd that we have a generic count
parameter. What does it even mean to write Enumerable.Range(start: 0, count: 3.1, increment: 0.1)
? I think this should be changed so that either:
int
orMy personal preference would be for option (2) since it's consistent with similar APIs available in Matlab, Python, or F#:
public static class Enumerable
{
public static IEnumerable<T> Range<T>(T start, T end, T step) where T : INumber<T>
{
// TODO determine if end is exclusive or inclusive upper bound
for (T current = start; current <= end; current += step)
{
yield return current;
current += step;
}
}
}
I suspect this also implies that we should adopt a different name for the method, since it changes the semantics of the second parameter when compared to the existing Range
method.
Yes, I changed from end
to count
because I realized the current Range method uses count
and given this is an overload I was worried if it is going to be confusing or not. Let me update the proposal as suggested and change it back to end
(Option 2).
To share additional information from community implementations, there is:
.... given that end
is likely to be practically exclusive for most floating-point inputs (may be surprising when it ends up being inclusive), I would prefer making it explicitly exclusive.
Note that we should make it safe to use negative step
s, where end
is negative...
I would expect end to be exclusive
so that M(start: 0, end: Length, step: 1)
works as "expected" and so that it matches the vast majority of other decisions .NET has made with regards to using an inclusive start and exclusive end.
Exclusive end, overall, makes the code simpler and more intuitive for a large number of scenarios and makes it work more naturally with arrays, ranges, indexing, and other expressions that are typical for .NET developers. It also typically makes the code cheaper to actually execute and to do other common computations such as determining the length (which is simply end - start
)
It could also simply be start, count, step
(in which case its then consistent, parameter-wise, with the existing Range API) and that would also work for many of the same purposes.
Should the constraint be where T : INumber<T>
or the potentially more general where T : IAdditionOperators<T, T, T>, IComparisonOperators<T, T, bool>
?
It could also simply be start, count, step (in which case its then consistent, parameter-wise, with the existing Range API) and that would also work for many of the same purposes.
That seems a bit counterintuitive though, in most cases I'd have to run a division in my head to determine the appropriate count
value.
Should the constraint be where T : INumber
or the potentially more general where T : IAdditionOperators<T, T, T>, IComparisonOperators<T, T, bool>?
We could, but it in my view complicates the API signature without a clear use case. Can you define intervals in types that do not conform with the INumber<T>
abstraction?
It could also simply be start, count, step (in which case its then consistent, parameter-wise, with the existing Range API) and that would also work for many of the same purposes.
That seems a bit counterintuitive though, in most cases I'd have to run a division in my head to determine the appropriate
count
value.
It may be counter-intuitive to what's desired, but I think the alternative is counter-intuitive to the current API.
For example, if I look at the following code:
var seq1 = Enumerable.Range(2, 3);
var seq2 = Enumerable.Range(2, 3, 1);
What reason do I have to expect that seq1.SequenceEqual(seq2) == false
? From a readability standpoint, I think Enumerable.Range(int, int) should have the same behavior as Enumerable.Range(int, int, 1);
Perhaps a different name would be helpful if [start..end) is the desired sequence of values?
Correct, which is why using a different name might be necessary.
That seems a bit counterintuitive though, in most cases I'd have to run a division in my head to determine the appropriate count value.
You're likely going to end up doing some mental arithmetic no matter what you do and I imagine that many usages will include some attached comment like // [2, 3)
. There isn't one true answer here, with many languages picking different options; so I believe it is better for .NET to be consistent with how .NET typically does things (which is either taking a count
or exclusive end
).
The inclusive start
and exclusive end
is also how many other of the most widely used/popular languages have opted to do this, including python's range()
, C/C++ iterators, Rust's step_by
on their Range<Idx>
type, etc. It is also how I would expect C#'s existing range syntax would end up working if it was ever extended (as some languages support) to allow a step to be provided. The general consistency with other C family of languages and languages like Python which are widely popular for ML/AI also makes it a good choice for people looking to integrate their code with .NET
maybe this?
public static class Enumerable
{
public static IEnumerable<T> Range<T>(T start, int count) where T : INumber<T>;
public static IEnumerable<T> Range<T>(T start, int count, T step) where T : INumber<T>;
public static IEnumerable<T> Sequence<T>(T start, T end, T step) where T : INumber<T>;
}
If you have an int
count you are limiting the range for example for long
values.
Correct, which is why using a different name might be necessary.
I do agree, maybe Sequence
could be a better name, but it would probably need 2 overloads (the one proposed and one without the increment parameter) which could be increment by T.One
similly as the existing Range
today.
The only thing I don't like about the two methods approach, is that intuitively I would expect that Range
is the one taking start, end
and Sequence
is the one taking start, count
(especially given how range expressions work in most languages). But, it's too late to fix that now.
int
but implicitly convert to int
, such as char
and short
. With this API, the return value changes from IEnumerable<int>
to IEnumerable<TypeOfFirstParameter>
Enumerable.Range
with a step is also interesting but has nuance that would need further discussion
Range(T start, int count, T step)
would bepublic static class Enumerable
{
public static IEnumerable<T> Range<T>(T start, int count) where T : IBinaryInteger<T>
}
This isn't going to make it to .NET9, right?
That's right, this now falls into the .NET 10 bucket.
Any chance this could be considered alongside https://github.com/dotnet/runtime/issues/64031?
I think they could be kept separate.
Any chance this could be considered alongside #64031?
For consistency it's sometimes better to lump things together in order to ensure we produce a solution that makes sense across the board. For cases like this, where we can add things piecemeal without complexity explosion, keeping them separate helps to get easy stuff done without being held hostage by harder work.
I'm late to the party but technically public static IEnumerable<T> Range<T>(T start, int count)
should only need IIncrementOperators<T>
as a constraint and public static IEnumerable<T> Range<T>(T start, int count, T step)
should only need System.Numerics.IAdditionOperators<T,T,T>
. This would make it much more generic and will allow it work with types that are not numbers.
I'm late to the party but technically
public static IEnumerable<T> Range<T>(T start, int count)
should only needIIncrementOperators<T>
as a constraint andpublic static IEnumerable<T> Range<T>(T start, int count, T step)
should only needSystem.Numerics.IAdditionOperators<T,T,T>
. This would make it much more generic and will allow it work with types that are not numbers.
Yes! I can rewrite the implementation to such a public API. It must first be changed and approved.
It is not necessarily always ideal to make it the least constrained possible.
It is largely intentional that the approved shape is only:
public static IEnumerable<T> Range<T>(T start, int count) where T : IBinaryInteger<T>
This is in part because what this does can be non-sensible for other types, such as floating-point
where increment may not mutate the value.
API review discussed exposing additional less constrained APIs in the future, such as some Sequence
API that allows more types/control and which does not have any particular expectations like Range
might have.
This is in part because what this does can be non-sensible for other types, such as
floating-point
where increment may not mutate the value.
If one is multiplied by the number of steps and added to the start, then the implementation will count correctly for any numbers.
API review discussed exposing additional less constrained APIs in the future, such as some
Sequence
API that allows more types/control and which does not have any particular expectations likeRange
might have.
The only additional thing the current implementation does is check for no range overflow. We can move on to the general case for all numbers if we remove this check.
If one is multiplied by the number of steps and added to the start, then the implementation will count correctly for any numbers.
This will not do the right thing in all cases and it may be unexpected in others. It may be unexpected for a user to see 16777216.0, 16777216.0, 16777218.0
for Range(16777216.0f, 3)
for example
It was very explicitly intentional that the API was approved in the shape it was. We can relax this in the future if we decide that is the best case scenario. But that would only occur after fully considering the rest of the scenarios, what it means for arbitrary types, etc.
It was very explicitly intentional that the API was approved in the shape it was. We can relax this in the future if we decide that is the best case scenario. But that would only occur after fully considering the rest of the scenarios, what it means for arbitrary types, etc.
I fully support an extension to more general types and a step argument. This will allow the method to be used for decimal
numbers, there are no problems with rounding.
This will also allow you to create descending sequences and repeat the same number the required number of times.
This will allow the method to be used for decimal numbers, there are no problems with rounding.
decimal
numbers notably still have the same rounding issues that float
and double
have. They are still a value with finite precision and range trying to represent data that can have infinite precision and unbounded range.
(1.0m / 3.0m) * 3.0m != 1.0m
being one of the simplest examples.
Likewise you can have 1_000_000_000_000_000.0m + 0.000_000_000_000_000_1m == 1_000_000_000_000_000.0m
(no change) because it would require more than 28 significant digits to represent.
The only real difference between decimal
and double
is that representable values are multiples of powers of 10, rather than multiples of powers of 2, and so when you write a literal, that is likely the exact literal you're using in a given computation.
Background and motivation - UPDATED 2024/01/31
So that Enumerable.Range can be used with anything that implements
IBinaryInteger<T>
.API Proposal
API Usage
Alternative Designs
An alternative could be to handle all
INumber<T>
, in this case having a 3rd increment parameter could be reasonableRisks
The original proposal with 2 parameters overload for the
Range
method could be source-breaking, as the following case returns anIEnumerable<int>
and notIEnumerablem<byte>
today: