dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.42k stars 4.76k forks source link

[API Proposal]: Generic Enumerable.Range method #97689

Open ladeak opened 9 months ago

ladeak commented 9 months ago

Background and motivation - UPDATED 2024/01/31

So that Enumerable.Range can be used with anything that implements IBinaryInteger<T>.

API Proposal

public static class Enumerable
{
    public static IEnumerable<T> Range<T>(T start, int count) where T : IBinaryInteger<T>
}

API Usage

public record struct MyInterval<T> where T : IBinaryInteger<T>
{
    public T Start { get; }
    public T End { get; }

    // ...

    public IEnumerable<T> GetAll() => Enumerable.Range(Start, End);

Alternative Designs

An alternative could be to handle all INumber<T>, in this case having a 3rd increment parameter could be reasonable

public static class Enumerable
{
    public static IEnumerable<T> Range<T>(T start, T end, T increment) where T : INumber<T>
}

Risks

The original proposal with 2 parameters overload for the Range method could be source-breaking, as the following case returns an IEnumerable<int> and not IEnumerablem<byte> today:

byte start = 0, count = 5;
IEnumerable<int> e = Enumerable.Range(start, count);
ghost commented 9 months ago

Tagging subscribers to this area: @dotnet/area-system-linq See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation So that Enumerable.Range can be used with anything that implements `IBinaryInteger`. ### API Proposal ```csharp public static class Enumerable { public static IEnumerable Range(T start, T end) where T : IBinaryInteger } ``` ### API Usage ```csharp public record struct MyInterval where T : IBinaryInteger { public T Start { get; } public T End { get; } // ... public IEnumerable GetAll() => Enumerable.Range(Start, End); ``` ### Alternative Designs _No response_ ### Risks _No response_
Author: ladeak
Assignees: -
Labels: `api-suggestion`, `area-System.Linq`, `untriaged`
Milestone: -
eiriktsarpalis commented 9 months ago

cc @tannergooding

tannergooding commented 9 months ago

@eiriktsarpalis this seems reasonable overall to me.

There's potentially a desire to support more than just IBinaryInteger and to support any INumber. That would allow you to do effectively T current = start; while (current < end) { yield return current++; }. However, there are some edge cases to consider for float (current++ may yield back current for inf, nan, or large finite values) but the relevant APIs to handle such cases do exist on the INumber type.

It may even be desirable to allow users to specify a custom step (that is, give a range from start to end but in increments of 2 or 4 instead of 1; but that might also be an API with a different name.

stephentoub commented 9 months ago

This could also be a source-breaking change, right? e.g. today if I have:

IEnumerable<int> e = Enumerable.Range((byte)0, (byte)5);

and then we add the proposed API, that will fail to compile because it won't be able to cast the resulting IEnumerable<byte> to IEnumerable<int>.

tannergooding commented 9 months ago

Yes, but that is also a simple fix and the compiler is (by default) already raising a diagnostic saying the cast is redundant.

stephentoub commented 9 months ago

and the compiler is (by default) already raising a diagnostic saying the cast is redundant.

Not if it were instead:

byte start = 0, count = 5;
IEnumerable<int> e = Enumerable.Range(start, count);

Not saying it's a deal-breaker, simply pointing out that this has more source-breaking consequences than come with typical static API additions.

Clockwork-Muse commented 9 months ago

.... that case is likely too simple, anyhow, since the Range is almost certainly being immediately transformed into something else. Whether that ends up breaking is more complicated - eg, using ToDictionary(...) and passing the results to some function is likely breaking.
Ironically the more complicated cases that add the range value to a pre-defined domain type are likely less breaking, because auto upcasting would occur at that point....

ladeak commented 9 months ago

Let me update the proposal with the suggestions:

Including all numbers (I think for floating point numbers, it would make sense to provide the increment) as @tannergooding suggest:

public static class Enumerable
{
    public static IEnumerable<T> Range<T>(T start, T count, T increment) where T : INumber<T>
}

I believe, this API would not be a breaking change anymore. I will also capture the concern of breaking change by @stephentoub in the proposal.

eiriktsarpalis commented 9 months ago

Seems reasonable, we can try reviewing it.

eiriktsarpalis commented 9 months ago

On second thought, I find it kind of odd that we have a generic count parameter. What does it even mean to write Enumerable.Range(start: 0, count: 3.1, increment: 0.1)? I think this should be changed so that either:

  1. The count parameter is of type int or
  2. Change the second parameter to denote the end of the range.

My personal preference would be for option (2) since it's consistent with similar APIs available in Matlab, Python, or F#:

public static class Enumerable
{
    public static IEnumerable<T> Range<T>(T start, T end, T step) where T : INumber<T>
    {
        // TODO determine if end is exclusive or inclusive upper bound
        for (T current = start; current <= end; current += step)
        {
            yield return current;
            current += step;
        }
    }
}

I suspect this also implies that we should adopt a different name for the method, since it changes the semantics of the second parameter when compared to the existing Range method.

ladeak commented 9 months ago

Yes, I changed from end to count because I realized the current Range method uses count and given this is an overload I was worried if it is going to be confusing or not. Let me update the proposal as suggested and change it back to end (Option 2).

viceroypenguin commented 9 months ago

To share additional information from community implementations, there is:

Clockwork-Muse commented 9 months ago

.... given that end is likely to be practically exclusive for most floating-point inputs (may be surprising when it ends up being inclusive), I would prefer making it explicitly exclusive.

Note that we should make it safe to use negative steps, where end is negative...

tannergooding commented 9 months ago

I would expect end to be exclusive so that M(start: 0, end: Length, step: 1) works as "expected" and so that it matches the vast majority of other decisions .NET has made with regards to using an inclusive start and exclusive end.

Exclusive end, overall, makes the code simpler and more intuitive for a large number of scenarios and makes it work more naturally with arrays, ranges, indexing, and other expressions that are typical for .NET developers. It also typically makes the code cheaper to actually execute and to do other common computations such as determining the length (which is simply end - start)

It could also simply be start, count, step (in which case its then consistent, parameter-wise, with the existing Range API) and that would also work for many of the same purposes.

svick commented 9 months ago

Should the constraint be where T : INumber<T> or the potentially more general where T : IAdditionOperators<T, T, T>, IComparisonOperators<T, T, bool>?

eiriktsarpalis commented 9 months ago

It could also simply be start, count, step (in which case its then consistent, parameter-wise, with the existing Range API) and that would also work for many of the same purposes.

That seems a bit counterintuitive though, in most cases I'd have to run a division in my head to determine the appropriate count value.

Should the constraint be where T : INumber or the potentially more general where T : IAdditionOperators<T, T, T>, IComparisonOperators<T, T, bool>?

We could, but it in my view complicates the API signature without a clear use case. Can you define intervals in types that do not conform with the INumber<T> abstraction?

viceroypenguin commented 9 months ago

It could also simply be start, count, step (in which case its then consistent, parameter-wise, with the existing Range API) and that would also work for many of the same purposes.

That seems a bit counterintuitive though, in most cases I'd have to run a division in my head to determine the appropriate count value.

It may be counter-intuitive to what's desired, but I think the alternative is counter-intuitive to the current API.

For example, if I look at the following code:

var seq1 = Enumerable.Range(2, 3);
var seq2 = Enumerable.Range(2, 3, 1);

What reason do I have to expect that seq1.SequenceEqual(seq2) == false? From a readability standpoint, I think Enumerable.Range(int, int) should have the same behavior as Enumerable.Range(int, int, 1);

Perhaps a different name would be helpful if [start..end) is the desired sequence of values?

eiriktsarpalis commented 9 months ago

Correct, which is why using a different name might be necessary.

tannergooding commented 9 months ago

That seems a bit counterintuitive though, in most cases I'd have to run a division in my head to determine the appropriate count value.

You're likely going to end up doing some mental arithmetic no matter what you do and I imagine that many usages will include some attached comment like // [2, 3). There isn't one true answer here, with many languages picking different options; so I believe it is better for .NET to be consistent with how .NET typically does things (which is either taking a count or exclusive end).

The inclusive start and exclusive end is also how many other of the most widely used/popular languages have opted to do this, including python's range(), C/C++ iterators, Rust's step_by on their Range<Idx> type, etc. It is also how I would expect C#'s existing range syntax would end up working if it was ever extended (as some languages support) to allow a step to be provided. The general consistency with other C family of languages and languages like Python which are widely popular for ML/AI also makes it a good choice for people looking to integrate their code with .NET

viceroypenguin commented 9 months ago

Proposal

maybe this?

public static class Enumerable
{
    public static IEnumerable<T> Range<T>(T start, int count) where T : INumber<T>;
    public static IEnumerable<T> Range<T>(T start, int count, T step) where T : INumber<T>;
    public static IEnumerable<T> Sequence<T>(T start, T end, T step) where T : INumber<T>;
}
ladeak commented 9 months ago

If you have an int count you are limiting the range for example for long values.

ladeak commented 9 months ago

Correct, which is why using a different name might be necessary.

I do agree, maybe Sequence could be a better name, but it would probably need 2 overloads (the one proposed and one without the increment parameter) which could be increment by T.One similly as the existing Range today.

tannergooding commented 9 months ago

The only thing I don't like about the two methods approach, is that intuitively I would expect that Range is the one taking start, end and Sequence is the one taking start, count (especially given how range expressions work in most languages). But, it's too late to fix that now.

terrajobst commented 3 months ago

Video

public static class Enumerable
{
    public static IEnumerable<T> Range<T>(T start, int count) where T : IBinaryInteger<T>
}
julealgon commented 3 months ago

This isn't going to make it to .NET9, right?

eiriktsarpalis commented 3 months ago

That's right, this now falls into the .NET 10 bucket.

alrz commented 3 months ago

Any chance this could be considered alongside https://github.com/dotnet/runtime/issues/64031?

eiriktsarpalis commented 3 months ago

I think they could be kept separate.

terrajobst commented 3 months ago

Any chance this could be considered alongside #64031?

For consistency it's sometimes better to lump things together in order to ensure we produce a solution that makes sense across the board. For cases like this, where we can add things piecemeal without complexity explosion, keeping them separate helps to get easy stuff done without being held hostage by harder work.

AmrAlSayed0 commented 3 months ago

I'm late to the party but technically public static IEnumerable<T> Range<T>(T start, int count) should only need IIncrementOperators<T> as a constraint and public static IEnumerable<T> Range<T>(T start, int count, T step) should only need System.Numerics.IAdditionOperators<T,T,T>. This would make it much more generic and will allow it work with types that are not numbers.

AlexRadch commented 3 months ago

I'm late to the party but technically public static IEnumerable<T> Range<T>(T start, int count) should only need IIncrementOperators<T> as a constraint and public static IEnumerable<T> Range<T>(T start, int count, T step) should only need System.Numerics.IAdditionOperators<T,T,T>. This would make it much more generic and will allow it work with types that are not numbers.

Yes! I can rewrite the implementation to such a public API. It must first be changed and approved.

tannergooding commented 3 months ago

It is not necessarily always ideal to make it the least constrained possible.

It is largely intentional that the approved shape is only:

public static IEnumerable<T> Range<T>(T start, int count) where T : IBinaryInteger<T>

This is in part because what this does can be non-sensible for other types, such as floating-point where increment may not mutate the value.

API review discussed exposing additional less constrained APIs in the future, such as some Sequence API that allows more types/control and which does not have any particular expectations like Range might have.

AlexRadch commented 3 months ago

This is in part because what this does can be non-sensible for other types, such as floating-point where increment may not mutate the value.

If one is multiplied by the number of steps and added to the start, then the implementation will count correctly for any numbers.

API review discussed exposing additional less constrained APIs in the future, such as some Sequence API that allows more types/control and which does not have any particular expectations like Range might have.

The only additional thing the current implementation does is check for no range overflow. We can move on to the general case for all numbers if we remove this check.

tannergooding commented 3 months ago

If one is multiplied by the number of steps and added to the start, then the implementation will count correctly for any numbers.

This will not do the right thing in all cases and it may be unexpected in others. It may be unexpected for a user to see 16777216.0, 16777216.0, 16777218.0 for Range(16777216.0f, 3) for example

It was very explicitly intentional that the API was approved in the shape it was. We can relax this in the future if we decide that is the best case scenario. But that would only occur after fully considering the rest of the scenarios, what it means for arbitrary types, etc.

AlexRadch commented 3 months ago

It was very explicitly intentional that the API was approved in the shape it was. We can relax this in the future if we decide that is the best case scenario. But that would only occur after fully considering the rest of the scenarios, what it means for arbitrary types, etc.

I fully support an extension to more general types and a step argument. This will allow the method to be used for decimal numbers, there are no problems with rounding. This will also allow you to create descending sequences and repeat the same number the required number of times.

tannergooding commented 3 months ago

This will allow the method to be used for decimal numbers, there are no problems with rounding.

decimal numbers notably still have the same rounding issues that float and double have. They are still a value with finite precision and range trying to represent data that can have infinite precision and unbounded range.

(1.0m / 3.0m) * 3.0m != 1.0m being one of the simplest examples.

Likewise you can have 1_000_000_000_000_000.0m + 0.000_000_000_000_000_1m == 1_000_000_000_000_000.0m (no change) because it would require more than 28 significant digits to represent.

The only real difference between decimal and double is that representable values are multiples of powers of 10, rather than multiples of powers of 2, and so when you write a literal, that is likely the exact literal you're using in a given computation.