dotnet / csharplang

The official repo for the design of the C# programming language
11.53k stars 1.03k forks source link

[Proposal]: Adding Index support to existing library types (redux) #5596

Open CyrusNajmabadi opened 2 years ago

CyrusNajmabadi commented 2 years ago

Adding Index support to existing library types (redux)

Summary

Expand on the ability to implicitly support System.Index and System.Range for additional operations.

For example, the following would be allowed:

void Method(List<T> list)
{
    list.RemoveAt(^1); // no List.RemoveAt(System.Index) method exists.
}

Which would be rewritten to:

void Method(List<T> list)
{
    list.RemoveAt(list.Count - 1);
}

Motivation

Adding index and range support to existing library types made it possible to use System.Index and System.Range in limited cases to support types that didn't know about these new constructs, or which didn't want to update to support them. For example, it was now possible to write either:

void M(List<T> list)
{
    var secondToLastValue = list[^2];
}

// or

Index index;

void M(List<T> list)
{
    var itemAtIndex = list[this.index];    
}

These would translate to the following respectively:

void M(List<T> list)
{
    var secondToLastValue = list[list.Count - 2];
}

// or

Index index;

void M(List<T> list)
{
    var itemAtIndex = list[this.index.GetOffset(list)];    
}

However, while this was good for the general indexing case, it didn't extend any further. So lots of sensible methods (like RemoveAt or InsertAt) still require passing ints and doing the indexing computation manually.

This is solvable at the API level by having new overloads be added either as instance or extension methods. However, this may be a lot of work for an API to do. This can be addressed through source-generators. However, that would involve metadata bloat, as well as indirections through both the extension, and the calls to System.Index.GetOffset. If those alternatives are unpalatable, a pure language approach might then be the desirable way forward.

Detailed design

Given a candidate set of methods, if the methods are not applicable and the methods have a receiver that is Countable and has an this[int] indexer.

Then:

  1. Perform overload resolution again, this time treating any Index parameter as having an implicit conversion to int.
  2. If overload resolution now succeeds then emit the call with the arguments passed along the same as normal exception for any Index arguments that were converted to int. For those arguments, perform the following translation:

Given:

receiver.M(arg1, ..., argN)
  1. When the argument is of the form ^expr2 and the type of expr2 is int, it will be translated to receiver.Length - expr2.
  2. Otherwise, it will be translated as arg.GetOffset(receiver.Length).

Note: The receiver and Length expressions will be spilled as appropriate to ensure any side effects are only executed once. Note: The oder of evaluation should follow the same rules specified here.

Drawbacks

  1. This could be a bit too broad and might allowing using Indexs where inappropriate. For example:
List<int> list = new();
list.Add(^1);

If this is a concern it could be potentially addressed by requiring that the parameter name be index. We could also consider a requirement that the original type (not the instantiated type) was System.Int32. This would prevent an Index being usable just because a generic happened to be instantiated to int.

It may however just not be a concern in practice and we can allow the most generous approach possible.

  1. Do we want to support something similar for Ranges? For example:
List<T> list = new();
list.RemoveRange([1..^1]);

This does seem nice to support. But it also feels much easier and reasonable for libraries to expose themselves. There are generally orders of magnitude less Range method than Index methods in a cursory exploration of the BCL. Furthermore, it's less clear how such a pattern could necessarily be detected. Unlike the Index approach (which can work by assuming there's an implicit conversion present in some cases), this would not be done with conversions. This would be expansion of one arg to many, similar to how params works. Except that there might be many locations in the argument list this could happen, with very little clarity on how to match them up.

We could also look for specifically int index, int count /*or length*/, but it seems very brittle and difficult to model. Hopefully these are rare enough that just adding overloads in the type itself is a palatable approach.

  1. We could relax the rule about overload resolution needing to fail. We could just add this new implicit conversion and have it always be active. This would be simpler, but could lead to breaking changes as existing code might change meaning. For example, if there was an existing instance method that took an int and an extension that took an Index, then we'd now prefer the former. While this was a break, it might be acceptable under the presumption that the extension existed to supply the functionality the language now provided.

Alternatives

This could be added as instance or extension methods to the types themselves. This could be done manually or using extension methods.

Unresolved questions

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-28.md#ungrouped

TahirAhmadov commented 2 years ago

Personally, I think this should be an API addition, not a language change. Yes a few more overloads will have to be added to each such collection type, but I doubt those will increase the overall size in any meaningful fashion. If executable size is a concern, then publish-time trimming improvements should be invested into, to reduce the size. The same applies to the indexers - it would have been much less effort for library maintainers to add those overloads, instead of adding a special rule to the language. (Water under the bridge now.)

CyrusNajmabadi commented 2 years ago

The same applies to the indexers - it would have been much less effort for library maintainers to add those overloads, instead of adding a special rule to the language. (Water under the bridge now.)

I don't necessarily see how this is true. Users may literally be targetting versions of these libraries that do not have those API additions available.

CyrusNajmabadi commented 2 years ago

@TahirAhmadov the primary concern here is adding language features that are intended to help existing apis, but which cannot unless all the apis then create all the duplicate entrypoints for all the new features.

TahirAhmadov commented 2 years ago

Regarding older versions - for these methods, definitely extension methods should be used. For indexers, I see your point - there are no extension indexers yet. Still, if you look at the big picture, there is an "edict" that older versions of the frameworks are support-mode only; this leads to this situation where adding a few indexer properties in a patch release, which is trivial code to write, is avoided and instead, a (seemingly) much more expensive language change is introduced. I know this isn't as clear cut as it is with methods, but I (if I could make such a decision) would have still favored lifting the "support only" rule for simple additions which support new language features.

Regarding duplication of entry points, that's true but also is it that big of an issue? It's a very easy thing for even the beginner contributors to add; there are only so many collection types in BCL; other library maintainers can decide whether they want to spend the effort to add these overloads, but I would be surprised if there was a big number of collection types out in the wild; I use one custom collection - ObservableList<T> - which mimics List<T> and adds events to broadcast modifications - which I use in a few niche cases; 99% of collections are the few mostly useful ones from BCL.

My point is, I can see how the language change has its benefits, but in terms of cost/benefit, it's not worth it. There are scores of items where the language is the solution, and those should be prioritized over saving library maintainers from doing 2 hours worth of copy-pasta.

CyrusNajmabadi commented 2 years ago

Regarding duplication of entry points, that's true but also is it that big of an issue?

Yes. You have to find everything and make sure you didn't miss anything.

CyrusNajmabadi commented 2 years ago

My point is, I can see how the language change has its benefits, but in terms of cost/benefit, it's not worth it. There are scores of items where the language is the solution, and those should be prioritized over saving library maintainers from doing 2 hours worth of copy-pasta.

I don't see this as two hours of copypasta. This is effectively an ongoing api tax. Every list-like type has to know and maintain/test/support overloads for all these cases, always.

TahirAhmadov commented 2 years ago

Yes. You have to find everything and make sure you didn't miss anything.

1, I doubt that's a strict requirement. If the most used collections are tackled, we're 90% there, and from there on, we can await requests from the community. 2, finding everything is not that difficult - the same rules that the language uses to determine when to apply this indexing, can be used to search code and find the collection types.

TahirAhmadov commented 2 years ago

I don't see this as two hours of copypasta. This is effectively an ongoing api tax. Every list-like type has to know and maintain/test/support overloads for all these cases, always.

There is a similar type of cost for the language; it's a parsing rule which has to be maintained/tested/supported forever now. Also, I'm pretty sure I could bang out this functionality for List<T>, with tests and all, in about 2 hours. Yes, let's multiply by all the collection types; we're talking about (very generously) maybe 40 hours of work, especially once the template for doing this is in place. To me, the biggest upside to just adding these new overloads is the ease of the task. While it's not free to add new overloads to types, it's quantitative. Whereas with a language change, it introduces complexity - just like you wrote, one has to carefully design the rule to only allow the desirable use cases. This is risk. New overloads are almost zero-risk.

CyrusNajmabadi commented 2 years ago

1, I doubt that's a strict requirement.

It's not strict. But when missed, it just means a friction point. I've run into this myself and it's annoying and feels buggy.

2, finding everything is not that difficult - the same rules that the language uses to determine when to apply this indexing, can be used to search code and find the collection types.

Like i said, it's a tax every must pay in perpetuity. Or we can just detect this once and have all APIs light up here.

TahirAhmadov commented 2 years ago

I just thought of something else - BCL has many collection types which re-implement what List<T> does essentially, which date back to pre-generics days. Yes modifying all those MatchCollections etc. can be a lot of work, but it also raises the question - is there a pathway to change those to have a common base class (either List<T> or ReadOnlyList<T> or maybe a new type)? The way I see it, maintaining all these disparate collection types is the real cost, and changing them to inherit from a common base class (which has all the bells and whistles) would save a lot more than changing the language to work around it.

nathan130200 commented 2 years ago

Implement this for all base IEnumerable types so most like half of enumerated values can benefit over System.Index feature.