dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

String should implement IReadOnlyList<char> #19609

Closed danmoseley closed 4 years ago

danmoseley commented 7 years ago
public sealed partial class String : System.Collections.Generic.IEnumerable<char>, System.Collections.IEnumerable, System.IComparable, System.IComparable<string>, System.IConvertible, System.IEquatable<string>, System.ICloneable, System.Collections.Generic.IReadOnlyList<char>
{
   //...
   //IReadOnlyList<char> members:
   int Count { get; }
   char this[int index] { get; }

   public IEnumerator<char> GetEnumerator() { }
  // ...
}


The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.

@stas-sultanov commented on Mon Mar 23 2015

I would like to suggest to make class String to implement IReadOnlyList{char} instead of IEnumerable{char}.

Possibly interface should be implemented explicitly because of the Length property, which is extern and hardcoded in JIT. On the other hand Length could be deprecated and Count should be used instead.


@pharring commented on Thu Mar 26 2015

@stas-sultanov Thanks for the suggestion. Can you give an example of where this would be useful?


@stas-sultanov commented on Fri Mar 27 2015

Sure. Here is an example from one of my projects: A ConvertEx class which provides a set of methods to convert data from base-n string to byte array and vice-versa. This class has a lot of methods like:

public static TryResult<String> TryToBase16String(byte[] data, int offset, int length)
public static TryResult<Byte[]> TryFromBase16String(string data, int offset, int length)

for this methods i'd like to use single method that checks arguments

    private static void CheckArguments<T>(IReadOnlyCollection<T> data, int offset, int length)
    {
        if (data == null)
            throw new ArgumentNullException(@"data");
        if (length < 0)
            throw new ArgumentOutOfRangeException(@"length", length, @"Is less than 0.");
        if (offset < 0)
            throw new ArgumentOutOfRangeException(@"offset", offset, @"Is less than 0.");
        if (offset > (data.Count - length))
            throw new ArgumentOutOfRangeException(@"data", data.Count, @"Offset plus length is greater than the length of data.");
    }

But unfortunately it is not possible because byte[] and string has only one common interface IEnumerable<T>. Use IEnumerable<T> and get length via Count() method is not an option due to performance consideration.


@pharring commented on Fri Mar 27 2015

A non-generic overload of CheckArguments would solve this, of course.


@stas-sultanov commented on Fri Mar 27 2015

of course it would, and that it is the way it is made now. :) but it dose not looks like a good design.

maybe example is not good enough, however i strongly believe that this small change would made coreclr more versatile.


@SamuelEnglard commented on Sun Mar 29 2015

Just a quick point but unless I'm wrong array's don't implement IReadOnlyList<T> so you'd still have to use the Enumerable.Count(). Do access the length/count of both an array and string in an efficient manner we'd have to Enumerable.Count() smarter. It already looks for types that implement ICollection or ICollection<T> and uses the Count property instead of looping. If we add a shortcut for IReadOnlyList<T> to that too then you could do what you want.


@stas-sultanov commented on Mon Mar 30 2015

Arrays do implement IReadOnlyList<T>, please take a look at comments of Array class. Or try to compile followin code:

IReadOnlyList<int> test = new [] {1, 2, 3};

the Enumerable class that implements Count method is not part of the coreclr.


@JohanLarsson commented on Sun Mar 29 2015

Not much perf diff between .Count() and .Length for an array IReadOnlyList is a nice interface though.


@SamuelEnglard commented on Sun Mar 29 2015

Ah yes, I had looked before posting but missed that comment. My bad. So then yes I'm all for making string implement IReadOnlyList<T>. Sorry about that


@stas-sultanov commented on Mon Mar 30 2015

Not much perf diff between .Count() and .Length for an array

indeed, but once again:

the Enumerable class that implements Count method is not part of the coreclr.


@VladimirGnenniy commented on Mon Mar 30 2015

I believe that this would be nice improvement of the architecture of the .net classes.


@terrajobst commented on Tue Apr 07 2015

@KrzysztofCwalina How do you think about this? It seems that String would eventually support Span<char>, would this additional interface make sense in general?


@Lukazoid commented on Fri Jun 19 2015

+1 for this, would be really useful to be able to pass strings directly as IReadOnlyList/IReadOnlyCollection parameters instead of going via .ToCharArray() or using a wrapper class.


@KrzysztofCwalina commented on Fri Jun 19 2015

To me string is a primitive type. I think it's already a mistake that it implements IEnumerable.


@sharwell commented on Fri Jun 19 2015

@KrzysztofCwalina I totally get that view. It leads to special cases in certain code that isn't required in, say, Java. But for better or worse, that ship has sailed. Given that string already implements IEnumerable<char> (and that isn't going to change), the question is whether or not it should also implement IReadOnlyList<char>.

:thought_balloon: I think it makes sense specifically for some cases where code currently needs to call ToCharArray(). I couldn't find any clear-cut cases where the new implementation would simplify my own existing code, but especially given it already implements IEnumerable<char> I think it makes sense to implement IReadOnlyList<char> as well.


@KrzysztofCwalina commented on Fri Jun 19 2015

If we implemented it explicitly and made sure the Linq extensions don't show on String, then I guess it would be fine.


@sharwell commented on Fri Jun 19 2015

@KrzysztofCwalina It already implements IEnumerable<char>, so the LINQ extensions already appear.


@KrzysztofCwalina commented on Fri Jun 19 2015

Hmmm, I was under the impression that we special cased it in C#.


@KrzysztofCwalina commented on Fri Jun 19 2015

@jaredpar, @MadsTorgersen, I thought C# was special cased to not show Linq methods on String?


@sharwell commented on Fri Jun 19 2015

In the PCL, string only implements IEnumerable (not IEnumerable<char>), so only OfType and Cast appear. When I target .NET 4.5+ desktop, I see the LINQ methods appearing in VS 2015 RC.


@MadsTorgersen commented on Fri Jun 19 2015

@KrzysztofCwalina I think you just spent too much time with PCL. :-) In the full framework, LINQ always worked over string - for better or worse.


@KrzysztofCwalina commented on Fri Jun 19 2015

Yeah, it's all slowly coming to me. I think we initially wanted to special case it, but then we just decided to implement IEnumerable. Then in 8.1 we added IEnumerable (why?), and we are back to the original problem that string intellisense shows all these linq methods that probably should not be used on strings.


@danmosemsft commented on Thu Dec 08 2016

API request -- moving to CoreFX

joperezr commented 7 years ago

@KrzysztofCwalina @terrajobst This is kind of an old issue that was ported from coreclr, but do we still want to consider implementing IReadOnlyList<char> on String? I'm marking this as ready for review so that we take a further look.

joperezr commented 7 years ago

cc: @stas-sultanov (logged the original issue in coreclr)

karelz commented 7 years ago

@joperezr is there any downside to implementing IReadOnlyList<char> on string? The issue title says "instead of IEnumerable<char>" - that would be a breaking change we can't take right?

shmuelie commented 7 years ago

@karelz IReadOnlyList<T> implements IEnumerable<T> so not a breaking change, no?

JonHanna commented 7 years ago

The issue title says "instead of IEnumerable" - that would be a breaking change we can't take right?

It would be an impossible change too, since IReadOnlyList<char> derives from IEnumerable<char>, though maybe they were thinking about how at the code level you don't need IEnumerable<char> to appear in the source any more once you've added IReadOnlyList<char>.

karelz commented 7 years ago

Updating the title. @joperezr can you please add the formal API review (API shape, motivation, usage). It will be easier to review.

pharring commented 7 years ago

Since this is up for review, I'd also like to suggest adding a marker interface to System.String called System.IString that may be used to identify "string-like" things. To tie in with this proposal, I believe System.IString should derive from System.Collections.Generic.IReadOnlyList<char> and System.String would implement System.IString rather than IReadOnlyList<char> directly.

@joperezr, @KrzysztofCwalina What do you think?

If we had this when we built Roslyn, we would have used IString pretty much everywhere where we currently use string, both internally and in the public APIs. It would have allowed us to eliminate many string allocations that were forced upon us because we had to eagerly evaluate substring, concat and encoding conversions. I imagine this would be the case for any text processing library. Indeed, once you have this marker interface, you'll probably want to provide new overloads on many existing FX classes (e.g. serialization, encoding, formatting).

karelz commented 7 years ago

@pharring is there benefit to tie it to this proposal? Shouldn't it be separate proposal?

karelz commented 7 years ago

@joperezr can you please update the API proposal?

pharring commented 7 years ago

@karelz Yes, they could be separate proposals. I was being a bit lazy by piggy-backing on this one.

Really, I want System.IString to act like an easier-to-type-and-remember alias for System.Collections.Generic.IReadOnlyCollection<char>. If System.IString was introduced on its own, then it would need a few methods from IReadOnlyCollection<char>.

I'll open a new issue to get some feedback before making a formal proposal.

rvhuang commented 7 years ago

I support this proposal (to implement IReadOnlyList(T)).

Also it is also possible to simply implement ICollection(T) and mark property IsReadOnly as true because it could bring benefit when users use Enumerable(T) extensions such as Count() on string.

joperezr commented 7 years ago

Here is the formal proposal from this thread:

public sealed partial class String : System.Collections.Generic.IEnumerable<char>, System.Collections.IEnumerable, System.IComparable, System.IComparable<string>, System.IConvertible, System.IEquatable<string>, System.ICloneable, System.Collections.Generic.IReadOnlyList<char>
{
   //...
   //IReadOnlyList<char> members:
   int Count { get; }
   char this[int index] { get; }

   public IEnumerator<char> GetEnumerator() { }
  // ...
}

The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.

joperezr commented 7 years ago

The above proposal was just derived from the discussion of this thread, so if folks are happy with that we can go ahead and mark this as ready-for-review.

pharring commented 7 years ago

@joperezr Don't you also need to implement int Count { get; } ?

joperezr commented 7 years ago

@pharring you are right, I updated the proposal above.

jnm2 commented 7 years ago

The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.

Like @JonHanna said: It's not impossible due to compat reasons, it's impossible because IReadOnlyList<char> includes IEnumerable<char>. Specifying IEnumerable<char> in the list is redundant.

AmadeusW commented 7 years ago

I'm following this issue and just wanted to share a blog post on Ropes (wiki) that I just saw. Ropes are a data structure that might be used to implement strings in disjoint memory locations, which we could use in dotnet/corefx#16786

jamesqo commented 7 years ago

What is the status of this proposal? @karelz, it looks like @joperezr has added a formal proposal here, maybe this is ready-for-review?

karelz commented 7 years ago

If @joperezr thinks it is ready, please update top post with final proposal and mark it api-ready-for-review.

danmoseley commented 6 years ago

@joperezr ?

terrajobst commented 6 years ago

Video

String isn't (logically) a collection of chars, so giving a read-only view seems off. Considering how far we're with the new span based APIs, I'm not convinced this interface is adding a ton of value for code that cares about performance (and code that doesn't has alternatives today).

karelz commented 6 years ago

@pharring are you ok with this resolution? Is Span & friends sufficient "workaround" for your scenarios?

karelz commented 6 years ago

FYI: The API review discussion was recorded - see https://youtu.be/BI3iXFT8H7E?t=4973 (15 min duration)

pharring commented 6 years ago

@terrajobst If it's not a collection of chars, then how would you describe it? (I used your words. My own description in dotnet/corefx#14572 is more precise.)

@karelz I've been playing with ReadOnlySpan<char> recently and, while it has its own limitations (requires latest C# compiler support; different behavior in .NET Core versus Desktop framework) it certainly solves the 'substring' problem.

However, it ReadOnlySpan<char> doesn't help avoid eager construction of concat or decoding.

In any case, dotnet/corefx#14572 is still open and we can move the discussion there.

pharring commented 6 years ago

@terrajobst Just watched the video. In it you say that Roslyn wanted this to work over TextBuffer and we solved that with SourceText. That's roughly correct, but the use cases for an abstraction over string are a lot broader than that. Consider all the APIs that construct readable names over syntax nodes or symbols. To construct a fully-qualified name, you have to pull together fragments of strings from various sources: metadata - which is probably encoded as UTF8 - other symbols (namespaces, types) - and source (SourceText - also may be encoded). Then you have to apply the rules for forming type names which are concatenations of these pieces with separator chars (VB and C# are different). And often you do all that so that the caller can compare the value with some other string (where knowing that the lengths are different is enough to short-circuit the comparison) and then throw the value away. It's tons of gen-0 garbage. ReadOnlySpan<char> isn't appropriate for those cases.