Closed danmoseley closed 4 years ago
@KrzysztofCwalina @terrajobst This is kind of an old issue that was ported from coreclr, but do we still want to consider implementing IReadOnlyList<char>
on String
? I'm marking this as ready for review so that we take a further look.
cc: @stas-sultanov (logged the original issue in coreclr)
@joperezr is there any downside to implementing IReadOnlyList<char>
on string
?
The issue title says "instead of IEnumerable<char>
" - that would be a breaking change we can't take right?
@karelz IReadOnlyList<T>
implements IEnumerable<T>
so not a breaking change, no?
The issue title says "instead of IEnumerable
" - that would be a breaking change we can't take right?
It would be an impossible change too, since IReadOnlyList<char>
derives from IEnumerable<char>
, though maybe they were thinking about how at the code level you don't need IEnumerable<char>
to appear in the source any more once you've added IReadOnlyList<char>
.
Updating the title. @joperezr can you please add the formal API review (API shape, motivation, usage). It will be easier to review.
Since this is up for review, I'd also like to suggest adding a marker interface to System.String
called System.IString
that may be used to identify "string-like" things. To tie in with this proposal, I believe System.IString
should derive from System.Collections.Generic.IReadOnlyList<char>
and System.String
would implement System.IString
rather than IReadOnlyList<char>
directly.
@joperezr, @KrzysztofCwalina What do you think?
If we had this when we built Roslyn, we would have used IString
pretty much everywhere where we currently use string
, both internally and in the public APIs. It would have allowed us to eliminate many string allocations that were forced upon us because we had to eagerly evaluate substring, concat and encoding conversions. I imagine this would be the case for any text processing library. Indeed, once you have this marker interface, you'll probably want to provide new overloads on many existing FX classes (e.g. serialization, encoding, formatting).
@pharring is there benefit to tie it to this proposal? Shouldn't it be separate proposal?
@joperezr can you please update the API proposal?
@karelz Yes, they could be separate proposals. I was being a bit lazy by piggy-backing on this one.
Really, I want System.IString
to act like an easier-to-type-and-remember alias for System.Collections.Generic.IReadOnlyCollection<char>
. If System.IString
was introduced on its own, then it would need a few methods from IReadOnlyCollection<char>
.
I'll open a new issue to get some feedback before making a formal proposal.
I support this proposal (to implement IReadOnlyList(T)).
Also it is also possible to simply implement ICollection(T) and mark property IsReadOnly as true because it could bring benefit when users use Enumerable(T) extensions such as Count() on string.
Here is the formal proposal from this thread:
public sealed partial class String : System.Collections.Generic.IEnumerable<char>, System.Collections.IEnumerable, System.IComparable, System.IComparable<string>, System.IConvertible, System.IEquatable<string>, System.ICloneable, System.Collections.Generic.IReadOnlyList<char>
{
//...
//IReadOnlyList<char> members:
int Count { get; }
char this[int index] { get; }
public IEnumerator<char> GetEnumerator() { }
// ...
}
The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.
The above proposal was just derived from the discussion of this thread, so if folks are happy with that we can go ahead and mark this as ready-for-review.
@joperezr Don't you also need to implement int Count { get; }
?
@pharring you are right, I updated the proposal above.
The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.
Like @JonHanna said: It's not impossible due to compat reasons, it's impossible because IReadOnlyList<char>
includes IEnumerable<char>
. Specifying IEnumerable<char>
in the list is redundant.
I'm following this issue and just wanted to share a blog post on Ropes (wiki) that I just saw. Ropes are a data structure that might be used to implement strings in disjoint memory locations, which we could use in dotnet/corefx#16786
What is the status of this proposal? @karelz, it looks like @joperezr has added a formal proposal here, maybe this is ready-for-review?
If @joperezr thinks it is ready, please update top post with final proposal and mark it api-ready-for-review.
@joperezr ?
String isn't (logically) a collection of chars, so giving a read-only view seems off. Considering how far we're with the new span based APIs, I'm not convinced this interface is adding a ton of value for code that cares about performance (and code that doesn't has alternatives today).
@pharring are you ok with this resolution? Is Span & friends sufficient "workaround" for your scenarios?
FYI: The API review discussion was recorded - see https://youtu.be/BI3iXFT8H7E?t=4973 (15 min duration)
@terrajobst If it's not a collection of chars, then how would you describe it? (I used your words. My own description in dotnet/corefx#14572 is more precise.)
@karelz I've been playing with ReadOnlySpan<char>
recently and, while it has its own limitations (requires latest C# compiler support; different behavior in .NET Core versus Desktop framework) it certainly solves the 'substring' problem.
However, it ReadOnlySpan<char>
doesn't help avoid eager construction of concat or decoding.
In any case, dotnet/corefx#14572 is still open and we can move the discussion there.
@terrajobst Just watched the video. In it you say that Roslyn wanted this to work over TextBuffer and we solved that with SourceText. That's roughly correct, but the use cases for an abstraction over string are a lot broader than that. Consider all the APIs that construct readable names over syntax nodes or symbols. To construct a fully-qualified name, you have to pull together fragments of strings from various sources: metadata - which is probably encoded as UTF8 - other symbols (namespaces, types) - and source (SourceText - also may be encoded). Then you have to apply the rules for forming type names which are concatenations of these pieces with separator chars (VB and C# are different). And often you do all that so that the caller can compare the value with some other string (where knowing that the lengths are different is enough to short-circuit the comparison) and then throw the value away. It's tons of gen-0 garbage. ReadOnlySpan<char>
isn't appropriate for those cases.
The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.
@stas-sultanov commented on Mon Mar 23 2015
I would like to suggest to make class String to implement IReadOnlyList{char} instead of IEnumerable{char}.
Possibly interface should be implemented explicitly because of the Length property, which is extern and hardcoded in JIT. On the other hand Length could be deprecated and Count should be used instead.
@pharring commented on Thu Mar 26 2015
@stas-sultanov Thanks for the suggestion. Can you give an example of where this would be useful?
@stas-sultanov commented on Fri Mar 27 2015
Sure. Here is an example from one of my projects: A ConvertEx class which provides a set of methods to convert data from base-n string to byte array and vice-versa. This class has a lot of methods like:
for this methods i'd like to use single method that checks arguments
But unfortunately it is not possible because byte[] and string has only one common interface
IEnumerable<T>
. UseIEnumerable<T>
and get length viaCount()
method is not an option due to performance consideration.@pharring commented on Fri Mar 27 2015
A non-generic overload of CheckArguments would solve this, of course.
@stas-sultanov commented on Fri Mar 27 2015
of course it would, and that it is the way it is made now. :) but it dose not looks like a good design.
maybe example is not good enough, however i strongly believe that this small change would made coreclr more versatile.
@SamuelEnglard commented on Sun Mar 29 2015
Just a quick point but unless I'm wrong array's don't implement
IReadOnlyList<T>
so you'd still have to use theEnumerable.Count()
. Do access the length/count of both an array and string in an efficient manner we'd have toEnumerable.Count()
smarter. It already looks for types that implementICollection
orICollection<T>
and uses theCount
property instead of looping. If we add a shortcut forIReadOnlyList<T>
to that too then you could do what you want.@stas-sultanov commented on Mon Mar 30 2015
Arrays do implement
IReadOnlyList<T>
, please take a look at comments of Array class. Or try to compile followin code:the Enumerable class that implements Count method is not part of the coreclr.
@JohanLarsson commented on Sun Mar 29 2015
Not much perf diff between .Count() and .Length for an array IReadOnlyList is a nice interface though.
@SamuelEnglard commented on Sun Mar 29 2015
Ah yes, I had looked before posting but missed that comment. My bad. So then yes I'm all for making string implement
IReadOnlyList<T>
. Sorry about that@stas-sultanov commented on Mon Mar 30 2015
indeed, but once again:
@VladimirGnenniy commented on Mon Mar 30 2015
I believe that this would be nice improvement of the architecture of the .net classes.
@terrajobst commented on Tue Apr 07 2015
@KrzysztofCwalina How do you think about this? It seems that
String
would eventually supportSpan<char>
, would this additional interface make sense in general?@Lukazoid commented on Fri Jun 19 2015
+1 for this, would be really useful to be able to pass strings directly as IReadOnlyList/IReadOnlyCollection parameters instead of going via .ToCharArray() or using a wrapper class.
@KrzysztofCwalina commented on Fri Jun 19 2015
To me string is a primitive type. I think it's already a mistake that it implements IEnumerable.
@sharwell commented on Fri Jun 19 2015
@KrzysztofCwalina I totally get that view. It leads to special cases in certain code that isn't required in, say, Java. But for better or worse, that ship has sailed. Given that string already implements
IEnumerable<char>
(and that isn't going to change), the question is whether or not it should also implementIReadOnlyList<char>
.:thought_balloon: I think it makes sense specifically for some cases where code currently needs to call
ToCharArray()
. I couldn't find any clear-cut cases where the new implementation would simplify my own existing code, but especially given it already implementsIEnumerable<char>
I think it makes sense to implementIReadOnlyList<char>
as well.@KrzysztofCwalina commented on Fri Jun 19 2015
If we implemented it explicitly and made sure the Linq extensions don't show on String, then I guess it would be fine.
@sharwell commented on Fri Jun 19 2015
@KrzysztofCwalina It already implements
IEnumerable<char>
, so the LINQ extensions already appear.@KrzysztofCwalina commented on Fri Jun 19 2015
Hmmm, I was under the impression that we special cased it in C#.
@KrzysztofCwalina commented on Fri Jun 19 2015
@jaredpar, @MadsTorgersen, I thought C# was special cased to not show Linq methods on String?
@sharwell commented on Fri Jun 19 2015
In the PCL,
string
only implementsIEnumerable
(notIEnumerable<char>
), so onlyOfType
andCast
appear. When I target .NET 4.5+ desktop, I see the LINQ methods appearing in VS 2015 RC.@MadsTorgersen commented on Fri Jun 19 2015
@KrzysztofCwalina I think you just spent too much time with PCL. :-) In the full framework, LINQ always worked over string - for better or worse.
@KrzysztofCwalina commented on Fri Jun 19 2015
Yeah, it's all slowly coming to me. I think we initially wanted to special case it, but then we just decided to implement IEnumerable. Then in 8.1 we added IEnumerable (why?), and we are back to the original problem that string intellisense shows all these linq methods that probably should not be used on strings.
@danmosemsft commented on Thu Dec 08 2016
API request -- moving to CoreFX