dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.44k stars 4.76k forks source link

Add "slicing" APIs to ArraySegment<T> #18007

Closed KrzysztofCwalina closed 4 years ago

KrzysztofCwalina commented 8 years ago

ArraySegment could be much more useful if it had the following additional APIs. I propose that we add these members to ArraySegment:

namespace System
{
    partial struct ArraySegment<T>
    {
        public T this[int index] { get; set;  }

        public static implicit operator ArraySegment<T>(T[] array);

        public static explicit operator T[](ArraySegment<T> segment);

        public T[] ToArray();

        public int Length { get; }

        public ArraySegment<T> Slice(int index, int count);

        public ArraySegment<T> Slice(int index);

        public void CopyTo(T[] destination);

        public void CopyTo(T[] destination, int destinationIndex);

        public void CopyTo(ArraySegment<T> destination);

        public static ArraySegment<T> Empty { get; }
    }
}
omariom commented 8 years ago

public static explicit operator T[](ArraySegment<T> segment);

Does it make a copy?

KrzysztofCwalina commented 8 years ago

@omariom, yes. this is why I made it an explicit cast. Possibly it should be removed completely.

terrajobst commented 8 years ago

The nice about a conversion operatator is that the compiler will give you an error message like:

Cannot convert ArraySegment<Bananas> to @Bananas[]. An explicit conversion exists. Are you missing a cast?

Now, one could also argue that this is the problem if people just apply the cast without realizing that there is performance/memory penalty for doing so.

public T[] CreateArray();

I think we should call it ToArray(). That matches the name Linq has and thus shadows the Linq implementation with this one, which is likely being more performant.

Other than that, the proposals look good to me.

omariom commented 8 years ago

MemoryStream has ToArray as well. Prefix To here is an indication that the method makes a copy of the underlying storage.

Good candidat for implicit operator is conversation to Span. When Slices come to CoreCLR, of course.

KrzysztofCwalina commented 8 years ago

Changed CreateArray to ToArray

justinvp commented 8 years ago

Here are all of List<T>'s CopyTo overloads:

    public void CopyTo(T[] array);

    public void CopyTo(T[] array, int arrayIndex);

    public void CopyTo(int index, T[] array, int arrayIndex, int count);

Should ArraySegment<T> match?

In other words:

  1. Rename index in the proposal to arrayIndex (assuming index is the index of the destination array in which copying begins) to match the naming used by List<T>.CopyTo (and ICollection<T>.CopyTo).
  2. Add a source index parameter to the largest overload, to match List<T>.
justinvp commented 8 years ago

Interestingly, ImmutableArray<T> uses different parameter names:

    public void CopyTo(T[] destination);

    public void CopyTo(T[] destination, int destinationIndex);

    public void CopyTo(int sourceIndex, T[] destination, int destinationIndex, int length);

Whereas ImmutableList<T> matches List<T>.

mellinoe commented 8 years ago

I like ImmutableArray's overloads and naming convention.

KrzysztofCwalina commented 8 years ago

updated CopyTo to use "destination"

svick commented 8 years ago

What is the point of adding Length, when Count already exists?

davidfowl commented 8 years ago

Are we going to add array segment to more APIs as well. Nothing takes array segment except for the web socket APIs

KrzysztofCwalina commented 8 years ago

@davidfowl, yes. I think we should/must.

karelz commented 8 years ago

@KrzysztofCwalina is it ready for API review?

terrajobst commented 8 years ago

We should remove the explicit operator as it seems weird. And Empty should be static.

The indexer needs to perform a range check; it doesn't have to perform a null check because the range check can just use the fields (offset and count). The only case to observe a null reference is when the ArraySegment<T> was created by concurrent writes to a field, which we don't think warrants slowing down the indexer with additional checks.

Also, ToArray() should always copy, except when the array is empty or null, in these cases it should return Array.Empty<T>().

namespace System
{
    partial struct ArraySegment<T>
    {
        public T this[int index] { get; set;  }

        public static implicit operator ArraySegment<T>(T[] array);

        public T[] ToArray();

        public int Length { get; }

        public ArraySegment<T> Slice(int index, int count);

        public ArraySegment<T> Slice(int index);

        public void CopyTo(T[] destination);

        public void CopyTo(T[] destination, int destinationIndex);

        public void CopyTo(T[] destination, int destinationIndex, int count);

        public void CopyTo(ArraySegment<T> destination);

        public static ArraySegment<T> Empty { get; }
    }
}
AlexRadch commented 8 years ago

Where new API should be implemented in corefx or coreclr repository?

karelz commented 8 years ago

The low-level APIs which already live in coreclr repo ('corlib') need to be done first there. All APIs should have tests and reference assemblies in corefx repo - it is tricky and not best documented process at this moment. If you want to help with something simpler, I would suggest to start with corefx-only changes.

AlexRadch commented 8 years ago

@karelz I made two simple PRs in corefx, but have issues with changing config files. Can you help me? https://github.com/dotnet/corefx/pull/13975 and https://github.com/dotnet/corefx/pull/13976

karelz commented 8 years ago

@AlexRadch the more reason to not attack coreclr+corefx changes which are more tricky.

I hope that @Priya91 and @ianhays will be able to help you with the two above. I myself still lack the experience in making those changes to be able to provide useful advice.

Priya91 commented 8 years ago

@AlexRadch From your PR, seems like you're not adding the APIs proposed in this issue. I left feedback on one of your PRs. Please keep the conversation on related issues.

AlexRadch commented 7 years ago

What difference between array slices, array spans and array segments ? I am confused :(

karelz commented 7 years ago

@Priya91 @AlexRadch is there PR for this issue? If yes, it should be linked.

Slices and spans are part of the new Slice<T> API. ArraySegment is the old API.

Priya91 commented 7 years ago

is there PR for this issue?

I'm not aware of any..

AlexRadch commented 7 years ago

@karelz Why add new methods to old API? May be better to mark ArraySegment as deprecated?

jamesqo commented 7 years ago

@AlexRadch ArraySegment will not be deprecated; there is nothing wrong with it. Span is just a new way of doing things. Using ArraySegment may be necessary if you want to get back the underlying array.

@terrajobst Regarding the proposal you have above

namespace System
{
    partial struct ArraySegment<T>
    {
        public T this[int index] { get; set;  }

        public static implicit operator ArraySegment<T>(T[] array);

        public T[] ToArray();

        public int Length { get; }

        public ArraySegment<T> Slice(int index, int count);

        public ArraySegment<T> Slice(int index);

        public void CopyTo(T[] destination);

        public void CopyTo(T[] destination, int destinationIndex);

        public void CopyTo(T[] destination, int destinationIndex, int count);

        public void CopyTo(ArraySegment<T> destination);

        public static ArraySegment<T> Empty { get; }
    }
}
jamesqo commented 7 years ago

New proposed is

namespace System
{
    public struct ArraySegment<T> : IList<T>, IReadOnlyList<T>, IEquatable<ArraySegment<T>>
    {
        public ref T this[int index] { get; }

        public static implicit operator ArraySegment<T>(T[] array);

        public T[] ToArray();

        public int Length { get; }

        public ArraySegment<T> Slice(int index, int count);

        public ArraySegment<T> Slice(int index);

        public void CopyTo(Span<T> destination);

        public static ArraySegment<T> Empty { get; }
    }
}
jamesqo commented 7 years ago

@karelz Sorry, I just noticed this has already been marked api-approved... was the above spec the final decision, or can this be reviewed again?

karelz commented 7 years ago

There is always option to change ... changing APIs before we ship them in major release is "doable".

  1. I don't have opinion on public ref T this[int index] { get; } - @terrajobst is it a pattern we want to consider?
  2. I don't think we should replace CopyTo overloads with Span overload, because ArraySegment is about arrays and their segments -- so this is the place where it makes sense to add. cc: @KrzysztofCwalina @terrajobst any push back?
  3. I don't have opinion on IEquatable<ArraySegment<T>>. Do you have usage in mind? Or are you just applying common patterns?
KrzysztofCwalina commented 7 years ago

re # 1: Span's indexer is now by ref (which was decided at an API review meeting), so we already approved this new pattern. There are trade offs here, but long term the wins are large, so I think we should go with it. re # 2: The CopyTo parameter makes sense. Spans are about arrays too. The only thing I worry about is layering. re # 3: I am not a big fan of O(n) Equals methods. Also, Span.Equals just compares the fields, not the items in the span.

jamesqo commented 7 years ago

@KrzystofCwalina

I am not a big fan of O(n) Equals methods. Also, Span.Equals just compares the fields, not the items in the span.

We are definitely not making this an O(N) Equals. ArraySegment already has a strongly-typed Equals here, which just compares the fields. I am just proposing we implement the interface. (@karelz, there are no new methods being added for the IEquatable implementation.)

karelz commented 7 years ago

@KrzysztofCwalina regarding # 2, do you mean to use rather Span overload, which means we should wait and block on Span to mature first? (and solve the layering problem of course)

@jamesqo I understand you are proposing just to implement the interface. My question is why? Anything we do should have a reason and value for developers. We should not add interfaces just because we can. So, how will user code leverage the interface? What (real-world) code patterns is it going to enable?

AlexRadch commented 7 years ago

@karelz I am working on this issue

AlexRadch commented 7 years ago

@karelz I have 3 issues with this API

  1. I suggest to replace public void CopyTo(ArraySegment<T> destination) on public void CopyTo(Span<T> destination)

public void CopyTo(Span<T> destination) can be called with Span, Array and ArraySegment parameter. Because Array and ArraySegment have implicit conversation to Span.

public void CopyTo(ArraySegment<T> destination) can be called with Array and ArraySegment parameter but not with Span, because Span can not be converted to ArraySegment.

So to support new slicing API I suggest to replace public void CopyTo(ArraySegment<T> destination) on public void CopyTo(Span<T> destination)

  1. If we replace ArraySegment<T> destination to Span<T> destination as suggested in previous issue, then I suggest do not add next three CopyTo overloads
        public void CopyTo(T[] destination);
        public void CopyTo(T[] destination, int destinationIndex);
        public void CopyTo(T[] destination, int destinationIndex, int count);

    All of them can ease coded with new slicing API

    
    segment.CopyTo(array);
    segment.CopyTo(array); // no code to change

segment.CopyTo(array, destinationIndex); segment.CopyTo(array.Slice(destinationIndex));

segment.CopyTo(index, array, destinationIndex, count); segment.Slice(index, count).CopyTo(array.Slice(destinationIndex));



New slicing API allows us do not create many overloaded methods.

3.  `public void CopyTo(T[] destination, int destinationIndex, int count)` is strange method. List and Span classes haven't such method. List has similar method `public void CopyTo(   int index,  T[] array, int arrayIndex, int count)` with additional parameter `int index`. Is it accidentally skipped?
AlexRadch commented 7 years ago

@karelz Span have public bool IsEmpty{ get; } property. Should we add such property too? public static ArraySegment<T> Empty { get; } is added.

AlexRadch commented 7 years ago

I created PR https://github.com/dotnet/coreclr/pull/8593 with changes that I suggested above.

karelz commented 7 years ago

@AlexRadch re you [1] -- you should wait on closure in the discussion first: https://github.com/dotnet/corefx/issues/10528#issuecomment-265810190

@KrzysztofCwalina can you please comment?

KrzysztofCwalina commented 7 years ago

When I said layering, I meant: Span is in System.Memory.dll, which a) is not in DotNetStandard and is not in System.Runtime.dll. ArraySegement is both in NS and S.R.dll. I think we could make it work (by adding Span to these lower level libraries, but it's complicated and I think this cleanup/PR should not block on the much larger effort (to bring Span lower).

karelz commented 7 years ago

Thanks @KrzysztofCwalina! That makes sense.

@AlexRadch your questions [1] and [2] are now answered -- we won't use Span<T>, we will go with the original proposal. Regarding your [3], we should also stick to the original proposal. The original method makes perfect sense IMO -- you copy into array starting at index with specific count. The List.CopyTo overload tries to emulate segment of List IMO, which is not needed here - we already have 'segment' of the array.

AlexRadch commented 7 years ago

@karelz ok about [1] and [2] [3] segment have offset and count so if we remove index from List.CopyTo( int index, T[] array, int arrayIndex, int count) then why we do not remove count also.

But if we will remove offset and count than we already have such method public void CopyTo(T[] destination, int destinationIndex); I do not understand half removing.

So I suggest do not add public void CopyTo(T[] destination, int destinationIndex, int count); overload with half removing.

It can be easy coded as segment.Slice(newOffset, newCount).CopyTo(destination, destinationIndex).

@KrzysztofCwalina What do you think?

karelz commented 7 years ago

I could live without that overload CopyTo(T[] destination, int destinationIndex, int count). @KrzysztofCwalina?

terrajobst commented 7 years ago

We just talked about it and concluded we should indeed remove this overload:

 public void CopyTo(T[] destination, int destinationIndex, int count);

We should leave the other CopyTo methods in, though.

karelz commented 7 years ago

@AlexRadch thanks for pointing it out. The API proposal in the top post is now updated. Feel free to implement it as defined in the top post.

AlexRadch commented 7 years ago

@karelz I found 2 issues about Empty

  1. It should be static.
  2. I think we should add IsEmpty property like Span has.
karelz commented 7 years ago

Empty fixed to static (it was some copy-paste error, it was properly listed e.g. here: https://github.com/dotnet/corefx/issues/10528#issuecomment-251477256).

Let's track IsEmpty addition as separate issue, otherwise we will never converge on the API shape. Please file it, provide API shape and motivation (IMO "Span has it", is not good enough - I would like to see more details and usage patterns).

AlexRadch commented 7 years ago

@karelz I created separate issue dotnet/corefx#14502

karelz commented 7 years ago

No activity/response for 1.5 months, unassigning the issue - it is "up for grabs" again, available for anyone to pick it up and finish it. Anyone interested?

danmoseley commented 7 years ago

@ianhays @safern I moved to Future as this API does not seem necessary for our 2.0 goals. Feel free to change if it is. You might do the same with other issues of this kind if any

karelz commented 7 years ago

@KrzysztofCwalina actually wanted this one to happen in 2.0 - he said he will do it himself if no one picks it up ...

jamesqo commented 7 years ago

@karelz Curious, when does 2.0 come out?

karelz commented 7 years ago

@jamesqo check the milestone description: https://github.com/dotnet/corefx/milestones There's link to roadmap: https://github.com/dotnet/core/blob/master/roadmap.md -- currently set to Q3 2017.

karelz commented 7 years ago

@jamesqo if you want to pick up this issue, it would be appreciated ;-)