dotnet / csharplang

The official repo for the design of the C# programming language
11.1k stars 1.01k forks source link

Champion "slicing" / Range (16.3, Core 3) #185

Open gafter opened 7 years ago

gafter commented 7 years ago

Introduce a .. syntax, such that x..y ends up creating a Range of x and y.

See also

Thaina commented 7 years ago

Do we still need this feature while we have return ref support?

We could just create readonly ref collection couldn't we?

struct ReadonlyRefCollection<T>
{
    T[] array;
    int slice,length;
    public ref T this[int i] { get => ref array[i] }
}
gafter commented 7 years ago

@Thaina Slicing may end up being just that, though it is possible we will want a special syntax for taking a slice, like e[a:b].

Thaina commented 7 years ago

@gafter Thanks for answer

but then I would disagree for adding syntax just for that

Extension method array.Slice(from,to);or array.Slice(from,count); would be sufficient and more familiar

chrisaut commented 7 years ago

Would the runtime be smart enough to know it can collect/free parts of eg. an underlying array that is not in a slice? In other words, if I have

var arr = new int[HUGE];
var slice = arr[500:1000];
// assume arr is no longer referenced anywhere except through slice, will it keep holding entire arr alive?

I assume there are huge complications with object headers that would make this very complicated. But it sure would be nice.

iam3yal commented 7 years ago

@chrisaut The way I read it is a slice is a view over an existing array so I think that arr must be alive in order to manipulate it; otherwise, you may need to make a copy of the slice.

iam3yal commented 7 years ago

@Thaina str[a:b] reads a lot nicer than str.Slice(a, b) but if you don't like that you could use the API directly. :)

Thaina commented 7 years ago

@eyalsk Sorry to say that I think completely opposite. It not intuitive at all to assume that array[10:100] means slice it out. It also not sure that what 100 mean count or endIndex. In C# we never use a:b to mean range of number before

On the contrary we had string.Slice to be familiar with

Thaina commented 7 years ago

@chrisaut I think it would be too hard to determined if resizing array, which involve copy and clearing memory, would yield performance when it really necessary. Too many scenario compiler need to do guesswork

such as, what if the sliced array will also dispose in the next minute? Then it no point to waste time dispose the old one anyway

Developer could just explicitly optimized by manually resize

MovGP0 commented 7 years ago

In case special syntax is needed, I suggest to use the range operator from Pascal, which is .., because it is more closely to the operator in mathematics (unicode character 'horizontal ellipsis'). So instead of e[a:b] we would write e[a..b].

The a..b could map directly to Enumerable.Range(a, b-a).

chrisaut commented 7 years ago

@eyalsk @Thaina I'm not talking about making copies, slices should absolutely under no circumstance require copying of data, that's their entire point.

I'm talking about making the runtime smart enough to understand, that a given array only has ranges x-y reachable.

An array is basically a bunch of header info, followed by a block of memory. A slice is a view over this bunch of memory, at a specific offset. If the slice's header itself contains header info such as underlying type, lengths etc., then the original array's header would no longer be needed.

If the GC understood that for arrays (and maybe other types later), before reclaiming it has to check some state somewhere to see if any slices are holing onto it, it could understand that only a small region is "alive" and free the rest.

This is just a rough idea.

Thaina commented 7 years ago

@chrisaut Slice itself never need to copy, unless it trying to do what you want, resize it

To have the mechanism you said, partially return memory to the pool, involve CLR optimization. Not about language or compiler

iam3yal commented 7 years ago

@chrisaut Where did I say that slices are about making copies? please reread what I wrote.

iam3yal commented 7 years ago

@Thaina

In C# we never use a:b to mean range of number before

There's always first time...

We didn't have many things in C# and now we have them so it isn't really an argument, if you don't find the syntax pleasant to read then that's a different story. :)

MovGP0 commented 7 years ago

I am also wondering what happens when the b in e[a:b] becomes b<a. Will the slice be counted backwards or will there be an exception?

Version 0 When b<a an exception is thrown.

int[] primes = new int[] { 2, 3, 5, 7, 9, 11, 13 };
int[:] a = primes[5:2]; // throws an exception
int[:] b = primes[1:0]; // throws an exception
int[:] c = primes[0:-1]; // throws an exception

Version 1 When b<a, it counts backward from element e[a] by b elements. This maps to primes.Slice(int from, int count).

int[] primes = new int[] { 2, 3, 5, 7, 9, 11, 13 };
int[:] a = primes[5:-3]; // A slice with elements {11, 9, 7} 
int[:] b = primes[1:-2]; // A slice with elements {3, 2} 
int[:] c = primes[0:-1]; // A slice that yields {2} and then throws an exception

Version 2 When b<a, it counts backward from element e[a] to element e[b]. This maps to primes.Slice(int from, int to).

int[] primes = new int[] { 2, 3, 5, 7, 9, 11, 13 };
int[:] a = primes[5:2]; // A slice with elements {11, 9, 7} 
int[:] b = primes[1:0]; // A slice with elements {3, 2} 
int[:] c = primes[0:-1]; // A slice that yields {2} and then throws an exception
Thaina commented 7 years ago

@eyalsk Well, that's why most of the time I would like to recycle keyword more than add new

And as I said, it's not that we can't. I just argue that it not intuitive because we never have it. Compare to what we already have, a slice function

iam3yal commented 7 years ago

@chrisaut Just to make it crystal clear, what I meant is that if you no longer need a reference to the array but you still want to have the slice around then I think that you will need to make a copy of the slice, just a hunch based on my understanding, I could be wrong.

chrisaut commented 7 years ago

@eyalsk I understand what you said to mean that for what I want, slices would need to copy. I don't think it does. Apologies if that's not what you want. I agree with you that in the normal/straight forward approach you need to copy if you no longer want the array, but only the slice. I'm actively questioning if we can make it so that that is not needed.

@Thaina yes of course it would require clr/gc work, but many new features would require that. I'd guess slices in general will need runtime support (regarless of the partial freeing) as slices are not just a new Type.

iam3yal commented 7 years ago

@Thaina async/await is intuitive? pattern matching is intuitive?

Thaina commented 7 years ago

@eyalsk Well, intuitive relative to what? I was talking slice comparing between string.Slice and that f:t syntax

Actually I would argue that many feature of pattern matching is also really not intuitive. I was propose alternative many times but I am not contributor so my voice is not important

But still many feature of pattern matching are intuitive. such as case. But if it me I would use switch instead of introducing match

But the point is many feature in pattern matching have nothing to comparing to. So add new one is not necessary to be intuitive

But async/await is good example. It is really more intuitive to write code than callback. You can write a loop intuitively just add await

I would also argue that we should not need await and should only have async keyword in Task, like yield can be used in IEnumerable without special syntax

But as I said, intuitive is relative. async/await still more intuitive than callback. But f:t is less intuitive than Slice in my opinion

iam3yal commented 7 years ago

@chrisaut

I'm actively questioning if we can make it so that that is not needed.

Just to cite the relevant part from the proposal:

As demonstrated in this code example, slicing wouldn’t make a copy of the original data; rather, it would simply create an alias for a particular region of the larger range. This allows for efficient referencing and handing around of a sub-portion of an array without necessitating inefficient copying of data. However, if a copy is required, the ToArray method of Slice could be used to forcibly introduce such a copy, which could then be stored as either an array or as a slice (since arrays implicitly convert to slices):

Therefor, I wouldn't expect any clever tricks here but I could be wrong. :)

Thaina commented 7 years ago

@eyalsk I think what @chrisaut means is.

If we have alloc memory for array, suppose from ptr0 to ptr50

Then we slice it at ptr20 to ptr30 and then stop using the original array

CLR could dealloc ptr0 to ptr18 and ptr31 to ptr50. And maybe the implementation of slicing array could force CRL to check that

But I don't think CLR can do thing like this

iam3yal commented 7 years ago

@Thaina

intuitive relative to what?

Intuitive relative to the person, many language constructs require some learning so this wouldn't be any different, especially when the person has no prior knowledge.

Actually I would argue that many feature of pattern matching is also really not intuitive. I was propose alternative many times but I am not contributor so my voice is not important

Thaina being a Contributor doesn't mean "anything" and your voice is as important as mine.

But async/await is good example. It is really more intuitive to write code than callback. You can write a loop intuitively just add await

It's straightforward to create asynchronous methods but it isn't straightforward to understand how asynchronous code works and what it means, especially for people that never did such work before.

My point being is that there's a bit of learning to do before you can use a feature and not so much about how it can be more intuitive...

Joe4evr commented 7 years ago

I am also wondering what happens when the b in e[a:b] becomes b<a.

Existing methods in the BCL that have similar behavior (such as string.Substring()) have the arguments startIndex, length. I would hope that this convention carries forward to slices as well.

Thaina commented 7 years ago

@eyalsk

Intuitive relative to the person

Not at all. Relative is objectively compare with existing things. Even leaning curve is also relative to what exist and what newly introduced

being a Contributor doesn't mean "anything" and your voice is as important as mine

Disagree. people who not being contributor might be important but I never see it will be the same importance as contributor

It's straightforward to create asynchronous methods but it isn't straightforward to understand how asynchronous code works and what it means, especially for people that never did such work before.

Still the same word, relative. If there are newbie learning C# there are callback function and async/await we can present to let them learn how to do asynchronous. async/await has less learning curve compare to callback function that not straightforward

My point is comparing learning curve with similar feature is necessary. And I would vote that we should better use array.Slice more than array[f:t] because it more intuitive and familiarity

iam3yal commented 7 years ago

@Thaina

Not at all. Relative is objectively compare with existing things. Even leaning curve is also relative to what exist and what newly introduced

And a person isn't a thing?

Disagree. people who not being contributor might be important but I never see it will be the same importance as contributor

Thaina... let's not get into the meaning of a Contributor when it's pretty well defined and has absolutely nothing with the subject.

And I would vote that we should better use array.Slice more than array[f:t] because it more intuitive and familiarity

Thaina you would still have the ability to use the API directly.

I understand that you don't like the syntax but what do you think about the syntax that was proposed by @MovGP0? personally I find the proposed syntax : pretty nice.

MovGP0 commented 7 years ago

i am totally happy with the : syntax. Just would argue that the .. syntax can be used more universally like in

foreach(var item in 3..5){ /* 3,4,5 */ }

foreach(var item in a[3..5]){ /* a[3], a[4], a[5] */ }
Thaina commented 7 years ago

@eyalsk

Thaina... let's not get into the meaning of a Contributor when it's pretty well defined and has absolutely nothing with the subject.

As always you are the one who pick the word to argue with me. So you stop yourself not telling me to stop what I don't start

Thaina you would still have the ability to use the API directly.

If that's your argument then you would never ever come mess with my threads in the past. Remember when you annoyingly mock me when I try to introduce delete keyword? Why you not just think that you could still use dictionary.Remove?

To introduce new syntax have many cost and drawback or else we would have all syntax we could imagine already. Same here that you like this syntax so you want it and I'm against it because it make code more weird and cost too much for little benefit

I know why you try to argue with me in the past even you make fun of me in every aspect and make our argument become silly. You should understand yourself too

chrisaut commented 7 years ago

Can we keep it on topic please... :-)

iam3yal commented 7 years ago

@Thaina

Remember when you annoyingly mock me when I try to introduce delete keyword? Why you not just think that you could still use dictionary.Remove?

First, let me state that I didn't mock you but disagreed with you, secondly, I think that slices are going to be pretty common to warrant a language syntax.

Thaina commented 7 years ago

@eyalsk I also disagree in this feature but I don't ever do like you have did to me. I state my reason clearly not jokingly mock anyone like

I.. I don't know what to say :)

And so many and many

Secondly. For me slicing array is also important. But not necessary to be operator. I have opinion that it value equally to string.IsNullOrEmpty. Both don't need to be operator just extension method is satisfied and more intuitive. Because it do what it said

iam3yal commented 7 years ago

@Thaina

And so many and many

I really didn't know what to say because you compared such a subtle feature to LINQ which is pretty heavyweight feature, if you will put things in context and take it in perspective you will realize that I didn't try to mock you, I was just being honest with you, however, if I hurt your feelings then I apologize, it wasn't my intention.

Secondly. For me slicing array is also important. But not necessary to be operator. I have opinion that it value equally to string.IsNullOrEmpty. Both don't need to be operator just extension method is satisfied and more intuitive. Because it do what it said

Okay, if you think that the language shouldn't have a special syntax for that then I respect your opinion.

Thaina commented 7 years ago

@eyalsk LINQ is unnecessary heavyweight. Because we can always use functional linq style from System.LINQ instead of whole unnecessary linq syntax. Comparatively it relatively lower intuitive to use linq syntax over linq functional in C#. Only people used to SQL syntax would feel it lower learning curve

And that's not the point. We have many way to express disagreement but you just choose to mock me. You just choose to make fun of the thread. You just choose to make argument became silly

And I always respect most opinion by state my disagreement with serious reasoning. I take it seriously as my respect

iam3yal commented 7 years ago

@Thaina Okay, you obviously don't want to let go and repeatedly make ad hominem attacks, I apologized to you so let's move on and stay on topic shall we?

Thaina commented 7 years ago

@eyalsk Wow. said from the one who always ruin my thread and always make fun of me that I make ad hominem

I just use the old thread as example but it's you who always pick some word to fight with me. Try to defend yourself that you never do anything wrong. You are epitome of strawman you know? You point to me that I use ad hominem when you just cannot accept that you are wrong

The one who pick my word about contributor to make the argument is you

I use my thread as example just to make counterargument to you but instead of argue why this and that difference you just pick a fight that you don't mock me instead. Who lead this thread out of topic?

In my humble opinion, it's you

If you really want to stay on topic why you defend yourself?

chrisaut commented 7 years ago

Guys....please

Thaina commented 7 years ago

@eyalsk And to remind you I never gave thumb down to our argument because I still leave you a respect I also never said you must stop because I respect your freedom But every time I wish it should be last time I would need to argue with you like this so I won't hold back. I hope that it would make you understand

DavidArno commented 7 years ago

I'm with @MovGP0 on wanting to see [a..b] syntax used here, rather than [a:b] for totally selfish reasons: I'm writing a proposal for cons support that uses [head:tail] syntax and do not want to have to change it just because @gafter "stole" it for this proposal! 😁

bbarry commented 7 years ago

I like how Perl 6 specifies these:

a..b // includes both endpoints
a^..b // excludes a
a..^b // excludes b
a^..^b // excludes both endpoints
Thaina commented 7 years ago

@MovGP0 for 3..5 I also make my own extension method

public static class Ext
{
    public IEnumerable<int> To(this int start,int end)
    {
        var sign = Math.Sign(end - start);
        while(start != end)
        {
             yield return start;
             start += sign;
        }
    }
}

foreach(var i in 3.To(5))
{
}

I am not familiar with pascal and I think newbie is also like me so a.To(b) is much more obvious

HaloFour commented 7 years ago

I wonder if said slice syntax could be considered in conjunction with a range syntax?

var slice = source[5..10];
foreach (var i in [0..100]) {
}
iam3yal commented 7 years ago

@HaloFour May be it's worth a separate proposal?

DavidArno commented 7 years ago

@bbarry,

I genuinely never thought I'd live to see someone utter the words "I like how Perl [does this thing]" 😲 😁

HaloFour commented 7 years ago

@eyalsk

Consider it done: https://github.com/dotnet/csharplang/issues/198

I throw around a number of syntax examples but I'm less concerned with the exact syntax than I am about having a consistent syntax that can be used in different areas of the language:

var slice = source[5..10];
foreach (var i in [0..100]) {
}
if (person is Student { Grade is 70..<80, Name is var name }) {
    Console.WriteLine("${name} is a C student!");
}
Thaina commented 7 years ago

Maybe this thread could be generalized into array indexing by IEnumerable<int> and return ReadonlyCollection<T>

var vectors = new Vector3[10];
IEnumerable<int> odds = new[] { 1,3,5,7,9 };
IEnumerable<int> before = Enumerable.Range(0,5);
IEnumerable<int> latter = 5.To(10);

ReadonlyCollection<Vector3> oddVects = vectors[odds];
ReadonlyCollection<Vector3> beforeVects = vectors[before];
ReadonlyCollection<Vector3> latterVects = vectors[latter];

oddVects[0].X = 1; // vectors[1].X = 1

And slicing is just @HaloFour syntax if it would

jnm2 commented 7 years ago

I vote for special syntax for slicing. We already have a special syntax for indexing. The two things are very close concepts, and it seems right for them to have a similar syntax.

MgSam commented 7 years ago

Not sure why no one from MS has mentioned this, but the CoreFX guys have a prototype of a Span class already that surely would go hand-in-hand with this proposal.

https://github.com/dotnet/corefxlab/blob/master/docs/specs/span.md

chrisaut commented 7 years ago

Span seems already in corefx proper. I see issues/pr about it there regularly.

ufcpp commented 7 years ago

Yes it's already been in corefx https://github.com/dotnet/corefx/tree/master/src/System.Memory https://dotnet.myget.org/feed/dotnet-core/package/nuget/System.Memory

Gregory-Bell commented 7 years ago

What is the advantage of having a special slicing syntax over reusing the multi-parameter indexer?

int[] primes = new int[] { 2, 3, 5, 7, 9, 11, 13 };
slice<int> a = primes[2,3];
slice<int> b = primes[,3];
slice<int> c = primes[2,];

We could even do slicing on multidimensional arrays. For an array of rank n, indexing with n+1 parameters would return a slice.

int[,] numbers = { {1, 2}, {3, 4}, {5, 6} };
slice<int> a = numbers [,,]; //A slice of the whole array
slice<int> b = numbers [,,3]; //A slice containing the first three elements 1 2 3
slice<int> c = numbers [1,1,]; //A slice containing the last three elements 4 5 6

Slicing multidimensional arrays might not be worth the added complexity seeing how little used they are. However, it is possible to do slicing while reusing the existing indexer syntax.

alexpolozov commented 7 years ago

@Gregory-Bell With your proposed syntax, we can't describe at a glance what's happening with a[i,j] without looking up the type of a. If a is T[], this is a slicing operation; if a is T[,], this is an element lookup. These operations are somewhat similar but have different semantics and different return types, and I prefer being able to distinguish them at a glance.

Moreover, if a[i,j] is an indexer, then it can be a virtual multi-parameter indexer, and now I need to know the runtime type of a to understand what's happening. The issue gets even more complicated if we allow user-defined slicing (as an operator, shape, duck-typed method, interface, whatever).

That being said, I'd love to compose indexing and slicing syntax like in Python, R, etc:

int[,] matrix = { {1, 2}, {3, 4}, {5, 6} };
Span<int> firstColumn = matrix[.., 0];  // {1, 3, 5}
Span<int> thirdRowWithoutLastElement = matrix[2, ..-1];  // {5}