dotnet / corefxlab

This repo is for experimentation and exploring new ideas that may or may not make it into the main corefx repo.
MIT License
1.46k stars 347 forks source link

Consider renaming Span<T> to Array<T> #633

Closed KrzysztofCwalina closed 7 years ago

KrzysztofCwalina commented 8 years ago

Any thoughts on this from the community?

KrzysztofCwalina commented 8 years ago

cc: @terrajobst, @cartermp

omariom commented 8 years ago

Wow! That's a change!

So there will be abstract System.Array, ImmutableArray<T>, Array<T> and ReadOnlyArray<T>?

KrzysztofCwalina commented 8 years ago

I guess this is the implication.

For the context, I filed the issue because:

  1. People don't like the name "Span". I don't hate it but I don't particularly like it either.
  2. We cannot name the type Slice and keep the name of the slicing method "Slice" (the type system does not allow this).
  3. "Slice" is a pretty darn good name for the method.
  4. I feel strongly that all methods that slice should have the same name. We cannot have the method called "Slice" on some types and then call it something else (e.g. Subslice) on others.
cartermp commented 8 years ago

:+1:

omariom commented 8 years ago

Combined with full language and JIT support slices could (almost) completely replace usage of arrays. At least in the new code.

int[:] indexes = new int[10];

So the name looks appropriate.

mikedn commented 8 years ago

Combined with full language and JIT support slices could (almost) completely replace usage of arrays.

Indeed, if such support is added and if new language features (ref returns) are used so this type more closely resembles the classic array then Array<T> could be a winner. Otherwise I think such a name should be avoided because it would be confusing.

svick commented 8 years ago

Since I doubt arrays are ever going away, I don't think Array<T> is a good name. It would be confusing especially when speaking: how do you figure out which type is "array of ints"?

To contrast, similar situation worked out well for generic collections: it's pretty clear that "queue" generally refers to System.Collections.Generic.Queue<T> and not System.Collections.Queue, since nobody uses that.

It may be too long, but what about ArraySlice<T>?

cartermp commented 8 years ago

It would be confusing especially when speaking: how do you figure out which type is "array of ints"?

I think the end goal here is that you shouldn't really have to be asking that question in the first place.

In Go or Rust, for example, you create arrays at the top level, but the vast majority of meaningful work you do is on a slice of that array (even if the slice happens to be the whole thing). The way they are reasoned about isn't as this other thing that is backed by an array, but the chunk of the array I care about. Rust ex:

fn print_size(slice: &[i32]) {
    println!("slice has {} items", slice.len());
}

fn main() {
    let xs: [i32; 5] = [1, 2, 3, 4, 5]; // fixed-size array of 5 elements

    print_size(&xs); // prints 5
    print_size(&xs[1 .. 4]); // prints 3
}

When I write this kind of code I never ask which type is the array of ints. Although the semantics here are a bit different than typical C# code, the underlying reasoning I do doesn't involve distinguishing between types.

It may be too long, but what about ArraySlice<T>?

This is the current way that Swift handles things. Although I haven't written much Swift code, the impression I get is that the friction between Array<T> and ArraySlice<T> is pretty low. That's certainly helped by the tendency not to explicitly annotate types and [..] syntax, but even without the language-level syntax the types place nicely with one another.

VSadov commented 8 years ago

I think it would be confusing since Array is a well established data structure going beyond .Net and does not only implies indexing but also always represents a complete view on the stored data. Slice is an encapsulation of a range of indexable items, which is separate from the actual backing storage.

I think the data structure should be called Range and operation that produces Ranges, called Slice.

In most cases we will not utter the actual type (like we rarely mention System.Int32), but it would help if the name of data type would match its semantics.

int[] data = new int[100];

// slice a range of 3 to 9 elements inclusively and pass that to Foo
Foo(data[3:9]);     

// same as
int[:] range = data[3:9];
Foo(range);

// same as 
Range<int> range = data.Slice(3, 9);
Foo(range);
cartermp commented 8 years ago

@KrzysztofCwalina side note: should the title be Span<T> instead of Slice<T>?

KrzysztofCwalina commented 8 years ago

@cartermp, fixed.

mburbea commented 8 years ago

Is the restriction that a method cannot have the same name as it's enclosing type just a C# limitation or is that a CLR restriction? Just some speculating but I think it's a C# limitation as the following are legal...

class ToString{}
delegate int Invoke();
Enum A{ A}

I think slice is the best name. Can we just remove the restriction for this class?

jamesqo commented 8 years ago

What about View<T>? C++17 has array_view/string_view in its STL.

svick commented 8 years ago

@jamesqo To me, ArrayView<T> sounds okay, but View<T> is too non-specific: it could mean any number of things.

terrajobst commented 8 years ago

I agree with @jamesqo. To me ArrayView<T> seems a bit too verbose for my taste, but I don't think this is a strong enough argument against it.

VSadov commented 8 years ago

ArrayView<T> seems ok to me too. A bit verbose, but I think we do not expect to see the actual type name too often. Especially assuming there will be special syntax like T[:]

terrajobst commented 8 years ago

A bit verbose, but I think we do not expect to see the actual type name too often. Especially assuming there will be special syntax like T[:]

While that is likely true, I don't think we should rely on language features when naming APIs. As you know, there is a high chance that the language feature either doesn't happen or happens much later. I think as a best practice we should design our APIs without having to rely on language features to hide any quirkiness (not just naming). Of course, we should leverage the compiler and the languages to do things that are hard (such as creating state machines or doing reachability analysis) but not for putting lip stick on a pig.

cartermp commented 8 years ago

I would prefer ArraySlice<T> to ArrayView<T>, as this is because of languages which already use the "slice" terminology (Python, Go, Rust, Swift). The C++ name seems unique to C++17.

If we're going to make the differentiation between slices and "sliceable" collections be conceptually distinct, then keeping "slice" in the name makes it easiest to understand without having to consult an API reference.

jamesqo commented 8 years ago

@cartermp Perhaps we should name the normal version ArraySlice, and the read-only version ArrayView? Then people making a slice of a string could write ArrayView<char>, instead of something like ReadOnlyArraySlice<char> (which IMO is pretty verbose).

jamesqo commented 8 years ago

Also, maybe Range<T> would also be a good name? Then we could name the types Range<T> and View<T> (or ReadOnlyRange<T>) without conflicting with the Slice method.

KrzysztofCwalina commented 8 years ago

Range is not a bad idea. I would like a single word (and short) name for the type. Especially that we will also have to have ReadOnlyXxx version of it.

JeffreySax commented 8 years ago

My preference would be ArraySlice.

In general, it should not be assumed that the result of a slice operation is always a slice/ArraySlice:

In my view, the ideal API would use indexer properties that take as an argument a lightweight structure (Slice?) that is constructed from the slicing syntax. In addition, I'd allow extension methods with a specific name (GetSlice as in F#?) This approach has several advantages:

terrajobst commented 8 years ago

@JeffreySax

In general, it should not be assumed that the result of a slice operation is always a slice

I buy that which is why I believe having a single term is useful because we can combine it with other concepts or in other places where Xxx<T> wouldn't be appropriate, such ReadOnlyXxx<T> or StringXxx.

In the language, all you need to do is map the slicing syntax to the Slice type. (I really don't think there's a need to clutter the language with the type [:] syntax.)

Not sure what map the slicing syntax to the Slice type would mean if there is no syntax. In general, I like the idea of mapping the indexer syntax [:] to calling a Slice method (which can be provided as an instance method or an extension method). The return type could be arbitrary -- what matters is the name and shape of the Slice method. That would be akin to how the language deals with other constructs, such as foreach and await.

That being said, I wonder whether there shouldn't also be a syntax for the slice type. For instance, it would be nice to be able to declare a type using similar syntax, such as:

int[] numbers = {1, 2, 3, 4, 5}
int[:] threeFourFive = array[2:3];

string text = "Hello";
string[:] el = text[1:2];

I see two options there:

  1. Hard-wire T[:] to a particular slice type, e.g. Xxx<T>. This would prevent us from using the synatx for specialized slice types such as StringXxx.
  2. Define T[:] as the return type of the T.Slice method. Not sure the language designers like the fact that types are bound based on the shape of members.

In general, I think syntax matters a lot when it comes to how features are being perceived.

JeffreySax commented 8 years ago

I think that once you make the result of a slicing operation of arbitrary type, then a special syntax for the slice type is very close to useless and certainly not worth the trouble. Your first option is too restrictive, and in your second option it looks like T is the collection type, which is different from array syntax T[] where T is the element type.

Not sure what map the slicing syntax to the Slice type would mean if there is no syntax. In general, I like the idea of mapping the indexer syntax [:] to calling a Slice method (which can be provided as an instance method or an extension method).

I mean that the expression a:b is pure syntactic sugar and should map to a type, say SliceIndex, that is a lightweight struct. So

a[1:4] = b[2:5]

maps to

a[new SliceIndex(1,4)] = b[new SliceIndex(2,5)]

where you've defined:

public MySlice this[SliceIndex index] { get; set; }

The biggest problem when you don't use such an index type is that it doesn't work for multi-dimensional arrays. How do you differentiate between m[1,2:3] and m[1:2,3]? F# used options, which has the advantage that you can omit boundaries (v[1:] or even v[:]), but the huge drawback that options are reference types. If you want this capability - and I do - then it's much better to let the compiler insert the usual default value: 0 for lower bounds and Length/Count-1 for upper bounds.

You can also use the SliceIndex syntax in loops if you want:

foreach (var i in 0:9999) { /* ... */ }
juliusfriedman commented 8 years ago

Just to add my 0.02, Slice as it stands is nothing more than ArraySegment for the most part. IList Adapter can already do this as well as signal changes to the underlying collection. Generic.IList misses that but you can easily add it. What happens when an array is resized? What if the sliced data no longer exists? I think what people are after is a way to pass the slice to a method which takes array (without an offset and length). If that's the case Array could be modified to have a private offset and length would reflect the number of elements. It's easy to see the syntax being parsed and passed to type so that the syntax desired is achieved. Array could also be subclasses before it was sealed and the subclass could be used for slicing and whatever else essentially replacing array in runtime use behind the scenes. Not sure in the end this gains anything though performance or otherwise (Slicing) as the consumer can still just do array [-1] or otherwise. They could also just be given the offset and length as traditional...

juliusfriedman commented 8 years ago

Just as a FYI, I penned something out to deal with the Resizing issue among other things.

The result is @ In my projects source

It's not 100% complete yet as it doesn't deal with strings correctly.

I also have an idea about not even copying memory by forging an CLR array header and placing it right before the offset of the first desired element which makes the array think it has X elements which start directly after the header...

KrzysztofCwalina commented 7 years ago

Lots of good discussion on this, but I don't think we have found the perfect name, so not worth renaming.