dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

Span<T> #6168

Closed jkotas closed 4 years ago

jkotas commented 8 years ago

Turn https://github.com/dotnet/corefxlab/tree/master/src/System.Slices into efficient first class feature.

stephentoub commented 8 years ago
adamsitnik commented 8 years ago

I just love it, hope for some up-for-grabs issues!

benaadams commented 8 years ago

Does "Add Span to framework contracts" include items like adding to Stream Read/Write etc? Or is it making Span available as a type? (Assuming latter, and others will be new issues after)

omariom commented 8 years ago

The CoreFxLab's prototype has useful extensions methods for Span<byte> and Span<char> like Read/Write and StartsWith/EndsWith. Will CoreCLR's Span get them?

dotnetchris commented 8 years ago

Would there be any sane method to allow Span<> to integrate with List<> that it could operate on the underlying array?

KrzysztofCwalina commented 8 years ago

@omariom, we need the corfxlab API and the corlcr one to be the same (portable). The reason for it is that I think we need to use the corfxlab implementation for previously shipped runtime/frameworks, e.g. .NET Framework 4.5. Otherwise, the addoption of Span will be very much inhibited by inability to write portable code. Of course, the out-of-band span will be slower, but as long as the API is available in portable code, we should be good.

KrzysztofCwalina commented 8 years ago

@benaadams, we need some existing APIs, e.g. Stream.Read/Write to accept spans, but I think it's a separate workitem form the one @jkotas listed.

KrzysztofCwalina commented 8 years ago

@dotnetchris, what do you consider "sane" in this context? If you mean wrapping Span into List then no (Span is stack only, and does not always have "underlying array").

KrzysztofCwalina commented 8 years ago

@adamsitnik, I am not sure we want completely open up for grabs so early (as there is lots of tricky work here) but you have done lots of good work in corfxlab and so if you (presonally) would like to take dotnet/coreclr#5857, it would be great! Same for @omariom :-)

adamsitnik commented 8 years ago

@KrzysztofCwalina Thanks! consider it done ;)

dotnetchris commented 8 years ago

@KrzysztofCwalina i meant for Span to access List's internal array

KrzysztofCwalina commented 8 years ago

I think we might also need to constrain T to primitive type (i.e. type with no managed object references). Otherwise, it will be possible to create Span over native memory without GC being aware of the roots.

omariom commented 8 years ago

@KrzysztofCwalina Only certain operations seem unsafe in this sense: Cast, BlockEquals and Span<byte>.Read/Write.

KrzysztofCwalina commented 8 years ago

@dotnetchris, yes, it would be great to add List<T>.Slice that returns Span<T>. Same with other slicable types, e.g. String.Slice -> Span<char>.

KrzysztofCwalina commented 8 years ago

@omariom, it seems like accessors and copy operations too, e.g. var nativeSpan = new Span(nativeMemory, len); nativeSpan[0] = new object(); // will GC know that the object is rooted? will it change the native memory when the object is rooted and moves?

Similarly when an array backed span of object sis copied to native span.

omariom commented 8 years ago

@KrzysztofCwalina Then I think it is worth adding a new constraint to CLR and C# - blittable(or other name). And ability to apply it at members level. With such constraint we could have Spans of any type but some operations would be only allowed on Spans of types having no references.

jkotas commented 8 years ago

For these kind of operations, I think the blittability should be enforced via a dynamic check if needed, that the JIT can eliminate by treating it as intrinsic; or by Roslyn analyzer. I do not think it makes sense to have first class blittable constrain - it is too much complexity, for the value that it provides.

Also, the casting between different blitable Span types is pretty questionable operation. It has portability problems because of it lets you create misaligned Spans. I do not think it should be in the "safe" Span API set. (Having it in "unsafe" span API set is fine.)

jkotas commented 8 years ago
var nativeSpan = new Span(nativeMemory, len);
nativeSpan[0] = new object();
// will GC know that the object is rooted? will it change the native memory when the object is rooted and moves?

When you are doing interop with unmanaged pointers, you have to know what you are doing and you have to get it right. If you do not get it right, your program will crash in a very bad way.

In your example, the GC will not track the native memory and your program will crash in a very bad way. It is no different from if you pass wrong len to Span(void * p, int len) that will crash your program in a very bad way as well.

I know that we have historically tried to prevent some of the bad uses, but I always had mixed feelings about it.

Maybe we should have the dynamic check for blitability in the regular Span(void * p, int len), but also have extension method without any checks in "unsafe" span API set for power-users?

nietras commented 8 years ago

I think it would be really good to have a "unsafe" span API set for high perf scenarios. As long as there will be a code path without unnecessary checks in cases where you know what you are doing.

Also I hope it will still be possible to query the span whether it is based on native or managed memory and have access to native pointer or managed array/pinning depending on which memory is used.

omariom commented 8 years ago

@jkotas

If Type had ContainsReferences property treated by JIT as a constant then it would be no-op in the dangerous methods when T is blittable and throw otherwise.

For CoreLab's Span another solution should be found. May be Reflection? Though it will be slow.

nietras commented 8 years ago

For corefx one might use the solution shown here: https://github.com/AndreyAkinshin/BlittableStructs/blob/master/BlittableStructs/BlittableHelper.cs

It is probably pretty slow (haven't benchmarked it) on first call, but should then be fast perhaps even inlineable. The helper is also discussed in a blog post: http://aakinshin.net/en/blog/dotnet/blittable/

nblumhardt commented 8 years ago

Using ReadOnlySpan<char> as the span type for substrings seems to preclude the addition of a lot of interesting/useful text-specific members (intuitive ToString(), StartsWith(string), etc.). Is there room for a separate StringSpan type for this case?

jaredpar commented 8 years ago

I do not think it makes sense to have first class blittable constrain - it is too much complexity, for the value that it provides.

The ability to convert from say Span<int> to Span<byte> proved really useful in Midori. It's something I'd love to preserve with CLR as well.

The language already has this exact concept in the spec under the guise unmanaged type. Essentially a struct that transitively contains no reference types. The compiler is already making this check for other scenarios. Why not expose it as a real language feature and generic constraint?

struct Point { int x; int y }
void M<T>() where T : blittable // implies struct
M<Point>() // Okay 
M<object>() // Nope, not blittable 
benaadams commented 8 years ago

The ability to convert from say Span to Span proved really useful in Midori.

Could even have a safe convert where it checks (length * sizeof(T1)) % sizeof(T2) == 0 (might want also an unsafe convert where it doesn't - as may be pre-checking and re-slicing).

You can already have explicit layout overlapping arrays; but the length property gets confused as its based on the instantiation type (which makes sense); but the span approach could have the appropriate length for each sized datatype.

Example use reading from stream/disk/network from byte[] => Vector3[] or saving Vector3[] => byte[]

KrzysztofCwalina commented 8 years ago

@nblumhardt, the operations you listed could be extension methods over ROS, couldn't they?

nblumhardt commented 8 years ago

@KrzysztofCwalina yes, though the ergonomics of extension methods are still second-rate because of the need to add a using statement before they're discoverable.

Tooling helps, and is improving, but at present it's still easier to find members on a type than it is to find extensions. The prevalence of "remove unused usings" as an automated refactoring means using System; is not reliably present in every code file. Perhaps a minor issue.

Even if the extension method route is preferred, could the character-oriented extensions like StartWith() be included in this PR, just in case the experience of implementing them reveals any opportunities.

(Stretching a little further, would implementing and optimizing the most common string-like methods on Span<char> open the door to providing them in the future on Span<byte> with UTF-8 or user-specified encoding?)

Thanks for the reply, apologies if I missed earlier discussions around this BTW :-)

ilexp commented 8 years ago

I've been (somewhat) following development of the C# Slicing issue, but I'm having trouble understanding how exactly these two issues are related. A few questions to clear things up:

  1. Is this the CLR-part of the actual array slicing feature in C#, or considered something entirely different?
  2. As far as I followed slicing, a core feature of an array slice was that it could be treated the same as an array. Otherwise - it would be impossible to integrate with existing libraries and APIs and essentially turn into another ArraySegment<T> that exists but is barely used. Will this be possible with a Span<T>?
  3. It seems that Span<T> is currently considered stack-only. That does rule out a few use cases. Is this a temporary state or considered a final decision?

Overall, I'm in huge favor of having a mechanism (like Span<T>?) to provide differently typed, offset or truncated views on a memory block, mostly for performance reasons. However, I'm also worried that it might stop halfway to its full potential, see also this CoreCLR issue.

Sorry to interrupt - as a mostly passive observer, I might have missed some details.

jaredpar commented 8 years ago

@ilexp it's a bit confusing because the ability to have efficient slices requires both runtime and language work.

  1. Yes. The CLR portion is making the necessary runtime changes to properly handle the "ref" field used in Span<T>.
  2. No. Having Span<T> == T[] is not a goal of this particular approach. This approach is solving a slightly different problem: uniform representation for contiguous memory allocations. Whether that memory be arrays, native memory or strings. Unifying T[] and Span<T> is possible and has been explored in great detail. But it has a number of negative tradeoffs associated with it.
  3. This particular type is indeed stack only. The design includes mechanisms for heap storage though when needed (so yes you will be able to use it in async methods).
DerpMcDerp commented 8 years ago

Has unifying T[*] and Span<T> been considered and if so what are the negative trade-offs associated with that?

KrzysztofCwalina commented 8 years ago

I assume by T[*] you mean arrays? If yes, than unifying Span and T[] would make T[] slower, i.e. additional indirection when accessing array items in cases where the compiler cannot optimize the indirection out. In addition, arrays would become one word/pointer larger

jkotas commented 8 years ago

... and the unified Span would be slow and expensive too.

jaredpar commented 8 years ago

If yes, than unifying Span and T[] would make T[] slower,

Depends on which design. The design where Span<T> is a class will have the behavior you outlined. In the version where it is a struct the perf should be equivalent. The downside of course is tearing issues.

DerpMcDerp commented 8 years ago

My original question was poorly thought out. What I meant was that the Span<T> proposal seems to be along the lines of:

  1. slicing into array/strings
  2. reinterpret_casting structs as arrays e.g. struct Point3d { int x, y, x; }; int[] arr = Span.Reinterpret<int[]>(ref some_Point3d);

But what is also desired by people is the ability to reinterpret single/multidimesional arrays as a single/multidimensional arrays with custom strides/starting indices. e.g.

int[] arr = new int[100];
var sub_arr = Span.Slice(arr, 10, 10); // reinterpret arr as a 10x10 matrix

int [,] arr2 = new int[10, 10];
var arr2.Column(0); // get a column vector. This is a slice with a stride of sizeof(int[10]) between elements

var arr3 = Span.Reversed(arr); // This is a slice with a stride -sizeof(int) to simulate reversal without allocating

The original slice proposal is biased towards 0 indexing, adjacent elements, and single-dimensional arrays. It would be nice if the slicing proposal could be slightly tweaked to support the above kind of use cases.

ilexp commented 8 years ago

Depends on which design. The design where Span is a class will have the behavior you outlined. In the version where it is a struct the perf should be equivalent. The downside of course is tearing issues.

From a performance-oriented perspective, I'd be against any solution that would make T[] slower. That just seems like it isn't worth it. However, if there is a (struct-based) approach without negative performance implications, why explore that direction in a prototype?

As far as tearing issues go, can you explain in a few words what's behind that? If these issues only occur in certain cases, is it possible to solve it by implementing a validation when creating the affected Span<T>?

jaredpar commented 8 years ago

@ilexp

As far as tearing issues go, can you explain in a few words what's behind that?

Imagine for a second that Span<T> was defined roughly as the following:

struct Span<T> { 
  object DataPointer;
  int Length;
  int Index;
}

This is a multi-word struct hence assignment of Span<T> is not guaranteed to be atomic. This means that when a field of Span<T> is assigned on one thread, it's possible to view only part of the assignment on another thread. For instance the other thread may see the change in DataPointer but not Length or Index.

For user defined types this can create odd combinations of values but doesn't threaten the safety of the system. For types like Span<T> though this is not the case. The consistency of the data is important to maintaining type safety. Consider this example:

// Thread1:
sharedObject.Field = (new int[] {}).AsSpan();

// Thread2 
sharedObject.Field = (new int[] {1, 2, 3}).AsSpan();
Console.WriteLine(sharedObject.Field[1]);

Since assignments of Span<T> are not atomic it's possible for the read of sharedObject.Field[1] to see parts of the value assigned in Thread1 and parts of the value assigned in Thread2. That means it could potentially see: DataPointer from Thread1, and Length + Index from Thread2.

In that case the bounds check would succeed because 1 <= 2 but the pointer being read from has a real length of 0. The user is now reading (or worse writing) random memory in the system. Type safety is gone.

ilexp commented 8 years ago

@jaredpar Ah, that's bad. Hypothetically, are there ways to make the assignment of Span<T> an atomic operation? Does the CLR have an atomic-assign instruction, or a way to JIT certain assignments in a way that they are essentially atomic?

mikedn commented 8 years ago

Does the CLR have an atomic-assign instruction, or a way to JIT certain assignments in a way that they are essentially atomic?

CLR more or less offers whatever the hardware offers and the hardware doesn't offer a lot when the data exceeds the machine word size (8 bytes in the case of x64). x64 has cmpxchg16b which may be useful here but it may also turn out to be quite expensive.

And it's not only the assignment itself that is a problem. When you're calling a member of a struct you're passing the address of the struct value as this, you're not making a copy/assignment. The called Span<T> method would need to make an atomic copy of itself and use that copy. That means that not only normal field assignments are potentially slow but so are all uses of a Span<T> field.

omariom commented 8 years ago

@jaredpar @jkotas

The language already has this exact concept in the spec under the guise unmanaged type. Essentially a struct that transitively contains no reference types.

So the language has it, as well as the runtime. Let's just unleash the beast to the public! )

jfrijters commented 8 years ago

Add Span and ReadOnlySpan as yet another ByRef-like type in Roslyn, so that unsupported operations on Span like boxing or creating Span[] are compile errors

Instead of the hardcoded list hack, having an attribute for this is long overdue.

omariom commented 8 years ago

It is already decided that Span is on stack thing because of torn writes issues. But do torn writes completely forbid non atomic structs - like slices or delegate as struct - to be located on heap? What if the field is readonly?

class Foo
{
    private readonly Span<byte> bytes;
}
omariom commented 8 years ago

Wrote and then realized this can be assigned to a visible location right in the ctor. Or ctors can be inlined and their code can be reordered with the publication of the object ref :disappointed:

benaadams commented 8 years ago

How does a stack only based Span<T> work with auto heap promotion?

e.g. async - its stack but on the heap (would probably be ok) closure - its a lambda but it captures (would probably be bad)

Can they work with async?

KrzysztofCwalina commented 8 years ago

It does not work well with these. Async code will have to hold to a reference type "request context", and only get spans from such context. An example of such context is HttpRequest in https://github.com/dotnet/corefxlab/tree/master/demos/LowAllocationWebServer

benaadams commented 8 years ago

Sort of serialize and re-hydrate the span? Could that be built into the async machinery in someway as async shouldn't see torn-writes as its "acting" as sync stack code. Might be something for roslyn?

KrzysztofCwalina commented 8 years ago

I don't think we can allow "serializing" ("boxing") spans. The conversion is one way: from context to span. And so in all prototypes I developed, I ended up with the following flow:

  1. Server rents buffer from a pool (can be lazy)
  2. Server wraps the buffer into "request context"
  3. The context is passed to user code to process the request.
  4. If user code needs to get access to the request bytes, it gets a span from the context. The context never gives raw buffer (e.g. array) to user code.
  5. If user code needs to suspend the thread (for async call), it stores the context for later when it's woken up.
  6. When the request it completed (i.e. response sent), the server returns the buffer to the pool, knowing that user code cannot be holding to it as the server only ever gave spans (stack only) to user code.
jaredpar commented 8 years ago

@benaadams the implementation of compiler features async, await, lambdas, etc ... use heap storage for lifted values. This causes issues with Span<T> as it can't be stored in the heap directly due to the use of interior pointers and tearing issues.

It's possible for the compiler to work around this by using a different type for heap storage. Essentially think of HeapSpan<T> which removed the interior pointer and didn't have tearing issues. The compiler could do translation under the hood to make Span<T> work with async.

This only works though if there is a way to translate between Span<T> and HeapSpan<T>. Such a translation has to solve a couple of problems:

  1. Determine if the span memory is native or managed.
  2. In the case of managed find the head of the object to which the interior pointer refers to.

Once both of these is understood it's easy to construct a safe HeapSpan<T> type which looks essentially like the following:

class HeapSpan<T> { 
  object Data;  // Head of object if managed, null if native memory
  IntPtr Offset; // Offset from Data where Span<T> starts
  int Length;
}

Unfortunately getting the above information is quite tricky and the viability of the solution depends on the implementation of Span<T>. There are a couple to consider:

Span (ref T, int)

In this case the design of Span<T> is essentially:

struct Span<T> { 
  ref T Data;
  int Length;
}

Determining if this object points to native memory is practically impossible. Sure it's possible with a lot of very slow heap scanning to determine at a specific point in time if the memory is managed or native. But the moment that determination is made it can be invalidated by actions on other threads.

Finding the head of Data is similarly expensive. The result is more stable because the result refers to a managed object.

These taken together mean that this design really can't have a safe heap translation.

Span (object, ref T, int)

In this case the design of Span<T> is essentially:

struct Span<T> { 
  object Head;
  ref T Data;
  int Length;
}

This type works by using Head to answer both of the pertinent questions. In the case of managed memory Head points to the object for which the interior pointer exists. In the case of native memory itsnull. Hence it's trivially to convert this implementation toHeapSpan` safely.

There are a few downsides of HeapSpan<T> to consider:

  • Generally requires a fatter Span<T> type.
  • The implementation trick can bleed into presentation in some corner case language scenarios.
  • Makes 100% safe native allocations over Span<T> impossible. Consumer can always store your native allocation in the heap.
benaadams commented 8 years ago

When the request it completed (i.e. response sent), the server returns the buffer to the pool, knowing that user code cannot be holding to it as the server only ever gave spans (stack only) to user code.

This is an interesting twist :+1:

It would mean though that a span couldn't be passed into an async function (like a ref can't)?

omariom commented 8 years ago

Can we somehow have Span fields but only in unsafe context? Like pointers..

hmm2

KrzysztofCwalina commented 8 years ago

@benaadams, correct. In code I wrote, the thing that flows across threads is a reference type "context". The context holds to buffers, but never hands the buffers to separate components, but rather wraps them into spans first. The code that works on spans is synchronous.

KrzysztofCwalina commented 8 years ago

@omariom, we have toyed with the idea, but it would require complex work in the runtime, compilers, language syntax, etc. I wonder if we cannot start without it (and live with the stack-only hard limitations) and then add this feature later when the limitations prove too large.