ArrayElementReference<T> variant of the new Span<T> / Memory<T>

dotnet / corefxlab

This repo is for experimentation and exploring new ideas that may or may not make it into the main corefx repo.

MIT License

1.46k stars 345 forks source link

ArrayElementReference<T> variant of the new Span<T> / Memory<T> #2417

Closed verelpode closed 3 years ago

verelpode commented 6 years ago

Hello, here's an interesting idea.
The new "Span" and "Memory" in C# 7.2 could potentially be used to solve the problem of the garbage collection cost of creating a very large quantity of objects. An app could manage GC cost by allocating objects in groups, that is arrays. We can already make an array of structs, but it is rather limited because we cannot make a field that contains a reference to one of these structs in an array. So, what if Memory is used to make such a reference?

Memory internally contains _object, _index, _length, but if it is reference to a single struct in an array, then _length == 1. I suggest making a variation of Memory like this:

  public readonly struct ArrayElementReference<T>
  {
     private readonly T[] _array;
     private readonly int _index;
  }

or like this:

  public readonly struct ArrayElementReference<T>
  {
     private readonly T[] _array;
     private readonly System.UIntPtr _offsetInBytes;
  }

Likewise, Span contains _pointer and _length and in this case, we don't need _length because it is always 1. Thus make a corresponding variation of Span like this:

  public readonly ref struct FastArrayElementReference<T>
  {
     private readonly ref T _pointer;
  }

or like this:

  public readonly ref struct FastArrayElementReference<T>
  {
     private readonly System.UIntPtr _pointer;
  }

Thus the next version of C# could give us the ability to define a field (in a class or struct) that contains a reference to a struct element in an array, and this is implemented like ArrayElementReference as shown above. When this field is copied to a local variable or method parameter on the stack, then it is converted to FastArrayElementReference as shown above -- the same idea as how Span is the fast version of Memory.

Thanks for considering it.

benaadams commented 6 years ago

Expose the special ByReference<T> runtime type that Span<T> uses?

/cc @jkotas

jkotas commented 6 years ago

ByReference<T> is a workaround for https://github.com/dotnet/csharplang/issues/1147.

verelpode commented 6 years ago

To clarify the syntactic sugar, C# would allow you to write like the following example of a kind of large tree that uses struct-based nodes allocated in arrays (instead of thousands of individual objects).

struct ExampleNode
{
    ref ExampleNode Parent;  // note ref keyword here
    ref ExampleNode LeftChild;
    ref ExampleNode RightChild;
    int SomePayload1, SomePayload2;
}

void TestMethod()
{
    ExampleNode[] page1 = new ExampleNode[20000];
    ref ExampleNode nodeA = page1[0];  // note ref keyword here
    ref ExampleNode nodeB = page1[1];
    ref ExampleNode nodeC = page1[2];

    nodeA.LeftChild = nodeB;
    nodeA.RightChild = nodeC;
    nodeB.Parent = nodeA;
    nodeC.Parent = nodeA;
}

For the internal implementation, the C# compiler would translate the above syntactic sugar to the following:

struct ExampleNode
{
    ArrayElementReference<ExampleNode> Parent;  // or Memory<ExampleNode>
    ArrayElementReference<ExampleNode> LeftChild;
    ArrayElementReference<ExampleNode> RightChild;
    int SomePayload1, SomePayload2;
}

void TestMethod()
{
    ExampleNode[] page1 = new ExampleNode[20000];
    FastArrayElementReference<ExampleNode> nodeA = page1[0];  // or Span<ExampleNode>
    FastArrayElementReference<ExampleNode> nodeB = page1[1];
    FastArrayElementReference<ExampleNode> nodeC = page1[2];

    nodeA.LeftChild = nodeB;
    nodeA.RightChild = nodeC;
    nodeB.Parent = nodeA;
    nodeC.Parent = nodeA;
}

Ideally, the Span/FastArrayElementReference optimization would be implemented, but if this optimization is impossible, then the proposed syntactic sugar would still be quite useful even if it only uses Memory/ArrayElementReference without ever being optimized like Span. If it is unoptimized, then the C# compiler generates:

struct ExampleNode
{
    ArrayElementReference<ExampleNode> Parent;  // or Memory<ExampleNode>
    ArrayElementReference<ExampleNode> LeftChild;
    ArrayElementReference<ExampleNode> RightChild;
    int SomePayload1, SomePayload2;
}

void TestMethod()
{
    ExampleNode[] page1 = new ExampleNode[20000];
    ArrayElementReference<ExampleNode> nodeA = page1[0];  // or Memory<ExampleNode>
    ArrayElementReference<ExampleNode> nodeB = page1[1];
    ArrayElementReference<ExampleNode> nodeC = page1[2];

    nodeA._array[nodeA._index].LeftChild = nodeB;
    nodeA._array[nodeA._index].RightChild = nodeC;
    nodeB._array[nodeB._index].Parent = nodeA;
    nodeC._array[nodeC._index].Parent = nodeA;
}

I said "unoptimized" but actually the above "unoptimized" version is still much faster than garbage-collecting 20 thousand individual objects (when ExampleNode is class instead of struct). Thus it's interesting to note that the proposed syntactic sugar is still very good even if it isn't optimized as well as the Memory + Span combo. And if it is optimized as well as Span, then I think it would be a killer new feature.

Thanks jkotas for linking to dotnet/csharplang#1147. Unity is a good example because they want to avoid game frame rate stalls caused by GC, and they do this by using the unsafe keyword, but understandably they want to use safe code. I've proposed a solution that eliminates the unsafe code without triggering the problem of frame rate stalls / high GC cost.

verelpode commented 6 years ago

@benaadams -- I think you already know what I'm about to write, so this message is actually for other readers. ByReference<T> alone would be insufficient. Two special structs (ArrayElementReference and FastArrayElementReference) are required for the same reason why Memory<T> and Span<T> cannot be merged together into a single struct definition.

However, if the "unoptimized" (actually the less optimized) version of my proposal is implemented, then only Memory/ArrayElementReference<T> is needed, and Span/ByReference/FastArrayElementReference<T> is unused. If my proposal is implemented using only one special struct, then that struct must be like Memory<T> not like Span/ByReference<T>, but ideally (if possible) my proposal would be optimized as well as the Memory<T> + Span<T> combo meaning 2 special structs.

It is important to understand that in the fully-optimized scenario, the C# compiler generates different IL code for the same syntax "ref ExampleNode x" when...

when x is a field in a struct or class in the heap, versus
when x is a local variable in a method, a variable on the stack.

However if the less optimized solution is implemented, then x as field is the same as x as local variable, meaning akin to Memory<T> in both places.

jaredpar commented 6 years ago

One thing to consider whenever we discuss allowing for explicit ref fields in structs is this part of the span safety rules:

https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md#length-one-span-over-ref-values

Essentially the span safety rules were designed on the idea that a struct could never contain a ref field. If that can happen then the compiler needs to consider any ref parameter as potentially escaping by-ref to any other ref struct parameter or return.

This has a fairly devastating effoct on ref struct APIs

ref struct S { 
  internal void M(ref int p) { ... }
}

At this point the compiler must consider that the p parameter can escape by-ref into this. Hence at the callsite both must have the exact same lifetime.:

void Method(ref S s) {
  int i = 42;
  s.M(ref i); // Error!!! 
}

That example may seem a bit contrived but it's essentially every Read / Write API that we have on ref struct. Overall it makes the system unusable. As a result we ended up adding this language to the spec in order to make these APIs doable.

If we wanted to add ref fields in the future we'd need to account for it by doing one of the following:

Safety rules would need to distinguish between ref struct that have ref fields and those that don't.
Add some notation for disallowing certain parameters / values from being captured. Essentially a way to mark p above as "don't allow capture by ref".

These are all doable but it's work and needs quite a bit of working out.

CC @JeremyKuhne as I know he's interested in (1) above.

jaredpar commented 6 years ago

@jkotas

ByReference is a workaround for dotnet/csharplang#1147.

C# also doesn't allow ref fields because there really isn't a way to define them in IL. If we did them it would likely be via emitting ByReference<T>.

jkotas commented 6 years ago

C# also doesn't allow ref fields because there really isn't a way to define them in IL.

This feature would work on new runtimes only. We can choose how to make this feature work on the new runtimes. I think relaxing the restriction on byref fields in IL would be the most natural design.

jaredpar commented 6 years ago

This feature would work on new runtimes only.

Definitely. If we added support for ref fields then I think it has to be CoreClr only due to the GC tracking issues. The desktop GC wouldn't treat such fields as a strong reference and hence allowing it there would open up fun GC holes. Correct?

I think relaxing the restriction on byref fields in IL would be the most natural design.

What would happen to Span<T> then? Would it's implementation just be changed to be essentially:

ref struct Span<T> { 
  ref T Data;
  int Length;
}

jkotas commented 6 years ago

The desktop GC wouldn't treat such fields as a strong reference

It would not get this far. The existing desktop runtimes would not be able to load these types with either design. ByReference<T> does not exist in the existing desktop runtimes and the typeloader would refuse to load ELEMENT_TYPE_BYREF.

Would it's implementation just be changed to be essentially

Right.

JeremyKuhne commented 6 years ago

CC @JeremyKuhne as I know he's interested in (1) above.

I'm more interested in (2) actually. :)

Add some notation for disallowing certain parameters / values from being captured. Essentially a way to mark p above as "don't allow capture by ref".

For me the driving scenario is passing Span<byte> buffer = stackalloc byte[64] to ref struct methods so they can manipulate/access the parameter without risk of capturing it.

jaredpar commented 6 years ago

@JeremyKuhne

I'm more interested in (2) actually. :)

Yes of course 😦. This is the downside of using the "all 1." scheme for numbered lists.

verelpode commented 6 years ago

What would happen to Span then? Would it's implementation just be changed to be essentially: ref struct Span { ref T Data; int Length; }

That makes good sense in my mind, because, honestly, from my perspective, although Span and Memory are great, personally I view references-to-structs as being fundamentally more important and more intrinsic than spans/ranges/extents/substrings, therefore I think it makes good sense if the REAL/core feature is ref T x fields and Span<T> becomes only a thin extension to ref fields that merely adds the length, and the real magic would be done in the implementation of ref T x fields and not in (or for) Span<T>. This design appears to be a clean and logical layering of functionality.

Considering that references are such a very fundamental thing, it also makes good sense in my mind that ref fields would be emitted as byref fields in IL directly, not emitted as ByReference<T> or other special struct, but if necessary, emitting a special struct is also a workable solution. Ideally I'd favor having IL directly support this feature, but I'm not the expert in CLR.

I think it would be an impressively big win for C# if 20000 instances can be garbage-collected with the same cost as 1 object (or rather 1 array).

For another example of potential wins, have a look at System.Collections.Generic.SortedSet<T> and SortedDictionary<TKey,TValue> and you can see they contain an internal Node class:

internal class Node {
    public bool IsRed;
    public T Item;
    public Node Left;
    public Node Right;
}

This Node class could be changed to a struct:

internal struct Node {
    public bool IsRed;
    public T Item;
    public ref Node Left;
    public ref Node Right;
}

Even if SortedSet and SortedDictionary don't bother adjusting the page/array length dynamically, even if they simply use a constant page length of 10, the number of objects would be reduced to one-tenth of the current implementation with class Node.

verelpode commented 6 years ago

For another example, have a look at the internal struct MS.Internal.Xml.Cache.XPathNode. Although it is already optimized to a struct instead of class, imagine how much easier, simpler, and cleaner it would have been to write XPathNode if C# allowed XPathNode to contain ref XPathNode x; fields. I can think of numerous examples that would benefit from this feature.

jkotas commented 6 years ago

C# if 20000 instances can be garbage-collected with the same cost as 1 object (or rather 1 array)

The garbage collection cost is proportional to the bytes allocate and collected. The number of objects matters much less.

System.Collections.Generic.SortedSet<T> and SortedDictionary<TKey,TValue> MS.Internal.Xml.Cache.XPathNode

The ref fields would be allowed in ref-like types only. I do not think converting the structs to ref-like struct in these two examples would work in practice.

verelpode commented 6 years ago

The ref fields would be allowed in ref-like types only. I do not think converting the structs to ref-like struct in these two examples would work in practice.

I agree, converting those two examples to stack-only/ref-like structs would not work, but in my proposal, I meant that a NORMAL struct and a normal class would be able to contain a field that contains a reference to some other struct instance stored in an array. My proposal is already possible to do today in the current C#, except that:

The syntax is cumbersome. It would be better with syntactic sugar and/or direct support in IL.
It could potentially be optimized further, but this optimization is not mandatory.

It already works without use of ref struct, ByReference<T>, and Span<T>. It doesn't need any magic structs or special new IL, except if you want to optimize it. For example, consider the SortedSet<T>.ReplaceChildOfNodeOrRoot method. Following I've rewritten this method to use struct Node instead of class Node. The following is how it looks with regular C#, without any new syntactic sugar. As you can see, it already works but the syntax is cumbersome -- it would be better with syntactic sugar and/or direct support in IL.

struct ArrayElementReference<T>
{
    readonly T[] Array;
    readonly int Index;
    public static bool operator == (ArrayElementReference<T> a, ArrayElementReference<T> b)
    {
        return a.Index == b.Index && a.Array == b.Array;
    }
} // struct ArrayElementReference<T>

class SortedSet<T>
{
    ArrayElementReference<Node> root;

    struct Node  // NORMAL struct not "ref struct".
    {
        bool IsRed;
        T Item;
        ArrayElementReference<Node> Left;
        ArrayElementReference<Node> Right;
    }

    void ReplaceChildOfNodeOrRoot(ArrayElementReference<Node> parent, ArrayElementReference<Node> child, ArrayElementReference<Node> newChild)
    {
        if (parent.Array != null)
        {
            if (parent.Array[parent.Index].Left == child)
                parent.Array[parent.Index].Left = newChild;
            else
                parent.Array[parent.Index].Right = newChild;
        }
        else
        {
            this.root = newChild;
        }
    }

    static void TestSetLeft(ArrayElementReference<Node> parent, ArrayElementReference<Node> newChild)
    {
        parent.Array[parent.Index].Left = newChild;
    }

} // class SortedSet<T>

I compiled it using the existing C# compiler and it emits the following IL for the TestSetLeft method:

.method static 
    void TestSetLeft (
        valuetype ArrayElementReference`1<valuetype Node<!T>> parent,
        valuetype ArrayElementReference`1<valuetype Node<!T>> newChild
    ) cil managed 
{
    .maxstack 8
    ldarg.0
    ldfld !0[] valuetype ArrayElementReference`1<valuetype Node<!T>>::Array
    ldarg.0
    ldfld int32 valuetype ArrayElementReference`1<valuetype Node<!T>>::Index
    ldelema valuetype Node<!T>
    ldarg.1
    stfld valuetype ArrayElementReference`1<valuetype Node<!0>> valuetype Node<!T>::Left
    ret
}

So it already works, but how about making some syntactic sugar and/or better IL? Wouldn't it be excellent if the C# compiler allowed us to write the same thing using the following simple syntax?

class SortedSet<T>
{
    ref Node root;

    struct Node  // STILL NORMAL struct not "ref struct".
    {
        bool IsRed;
        T Item;
        ref Node Left;
        ref Node Right;
    }

    void ReplaceChildOfNodeOrRoot(ref Node parent, ref Node child, ref Node newChild)
    {
        if (parent != null)
        {
            if (parent.Left == child)
                parent.Left = newChild;
            else
                parent.Right = newChild;
        }
        else
        {
            this.root = newChild;
        }
    }

    static void TestSetLeft(ref Node parent, ref Node newChild)
    {
        parent.Left = newChild;
    }

} // class SortedSet<T>

The above syntax could produce the same IL as already supported (the IL above). Alternatively, if desired, it could be optimized to produce IL similar to the following, if you're willing to extend IL to support the following "arrayelemref" or similar.

.method static 
    void TestSetLeft (
        arrayelemref Node<!T> parent,
        arrayelemref Node<!T> newChild
    ) cil managed 
{
    ldarg.0
    ldarg.1
    stfld arrayelemref Node<!T> valuetype Node<!T>::Left
    ret
}

The following IL shows the struct parameters passed by reference (IL &), but this fails because these parameters pass only a pointer/address, but the method TestSetLeft needs to know both ArrayElementReference.Array and ArrayElementReference.Index in order to set the field Node.Left because Node.Left needs to store ArrayElementReference<Node> not only a pointer/address, because struct Node is a normal struct not ref struct Node {...}.

.method static 
    void TestSetLeft (
        valuetype Node<!T>& parent,
        valuetype Node<!T>& newChild
    ) cil managed 
{
    ldarg.0
    ldarg.1
    stfld Node<!T> Node<!T>::Left   // Incorrect.
    ret
}

jkotas commented 6 years ago

    readonly T[] Array;
    readonly int Index;

This is turning one pointer into pointer + index. In practice, it will be two pointers due to alignment. So this would result into more bytes being allocated on GC heap. It is very unlikely to make anything faster.

We do not want to allow storing refs on GC heap because of they are very expensive to deal with during garbage collection. If we have allowed it and people started using them, we would have a big problem with GC pause times.

verelpode commented 6 years ago

It is very unlikely to make anything faster.

If it doesn't make anything faster, then why do Dictionary<TKey,TValue> and XPathNode and other examples do it? Dictionary<TKey,TValue>.Entry is a struct. It would be easier and simpler to write Entry as a class, but the cost would be excessive, so the authors of Dictionary wrote it as a struct. Entry.next is meant to be a reference to another instance of struct Entry, but C# or the CLR doesn't support it.

I'm not saying Dictionary should be rewritten, rather I just mean it's one example of reducing cost by using structs instead of classes, but currently it is often a headache to use struct instead of class because structs cannot contain references to each other. My proposal gives structs the power of classes but with lower cost, doesn't it?

Don't you think SortedSet<T>.Node would be lower cost if it was changed to a struct akin to what Dictionary does?

We do not want to allow storing refs on GC heap because of they are very expensive to deal with during garbage collection.

But in my proposal, most of these references are self-references -- they point to themselves. Most instances of Node.Left.Array and Node.Right.Array point to the same array object that contains the Node instance. When traversing a graph for GC purposes, if you are currently at object X1, and X1 contains a field that points to X1 (points to itself), then you don't follow or traverse into this link -- you ignore this link. Ignoring a reference isn't expensive, right? Or am I mistaken?

jkotas commented 6 years ago

If it doesn't make anything faster, then why do Dictionary<TKey,TValue> and XPathNode and other examples do it?

Right, they use indices. Indices are cheap both for storage and GC.

Ignoring a reference isn't expensive, right? Or am I mistaken?

Finding the object that the ref points to is the expensive part. This would need to be done before you can check that the ref is safe to ignore because of it points into the same array.

verelpode commented 6 years ago

Damn. Would you like the idea better or worse if ArrayElementReference.Array had a way of indicating self-reference without actually storing the address of the self-array? For example, if the address 1 or (UIntPtr)-1 was interpreted to mean self. Alternatively, when ArrayElementReference.Index is negative and ArrayElementReference.Array is null, then it could mean self.

I haven't benchmarked the above idea, but I did do a benchmark without the above idea, and my proposal was faster than class Node, but underwhelming and insufficiently compelling :-( Apparently my proposal needs to be adjusted in some way before it could be sufficiently compelling.

Hey, I just noticed this: Try compiling the following short program in VS 15.7.6 and when you run it, it throws System.TypeLoadException! Who is the best person to inform about this bug? Seems like a serious bug.

class Program
{
    static void Main(string[] args)
    {
        Test();
    }
    static SNode Test()
    {
        return new SNode();
    }
    struct ArrayElementReference<T>
    {
        public T[] Array;
        public int Index;
    }
    struct SNode
    {
        ArrayElementReference<SNode> x;
    }
}

Unhandled Exception: System.TypeLoadException: Could not load type 'TestConsoleApp.Program+SNode' from assembly 'TestConsoleApp, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.
   at TestConsoleApp.Program.Main(String[] args)

jkotas commented 6 years ago

Know issue: https://github.com/dotnet/coreclr/issues/7957

verelpode commented 6 years ago

On second thought, let me tell you the exact benchmark results, and you decide for yourself whether it is a worthwhile improvement. I said "underwhelming" because I was expecting my proposal to be at least 10x faster but it was 3x to 4x faster in this test. Is that worthwhile? More importantly, would it solve the problem for the Unity game engine? Can my proposal be further improved or optimized somehow?

I allocated 51_000_000 nodes. Each node contains 2 node references and 48 bytes of integers. When using class Node, it ran for 9.3 seconds. When using struct Node with my proposal, it ran for 2.5 seconds (3.7 times faster). I made no attempt to optimize self-references.

The times include the time taken to run: System.GC.Collect(System.GC.MaxGeneration, System.GCCollectionMode.Forced, blocking: true, compacting: true); In both tests, System.GC.GetTotalMemory returns approx 3891 megabytes before GC.Collect.

Would anyone like me to email the source code or upload somewhere?

verelpode commented 6 years ago

Interesting -- a significant part of the reason why my proposal is faster is revealed by System.GC.CollectionCount. After running the benchmark with class Node:

Gen 0 collected 649 times.
Gen 1 collected 338 times.
Gen 2 collected 9 times.

After running the benchmark with struct Node:

Gen 0 collected 2 times.
Gen 1 collected 2 times.
Gen 2 collected 2 times.

When I reduced the page/array length to 10_000 nodes (causing more arrays to be allocated), the time increased slightly to 2.8 seconds and the number of collections increased slightly:

Gen 0 collected 6 times.
Gen 1 collected 6 times.
Gen 2 collected 6 times.

Thus my proposal might solve the problem for the Unity game engine because my proposal causes garbage collection to run much less often. Admittedly it's less beneficial than I presumed it would be. Is it worthwhile?

verelpode commented 6 years ago

I've brainstormed a few alternative solutions that are also quite interesting to think about:

What if a class can have an attribute applied that says that the class should use automatic-reference-counting (like in .NET Native or like in the latest version of MS C++ for UWP apps) instead of the normal CLR GC?

MS C++ automatic-reference-counting still suffers the problem of cyclic references, doesn't it? Therefore it would ONLY be used for classes that explicitly request it via an attribute. This would allow people to create particular classes that don't cause garbage collections to run, without resorting to unsafe code.

Another alternative: If we could somehow allocate a group of objects inside a "GC compartment/container", and the objects inside a compartment are GC'd on an all-or-nothing basis instead of individually. Any reference to such an object causes all objects in the compartment to be kept alive. Possibly a compartment remains ineligible for GC until the app explicitly/manually marks it as eligible, such as in a IDisposable.Dispose method or in a finalizer.

What if the CLR supported an ability to allocate a read-only array of class instances where each element of the array is immediately non-null and cannot be changed to any other reference? System.Array.IsReadOnly == true. These class instances (array elements) may be GC'd on an all-or-nothing basis similar to an array of structs. This feature might only be compatible with classes that have a particular attribute applied, and perhaps such classes can ONLY be allocated in this array manner, never individually.

[System.ArrayElementClass]  // Means class is restricted to existing as an array element.
class MyTest1
{
    int TestFieldA;
    MyTest1 Parent;
}

MyTest1[] ary = new MyTest1[10000];
ary[0].TestFieldA = 123;  // Wouldn't cause NullReferenceException.
ary[0] = xxxx;   // Throws ReadOnlyException.
bool b = ary.IsReadOnly;  // Returns true.
ary[0].Parent = ary[5];   // Supported, unlike if MyTest1 was struct.
MyTest1 individualInstance = new MyTest1();  // Disallowed.

Every instance of a ArrayElementClass class would contain a CLR-internal read-only field/header that stores the array index or byte offset of the instance/element. Thus if you have a pointer to an instance of an ArrayElementClass, then you can instantly calculate a pointer to the array that contains the instance of the class. i.e. you can instantly recover the array pointer from an element pointer.

The GC wouldn't track/determine the reachability of each element/instance of ArrayElementClass, rather it would determine the reachability of the array, akin to how it treats an array of structs.

gfoidl commented 6 years ago

If we could somehow allocate a group of objects inside a "GC compartment/container", and the objects inside a compartment are GC'd on an all-or-nothing basis instead of individually