CLR/JIT should optimize "alloc temporary small object" to "alloc on stack" automatically

ygc369 commented 8 years ago

If a small object's reference is never stored in the heap or in static variables (only in local variables), the object is temporary, and the CLR or JIT should allocate it on the stack and free it in time. This would lower GC pressure. The feature is essentially the escape analysis technique used in the JVM. I suggested this in the Roslyn forum (https://github.com/dotnet/roslyn/issues/2104), but was told to suggest it here.

category:cq theme:stack-allocation skill-level:expert cost:large

cmckinsey commented 8 years ago

Thanks, this is something the team is aware of and discussed. A specific example in a real-world workload that you can share would help us prioritize this request. We already have synthetic examples.

zpodlovics commented 8 years ago

You may also be interested in the following proposal and prototype implementation project: Stackalloc for reference types with Roslyn and CoreCLR [1] [2]

It introduces a generalized stackalloc with a transient variable modifier to solve the problem.

The transient variable modifier
Let's start by defining the concept of a transient variable:
  A transient variable can only be declared for a local method variable or parameter, and cannot be used on a ref/out parameter.
  A transient variable can only be assigned to another transient variable.
  A transient variable can receive a non-transient variable as long as the types match.

[1] http://xoofx.com/blog/2015/10/08/stackalloc-for-class-with-roslyn-and-coreclr/ [2] https://github.com/xoofx/StackAllocForClass
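C# never shipped the proposed `transient` modifier. As a rough analogy only (our choice of example, not part of the proposal), C# 7.2's `ref struct` enforces similar stack-only rules at compile time: a ref struct cannot be boxed, stored in a field of a class, or used as an array element type. A minimal sketch, with the hypothetical type name `StackOnlyAccumulator`:

```csharp
// Illustration only: `transient` does not exist in C#. A `ref struct` is an
// existing feature with comparable restrictions: the compiler only lets it
// live on the stack (locals, parameters, fields of other ref structs).
public ref struct StackOnlyAccumulator
{
    private int _sum;

    public void Add(int value) => _sum += value;

    public int Sum => _sum;
}

public static class TransientDemo
{
    public static int SumAll(int[] values)
    {
        // `acc` is a local, which is allowed. Boxing it to object or assigning
        // it to a field of a class would be rejected at compile time.
        var acc = new StackOnlyAccumulator();
        foreach (int v in values)
        {
            acc.Add(v);
        }
        return acc.Sum;
    }
}
```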

mikedn commented 8 years ago

@cmckinsey

A specific example in a real-world workload that you can share would help us prioritize this request.

This might be a chicken and egg situation. If the optimization doesn't exist people will try to twist the code to avoid allocations (see for example params support for IEnumerable) and as a result there will be fewer optimization opportunities. On the other hand if the optimization exists then perhaps people will concentrate more on actually writing code instead of figuring out how to avoid allocations.

OtherCrashOverride commented 8 years ago

It introduces a generalized stackalloc with a transient variable modifier to solve the problem.

You will need the same constraints that value types have, so you may as well use a struct instead of a class. For example, you cannot have class finalizers. The only difference I see is that the compiler calls a ctor for you instead of you having to call MyStruct.Initialize(); yourself.

cmckinsey commented 8 years ago

@mikedn I don't disagree, but we need some way to help assess priority, as we have other opportunities where we can also improve code quality, and this would help to prioritize. We have prototyped this kind of transformation in other experimental systems. Because of the challenges in the compiler automatically finding legal opportunities, which is even more challenging in JIT environments, my experience has been that developers who really want to avoid heap allocations end up having to make manual source changes anyway, such as using value types.

@OtherCrashOverride yes, that is correct. And there are other correctness rules the compiler must observe; for example, this transformation might lead to a stack overflow if done on a recursion path.

GSPP commented 8 years ago

Did the team look at the work the HotSpot JVM team did with escape analysis? They have production experience in a very similar setting and should be able to answer many open questions.

You should be able to run benchmarks on the JVM with Escape Analysis turned on and off. They have a command line switch. The performance differential should carry over to the CLR.

Prime use cases would be enumerators (LINQ queries) and string processing I think. These can generate tons of garbage. Also things like new FileInfo(path).Exists. Or updating immutable data structures multiple times in the same method.

mikedn commented 8 years ago

You should be able to run benchmarks on the JVM with Escape Analysis turned on and off. They have a command line switch. The performance differential should carry over to the CLR.

The situation is a bit more complicated than that. Java doesn't have value types and as such it needs allocation optimizations more than .NET needs. Also, it's interesting that the "Escape Analysis for Java" paper shows that for some benchmarks the percentage of objects allocated on the stack can exceed 70% but the execution time improvement is no more than 23%. And as far as I can tell that also includes improvements due to the elimination of locks for objects that don't escape the thread.

Prime use cases would be enumerators (LINQ queries) and string processing I think.

Of all possible use cases, LINQ is the least likely to benefit from such an optimization. Its reliance on interfaces means that it is difficult to inline methods, or even to perform some interprocedural analysis. Without inlining, the created enumerators are returned, and it's difficult to allocate such objects on the stack. They cannot be allocated on the callee frame, and in any case the callee has no way of knowing that the object doesn't escape the stack; only the caller knows that. I suppose one could imagine some sort of "return allocation context" that's passed from the caller via a hidden function parameter and allows the caller to customize the way returned objects are allocated, but that's a rather crazy thing to do.

Personally I'd be happy enough if, for example, small arrays could be allocated on the stack so contortions like that described in "params support for IEnumerable" wouldn't be needed. But unfortunately the cost to achieve this might be too great.

GSPP commented 8 years ago

The Java remarks are entirely true.

Regarding LINQ, here's an idea: if the callee is known to allocate a small, bounded amount of memory (this is generally the case for enumerator-allocating methods) and if no such pointer escapes according to some simple data flow tracing (also the case here), then temporarily change the base pointer for heap allocations to the stack. Like this:

1. Set allocation base to free spot on the stack (at the end of it).
2. Call allocating callee. It will now allocate everything on the stack. The allocations can be dynamic (e.g. conditional). They do not need to be statically known.
3. Callee returns pointer (to stack memory).
4. Caller restores allocation base pointer.

This rather simple scheme should match a lot of functions. No need to inline.

Call targets must be known but that also does not require inlining. It requires the JIT to determine the return type (which is pretty much always known in the case of iterator methods) and apply that knowledge to devirtualize all calls. I do not need to inline OrderBy in order to determine that it returns exactly an OrderedEnumerable. On the other hand this will not work for Select because it is too sophisticated.

No need to inline but it requires analyzing quite a few functions (all that touch the memory allocated).

A second simple scheme would be to allocate all ref types on the stack if their address is never passed to any other method, returned or stored to non-locals. And then have a (reliable) way to blast all aggregates allocated on the stack to registers. This scheme would catch a lot of simple helper classes like vectors, points, some strings, Utf8Strings, ....
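To make "blasting aggregates to registers" concrete: the sketch below (hypothetical `Vec2` type) shows a method whose temporary objects never leave the frame, next to the hand-written code that scalar replacement would roughly amount to. It assumes the constructor is inlined, so `this` is never passed to another method; it is an illustration of the idea, not JIT output.

```csharp
// Hypothetical small helper class of the kind mentioned above.
public sealed class Vec2
{
    public readonly double X;
    public readonly double Y;

    public Vec2(double x, double y)
    {
        X = x;
        Y = y;
    }
}

public static class ScalarReplacement
{
    // As written: three temporary Vec2 objects (a, b, s) that are never
    // returned, stored to a non-local, or passed to another method.
    public static double LengthSquaredOfSum(double ax, double ay, double bx, double by)
    {
        var a = new Vec2(ax, ay);
        var b = new Vec2(bx, by);
        var s = new Vec2(a.X + b.X, a.Y + b.Y);
        return s.X * s.X + s.Y * s.Y;
    }

    // Roughly what scalar replacement would produce: each field of each
    // non-escaping object becomes a plain local (a register candidate).
    public static double LengthSquaredOfSumScalar(double ax, double ay, double bx, double by)
    {
        double sx = ax + bx;
        double sy = ay + by;
        return sx * sx + sy * sy;
    }
}
```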

If we can get a basic implementation of both of these optimizations that should catch a lot. And btw, I think that 23% execution time improvement is a fantastic number. Depends on the workload obviously.

mikedn commented 8 years ago

Call allocating callee. It will now allocate everything on the stack. The allocations can be dynamic (e.g. conditional). They do not need to be statically known.

It can't allocate everything on the stack; some allocations may escape and the callee won't always know which ones. You really need two distinct allocation contexts: a "GC allocation context" and a "return allocation context". The return context is normally the same as the GC context, but the caller may replace it when possible. The callee always uses the return context to allocate the returned object. The return context cannot be used to allocate any other objects, including objects that are referenced by the returned object, since the caller may escape those objects.
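A minimal library-level model of the two-context idea, using hypothetical names (`IReturnAllocContext`, `GcReturnContext`, `ScratchReturnContext`). A JIT would pass the context as a hidden parameter and hand out real stack memory; in this sketch the context is an explicit interface parameter, and a reusable array stands in for the caller's frame. Only the returned buffer goes through the context.

```csharp
using System;

// Hypothetical API: the callee allocates ONLY the object it returns through
// this context; any other allocation it makes goes to the ordinary GC heap.
public interface IReturnAllocContext
{
    Span<int> Alloc(int length);
}

// Default context: the returned buffer comes from the GC heap.
public sealed class GcReturnContext : IReturnAllocContext
{
    public Span<int> Alloc(int length) => new int[length];
}

// Caller-owned scratch memory, a stand-in for space in the caller's frame.
// Only valid when the caller has proven the returned buffer does not escape.
public sealed class ScratchReturnContext : IReturnAllocContext
{
    private readonly int[] _scratch;

    public ScratchReturnContext(int capacity) => _scratch = new int[capacity];

    public Span<int> Alloc(int length)
    {
        if (length > _scratch.Length)
        {
            return new int[length]; // too big for the scratch area: use the heap
        }
        Span<int> span = _scratch.AsSpan(0, length);
        span.Clear();
        return span;
    }
}

public static class ReturnContextDemo
{
    // Callee: the returned buffer comes from whatever context the caller chose.
    public static Span<int> Squares(int n, IReturnAllocContext ctx)
    {
        Span<int> result = ctx.Alloc(n);
        for (int i = 0; i < n; i++)
        {
            result[i] = i * i;
        }
        return result;
    }

    // Caller: it only reads the result, so it may pass a scratch context.
    public static int SumOfSquares(int n, IReturnAllocContext ctx)
    {
        int sum = 0;
        foreach (int v in Squares(n, ctx))
        {
            sum += v;
        }
        return sum;
    }
}
```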

A second simple scheme would be to allocate all ref types on the stack if their address is never passed to any other method, returned or stored to non-locals.

That's the "normal" scheme, though it's a bit too restrictive: calling any method of an object implies passing the object address to the method (unless the method is inlined), so you wouldn't be able to do much with such objects. The compiler should analyze the callees a bit for best results; a call depth of less than 5 might suffice in many cases. For example, it should be possible to stack allocate the Stack object used in the example below:

void foo(BasicBlock bb) {
    var stk = new Stack<BasicBlock>();
    stk.Push(bb);
    while (stk.Count > 0) {
        bb = stk.Pop();
        // other code that calls stk.Push
    }
}
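A self-contained variant of the worklist example, assuming a hypothetical `BasicBlock` type that has only an id and a successor list. `stk` (and `seen`) are used only inside this frame. Each `Push`/`Pop`/`Count` call still passes the stack's address to a `Stack<T>` member, which is why the analysis has to look into those callees, and into the internal array that `Stack<T>` itself allocates.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a JIT basic block: an id plus successor edges.
public sealed class BasicBlock
{
    public int Id;
    public List<BasicBlock> Successors = new List<BasicBlock>();

    public BasicBlock(int id) => Id = id;
}

public static class Worklist
{
    // Counts the blocks reachable from `entry`. Neither `stk` nor `seen` is
    // returned, stored in a field, or handed to anything but its own methods.
    public static int CountReachable(BasicBlock entry)
    {
        var stk = new Stack<BasicBlock>();
        var seen = new HashSet<BasicBlock>();
        stk.Push(entry);
        seen.Add(entry);
        while (stk.Count > 0)
        {
            BasicBlock bb = stk.Pop();
            foreach (BasicBlock succ in bb.Successors)
            {
                // HashSet.Add returns false for blocks already visited.
                if (seen.Add(succ))
                {
                    stk.Push(succ);
                }
            }
        }
        return seen.Count;
    }
}
```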

GSPP commented 8 years ago

OK, all true. I really should not specify any particular (naive) escape analysis scheme but I think it's possible to devise some rather simple scheme that kills a lot of allocations and enables developers to be more productive because they can use more helper objects.

I think a key point is that inlining does not have to occur in order to infer information about the behavior of methods. Often, escape behavior can be summarized for a method in a meaningful way ("does not store to the heap/statics at all" or "does not create a new reference from the heap/statics for argument 3"). That way there is no code size problem, only compile time to deal with.
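The summary idea can be sketched as a fixed-point computation over a toy call graph. Everything here (`ToyMethod`, `EscapeSummaries`) is hypothetical and far simpler than a real JIT IR: each method records which parameters it escapes directly and which callee parameters it passes them to, and escape facts propagate from callees to callers without inlining anything.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model, not a real JIT data structure.
public sealed class ToyMethod
{
    public string Name;
    // Per parameter: does this method itself store it to the heap/statics or return it?
    public bool[] EscapesDirectly;
    // Which callee parameter each of our parameters is passed to.
    public List<(int Param, ToyMethod Callee, int CalleeParam)> Passes =
        new List<(int Param, ToyMethod Callee, int CalleeParam)>();

    public ToyMethod(string name, int paramCount)
    {
        Name = name;
        EscapesDirectly = new bool[paramCount];
    }
}

public static class EscapeSummaries
{
    // Iterates to a fixed point: a parameter escapes if the method escapes it
    // directly or passes it to a callee parameter that escapes. Callees with no
    // summary (e.g. unresolved virtual or interface calls) count as escaping.
    public static Dictionary<ToyMethod, bool[]> Compute(IReadOnlyList<ToyMethod> methods)
    {
        var summary = methods.ToDictionary(m => m, m => (bool[])m.EscapesDirectly.Clone());
        bool changed = true;
        while (changed)
        {
            changed = false;
            foreach (ToyMethod m in methods)
            {
                foreach (var (param, callee, calleeParam) in m.Passes)
                {
                    bool calleeEscapes =
                        !summary.TryGetValue(callee, out bool[] calleeSummary) || calleeSummary[calleeParam];
                    if (calleeEscapes && !summary[m][param])
                    {
                        summary[m][param] = true;
                        changed = true;
                    }
                }
            }
        }
        return summary;
    }
}
```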

mikedn commented 8 years ago

I think a key point is that inlining does not have to occur in order to infer information about the behavior of methods.

Yes, in general inlining doesn't need to occur. It's just the "return new object" case that benefits from inlining.

kostrse commented 8 years ago

@cmckinsey Here's a practical example where escape analysis could be really helpful.

Let's have a look at this snippet in F# code:

// ReferenceEscapeTest.fs
module EscapeTest

    // a:int -> b:int -> int
    let diff a b =
        let (min, max) = if a > b then (b, a) else (a, b)
        max - min

From a functional programming standpoint this is a very basic construction, which uses a tuple to deal with two local variables in one expression. For an F# developer those are just cheap local variables, presumably allocated on the stack.

fsharpc --target:library --optimize+ ReferenceEscapeTest.fs
F# Compiler for F# 3.1 (Open Source Edition)

But when we look at the resulting IL (decompiled to C#):

using Microsoft.FSharp.Core;
using System;

[CompilationMapping(SourceConstructFlags.Module)]
public static class EscapeTest
{
  [CompilationArgumentCounts(new int[] {1, 1})]
  public static int diff(int a, int b)
  {
    Tuple<int, int> tuple = a <= b ? new Tuple<int, int>(a, b) : new Tuple<int, int>(b, a);
    return tuple.Item2 - tuple.Item1;
  }
}

We can see that, in fact, we're unintentionally abusing heap allocation for a basic arithmetic operation.

P.S. This might (or might not) be better addressed by the F# compiler in the first place; I'm just not aware of the design decisions that led to this code generation in this particular case. But my point is that this optimization on the JIT side could definitely make a difference.
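For comparison, a C# sketch of the same function using `System.ValueTuple` (C# 7 tuple syntax). `ValueTuple` is a struct, so this version avoids the per-call `Tuple<int, int>` heap allocation. It is a manual rewrite (the class name `EscapeTestStruct` is ours), not something the F# compiler or the JIT does automatically here.

```csharp
public static class EscapeTestStruct
{
    // Same logic as the decompiled `diff`, but the pair is a value tuple,
    // so no Tuple object is allocated on the GC heap.
    public static int Diff(int a, int b)
    {
        (int min, int max) = a > b ? (b, a) : (a, b);
        return max - min;
    }
}
```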

SunnyWar commented 8 years ago

Example code that could benefit from this is riddled all over CoreFX; e.g. HashSet.IntersectWithEnumerable and CheckUniqueAndUnfoundElements both allocate a temporary BitHelper object. List.InsertRange allocates a temporary array. ConcurrentBag news up a SpinWait in the CanSteal method.

Also, almost all code that has a "using" statement has an object that goes out of scope and could be cleaned up immediately. There's no need to wait for the garbage collector: just call Dispose and free it.

Like... using (var sr = new StreamReader(fileName)) {...}

And... try/catch clauses that swallow the exception: the exception object could be cleaned up, with no need for the GC.

And... I can't count how many times I've called string.Split() on a massive amount of data. It's a shame all those arrays allocated and garbage collected unnecessarily.

Finally... As Roslyn is written in C# and needs to parse thousands of text files, I bet it creates many millions of small objects on the heap that can be collected immediately after they are tokenized and go out of scope.

To name just a few.
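For the `string.Split` case, one manual workaround available today (.NET Core 2.1 and later) is to slice fields out of the input as `ReadOnlySpan<char>` instead of materializing a `string[]`. A minimal sketch, assuming the input is comma-separated integers; `SplitFree.SumCsv` is a hypothetical helper:

```csharp
using System;
using System.Globalization;

public static class SplitFree
{
    // Sums a comma-separated list of integers without string.Split: each field
    // is a slice of the input, so no string[] and no per-field substring is allocated.
    public static int SumCsv(string line)
    {
        ReadOnlySpan<char> rest = line.AsSpan();
        int sum = 0;
        while (true)
        {
            int comma = rest.IndexOf(',');
            ReadOnlySpan<char> field = comma < 0 ? rest : rest.Slice(0, comma);
            sum += int.Parse(field, NumberStyles.Integer, CultureInfo.InvariantCulture);
            if (comma < 0)
            {
                return sum;
            }
            rest = rest.Slice(comma + 1);
        }
    }
}
```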

OtherCrashOverride commented 8 years ago

Also, almost all code that has a "using" statement has an object that goes out of scope and could be cleaned up immediately. There's no need to wait for the garbage collector: just call Dispose and free it.

using simply calls IDisposable.Dispose. It does not provide any guarantee about object lifetime. Inside the using block, the object could be assigned to objects/variables that would keep it alive, and it's perfectly valid to do so despite it being disposed.

This is why the GC has to run to reclaim the object: it has to trace to determine whether a reference to the object is still held anywhere. This is also the point where escape analysis enters the discussion.

Calling GC.SuppressFinalize(this) in Dispose() should make object cleanup more efficient by eliminating the finalizer call during GC.
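A minimal sketch of the dispose pattern being described, with a hypothetical `NativeHandleHolder` type. `using` only calls `Dispose`, and `GC.SuppressFinalize(this)` only removes the pending finalizer call; neither frees the object's memory early.

```csharp
using System;

// Hypothetical type with a finalizer, following the usual dispose pattern.
public sealed class NativeHandleHolder : IDisposable
{
    public bool IsDisposed { get; private set; }

    // Runs only if Dispose was never called.
    ~NativeHandleHolder() => ReleaseResources();

    public void Dispose()
    {
        ReleaseResources();
        // The finalizer no longer needs to run; the memory is still reclaimed
        // by a normal collection once the object becomes unreachable.
        GC.SuppressFinalize(this);
    }

    private void ReleaseResources()
    {
        if (IsDisposed)
        {
            return;
        }
        // Release unmanaged resources here (none in this sketch).
        IsDisposed = true;
    }
}
```

After `using (h) { }`, `h.IsDisposed` is `true`, but `h` is still a live, reachable object; that is exactly the situation in which early freeing would be unsafe.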

SunnyWar commented 8 years ago

@OtherCrashOverride sure, it's possible that the object is assigned to something inside the using, and in this case I would expect the compiler to see this, be conservative, and not free it. However, I claim the 80% case is this does not happen. Now, perhaps you are talking about the case where there are nested using statements, which is pretty common. In this case, I hope the compiler would be smart enough to unwind the stack and see that after Dispose all the objects can be freed immediately.

SunnyWar commented 8 years ago

It seems to me that escape analysis and call depth can get pretty complicated. I recommend that this feature be added in a super-simple way that lays the groundwork for a more sophisticated approach later.

With that in mind it seems to me a very simple case study is String.Split. It returns an allocated array. It's used in a lot of code. The array does not contain anything that requires depth analysis. It represents a very simple example of an object that often is not used outside the scope of the function.

If the array returned by Split can be freed early, the same logic will apply to a lot of other objects.

benaadams commented 8 years ago

@SunnyWar Might be similar analysis already done for Ref returns and Locals? https://github.com/dotnet/roslyn/pull/8030

svick commented 8 years ago

@SunnyWar How could an unknown-sized array returned from a method be stack allocated? Unless Split (and SplitInternal and say InternalSplitKeepEmptyEntries) is inlined, I don't see how could that be done.

Or are you talking about some "early free from heap" approach and not "stack allocate"?

SunnyWar commented 8 years ago

@svick I'm talking about "early free from heap", though if "stack allocate" accomplishes the same thing then I'm all for it! Every time I see "new" on something simple that has a scoped lifetime I wonder to myself, "Why does this need to be garbage collected at all? Isn't it obvious to even a brain-dead compiler that it is scoped, has no references, and was passed to no functions (in other words: is 100% safe to free)? Why not clean it up immediately?" I honestly don't get why this is so difficult and wasn't done from the beginning.

That's my motivation.

mikedn commented 8 years ago

Why not clean it up immediately?

And how do you clean it up immediately? The GC heap doesn't have any mechanism that allows you to free a single object. And adding such a mechanism isn't exactly trivial.

SunnyWar commented 8 years ago

@mikedn

The GC heap doesn't have any mechanism that allows you to free a single object.

If the GC has no mechanism to free up a single object, then how does the GC clean up a single object?

{ var foo = myString.Split(','); } ... GC will eventually clean up foo, a single object, with no references, passed to no one, with no finalizer. so why not do it immediately without the GC overhead? What am I missing here?

GSPP commented 8 years ago

@SunnyWar this capability could be added. If the object to be freed is the last one in the current segment the allocation pointer can just be decremented.

There have been dynamic escape tracking schemes implemented like this. Escape analysis and tracking can use a mix of static and dynamic analysis. It also can work with dynamically sized objects and variable amounts of objects.

I think a pragmatic way to go would be to stack allocate in obvious cases. Sometimes it really is trivial to prove that no pointer escapes and that the size of all objects affected is bounded.
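To illustrate the "decrement the allocation pointer" remark: a toy bump-pointer arena (hypothetical `BumpArena`, not the CLR's allocator). Allocation just advances a pointer; only the most recent allocation can be freed by moving the pointer back, and freeing anything else would require free-list bookkeeping.

```csharp
// Toy bump-pointer arena; offsets stand in for addresses.
public sealed class BumpArena
{
    private readonly byte[] _memory;
    private int _top;

    public BumpArena(int capacity) => _memory = new byte[capacity];

    // Current allocation pointer.
    public int Top => _top;

    // Returns the offset of the new block, or -1 if it does not fit.
    public int Allocate(int size)
    {
        if (size < 0 || _top + size > _memory.Length)
        {
            return -1;
        }
        int offset = _top;
        _top += size;
        return offset;
    }

    // Frees the block only if it is the last one allocated, by moving the
    // allocation pointer back. Any other block cannot be freed this way.
    public bool TryFreeLast(int offset, int size)
    {
        if (offset + size != _top)
        {
            return false;
        }
        _top = offset;
        return true;
    }
}
```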

mikedn commented 8 years ago

GC will eventually clean up foo, a single object, with no references, passed to no one, with no finalizer

GC does not work like that; it doesn't free single objects. It basically moves around all the objects that are in use to eliminate the free space that may have accumulated between them. What you suggest requires maintaining free lists, and doing that in a GC allocator is a bit of a problem. Not because it is impossible or difficult, but because not having to maintain such lists is one of the few advantages a GC allocator has over a classic allocator (though the CLR GC does maintain free lists in certain cases).

SunnyWar commented 8 years ago

@mikedn Thanks for the explanation. It's enlightening. I'm less interested in how difficult it is than when it will be done. Though now I see why the discussion is about stack allocation.

ygc369 commented 8 years ago

I'm happy that so many people are interested in my idea. Please have a look at another idea: https://github.com/dotnet/coreclr/issues/555; I think it's more interesting.

ufcpp commented 7 years ago

Can this improve the performance of the foreach statement? In some cases, a foreach statement causes a small object allocation:

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        const int N = 10000;
        var list = new List<int> { 1, 2, 3, 4, 5 };

        Console.WriteLine("total memory:");

        Console.WriteLine(GC.GetTotalMemory(false));

        for (int i = 0; i < N; i++)
        {
            // no boxing. struct List<int>.Enumerator is used for iteration.
            var s2 = 0;
            foreach (var x in list)
                s2 += x;
        }

        Console.WriteLine(GC.GetTotalMemory(false));

        for (int i = 0; i < N; i++)
        {
            // list is passed as IEnumerable<int>; its struct List<int>.Enumerator is boxed to IEnumerator<int>
            var s3 = ViaInterface(list);
        }

        Console.WriteLine(GC.GetTotalMemory(false));

        for (int i = 0; i < N; i++)
        {
            // list is not boxed, but List<int>.Enumerator is boxed to IEnumerator<int>
            var s4 = ViaGenerics(list);
        }

        Console.WriteLine(GC.GetTotalMemory(false));
    }

    static int ViaInterface(IEnumerable<int> items)
    {
        var s = 0;
        foreach (var x in items)
            s += x;
        return s;
    }

    static int ViaGenerics<T>(T items)
        where T : IEnumerable<int>
    {
        var s = 0;
        foreach (var x in items)
            s += x;
        return s;
    }
}

mikedn commented 7 years ago

Can this improve the performance of the foreach statement?

Not really; this was mentioned in some posts above that discussed LINQ. Not only is it difficult to stack-allocate returned objects, you won't even get that far, because you can't perform escape analysis on MoveNext() and Current since they're interface calls.

It may work if ViaInterface is inlined, the JIT discovers that IEnumerable<T> is in fact List<T> and then it can inline GetEnumerator but that's a lot of code to inline.

sirgru commented 7 years ago

To the question about use cases: there are certainly widespread use cases where avoiding these small allocations can be very useful. Let's say I am doing game development in C# in Unity. In methods that run many times per second, it is very important to avoid allocations. These small garbage allocations accumulate quickly and cause a GC collection to run at some undetermined time during gameplay, causing perceptible hiccups. This can sometimes be eased by using temporary variables that are reassigned instead of reallocated within the game loop. However, there are still issues: sometimes the methods being called are not allocation-free, and sometimes allocations happen by mistake; e.g. until recent versions every use of a foreach loop generated garbage because Unity used an outdated version of the Mono C# compiler. Even with the new version there are cases of "unjust" heap allocations, and some features are discouraged (e.g. LINQ). The worst are strings: their immutability makes them generate garbage fairly frequently. Value types are not a solution in most cases: they don't support inheritance, which makes them quite prohibitive (personal opinion). In any case, to me this would be much more welcome than ref returns, for example. For the reasons mentioned above I would not be able to take advantage of this in Unity immediately, but it is something to look forward to in the future.

Summary: I would really like to see this happening.

lilith commented 7 years ago

There are a lot of 'params' methods in the hot path. String.Trim(params char[]), for example, is often used, as in s.Trim(' ', '\t'). It does not allocate a new string unless it actually changes the string, yet every such call must heap allocate the char array for its arguments.
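Until the JIT can stack-allocate `params` arrays, a common manual workaround is to cache the argument array in a `static readonly` field, so it is allocated once rather than on every call. A minimal sketch, assuming the hot path trims spaces and tabs (`TrimHelpers` is a hypothetical name):

```csharp
public static class TrimHelpers
{
    // Allocated once and reused, instead of a new char[] per Trim(' ', '\t') call.
    private static readonly char[] s_spaceAndTab = { ' ', '\t' };

    public static string TrimSpacesAndTabs(string s) => s.Trim(s_spaceAndTab);
}
```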

The situation is a bit more complicated than that. Java doesn't have value types and as such it needs allocation optimizations more than .NET needs.

APIs involving mutable structs (C# doesn't have immutable ones) are incredibly error prone. There's no clarity into what copy of a struct is being mutated by what method. structs are therefore eschewed in favor of classes in public APIs.

I think it's an oversimplification to imply .NET doesn't need allocation optimizations as much as Java does.
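A small illustration of the copy confusion described above, using a hypothetical mutable struct `MutableCounter`: copying an element out of an array and mutating the copy leaves the array untouched, while calling the mutating method directly on the array element changes it in place.

```csharp
public struct MutableCounter
{
    public int Value;

    public void Increment() => Value++;
}

public static class StructCopyPitfall
{
    // Returns (value stored in the array, value of the local copy).
    public static (int original, int copy) IncrementCopy()
    {
        var counters = new MutableCounter[1];
        MutableCounter c = counters[0]; // copies the struct out of the array
        c.Increment();                  // mutates the copy only
        return (counters[0].Value, c.Value);
    }

    // Calling the method on the array element itself mutates it in place.
    public static int IncrementInPlace()
    {
        var counters = new MutableCounter[1];
        counters[0].Increment();
        return counters[0].Value;
    }
}
```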

realvictorprm commented 6 years ago

Pardon my jumping in, but what's the status of this?

mvanassche commented 6 years ago

On this example, which I believe illustrates the issue, I get a factor of ~15 between C# and Java. Also, if you change from class to struct, C# and Java performance become similar.

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            long t = System.Environment.TickCount;
            int l = 123456;
            Test1[] ts = new Test1[l];
            for (int i = 0; i < l; ++i)
            {
                ts[i] = new Test1(i, i.ToString("X8"), new Test2(i * 123, i * 789), new Test2(i * 7, i * 9));
            }
            for (int n = 0; n < 600; n++)
            {
                for (int i = 0; i < l; ++i)
                {
                    ts[i] = m1(ts[i]);
                }
            }
            System.Console.Out.WriteLine("T1: " + ((System.Environment.TickCount - t) / 1));
        }

        static Test1 m1(Test1 t1)
        {
            Test1 t2;
            Test1 t3;
            t2 = new Test1(t1.p1 + 1, t1.p2, t1.p3, t1.p4);
            if (t1.p1 % 2 == 0)
            {
                t3 = new Test1(t2.p1 * 2, t2.p2, new Test2(t2.p3.p1 + 1, t2.p3.p2 + 2), new Test2(t2.p4.p1 * 2, t2.p4.p2 - 1));
            }
            else
            {
                t3 = t2;
            }
            return new Test1(t3.p1 % 4579, t3.p2, new Test2(t3.p3.p1 % 456789, t3.p3.p2 % 789456), new Test2(t3.p4.p1, t3.p4.p2));
        }
    }

    public class Test1
    {
        public Test1(int p1, string p2, Test2 p3, Test2 p4)
        {
            this.p1 = p1;
            this.p2 = p2;
            this.p3 = p3;
            this.p4 = p4;
        }
        public int p1;
        public string p2;
        public Test2 p3;
        public Test2 p4;
    }

    public class Test2
    {
        public Test2(long p1, long p2)
        {
            this.p1 = p1;
            this.p2 = p2;
        }
        public long p1;
        public long p2;
    }
}
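For reference, a sketch of the struct variant mentioned above: the same shapes declared as structs (the names `Test1S`, `Test2S` and `StructBench` are ours) with `m1` ported unchanged. No timings are claimed here; the point is that each `new Test1S(...)` produces a value rather than a GC heap object, and only the shared `string` field still refers to the heap.

```csharp
public struct Test2S
{
    public long p1;
    public long p2;

    public Test2S(long p1, long p2)
    {
        this.p1 = p1;
        this.p2 = p2;
    }
}

public struct Test1S
{
    public int p1;
    public string p2; // still a heap reference, but shared, not copied
    public Test2S p3;
    public Test2S p4;

    public Test1S(int p1, string p2, Test2S p3, Test2S p4)
    {
        this.p1 = p1;
        this.p2 = p2;
        this.p3 = p3;
        this.p4 = p4;
    }
}

public static class StructBench
{
    // Same computation as m1 above, on struct values: no per-call heap allocation.
    public static Test1S M1(Test1S t1)
    {
        Test1S t2 = new Test1S(t1.p1 + 1, t1.p2, t1.p3, t1.p4);
        Test1S t3;
        if (t1.p1 % 2 == 0)
        {
            t3 = new Test1S(t2.p1 * 2, t2.p2,
                new Test2S(t2.p3.p1 + 1, t2.p3.p2 + 2),
                new Test2S(t2.p4.p1 * 2, t2.p4.p2 - 1));
        }
        else
        {
            t3 = t2;
        }
        return new Test1S(t3.p1 % 4579, t3.p2,
            new Test2S(t3.p3.p1 % 456789, t3.p3.p2 % 789456),
            new Test2S(t3.p4.p1, t3.p4.p2));
    }
}
```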

AndyAyersMS commented 6 years ago

FYI, @echesakovMSFT did some work on this a while back; figured I'd link up the parts here.

dotnet/coreclr#6653 refactored the jit to create a framework where one can plug in an escape analysis. There are placeholders for the escape analysis and for the expansion of an object allocation into a stack allocation.

Egor's fork has some implementations of the analysis and transformation.

ygc369 commented 6 years ago

@AndyAyersMS If JIT does escape analysis, would the program slow down? I think that it would be better if the compiler could do escape analysis.

AndyAyersMS commented 6 years ago

There are various tradeoffs involved in determining where and how one might do escape analysis.

The compiler (and here I presume by compiler you mean CSC) has limited visibility into other methods. In particular it often sees reference assemblies and so can't really do any sort of cross-assembly analysis.

The jit sees the actual implementations of all methods and so is able to look across assemblies. But in practice it only does this for methods that it tries to inline, and it's not clear if it is worth speculatively inlining a method just to see if it helps refine escape analysis. As you point out, the jit may not have the time to do a thorough analysis. That might change with the advent of tiered jitting. However, it seems quite possible that relatively simple escape analyses can get most of the cases that can be gotten, and trying to get too fancy here rapidly has diminishing returns (see interprocedural alias analysis). And there may be ways for the jit to piece together interprocedural analyses, but quite often (in the absence of tiering) the jit compiles callers before callees, so the benefits may only be seen once tiering is a bit more mature.

There is a spot in between that is occupied by the IL linker and it might be an interesting place to experiment with something like this too.

mikedn commented 6 years ago

There is a spot in between that is occupied by the IL linker and it might be an interesting place to experiment with something like this too.

@AndyAyersMS Maybe an IL optimizer could do interprocedural analysis and annotate (probably with an attribute) method parameters that do not escape. When the JIT compiles method A that calls method B it can simply look at the parameter annotation of method B, without having to actually analyze method B. And to ensure type safety the JIT could do its own escape analysis when compiling B and fail compilation if it turns out that B escapes parameters that have been annotated by the IL optimizer. That might be a slightly tricky part as both the JIT and the IL optimizer need to use the exact same escape algorithm, otherwise each may produce different annotations.

I suspect that could also be done only on corelib, not the entire program, and still be useful. For example, string.Format does not escape its parameters but the code is rather complex/large for the JIT to analyze it in advance when a method calls string.Format.
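A sketch of what such an annotation might look like, assuming a hypothetical `DoesNotEscapeAttribute`; no such attribute exists in the BCL or in the IL linker. The helper shows how a consumer could check for it via reflection; a JIT would presumably read the metadata directly.

```csharp
using System;
using System.Reflection;

// Hypothetical annotation an offline IL optimizer could emit for parameters
// it has proven do not escape the method.
[AttributeUsage(AttributeTargets.Parameter, Inherited = false)]
public sealed class DoesNotEscapeAttribute : Attribute { }

public static class Annotated
{
    // Only reads `values`: it is never stored, returned, or passed on.
    public static int Sum([DoesNotEscape] int[] values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }

    // Checks whether parameter `index` of `method` carries the annotation.
    public static bool ParameterDoesNotEscape(MethodInfo method, int index) =>
        method.GetParameters()[index].IsDefined(typeof(DoesNotEscapeAttribute), false);
}
```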

wanton7 commented 6 years ago

@AndyAyersMS How much would simple escape analysis help with F# and Linq allocations?

mikedn commented 6 years ago

How much would simple escape analysis help with F# and Linq allocations?

This has been mentioned before. It's pretty difficult to deal with things like LINQ because there you call methods that have to allocate and return object references. Those methods have no way of knowing who their callers are and whether the callers escape the returned object references or not.

realvictorprm commented 6 years ago

And what's with F#?


mikedn commented 6 years ago

And what's with F#?

Well, I'm not really familiar with F#, so I can't comment much on it. But the language doesn't matter too much, except perhaps that some languages may use some coding patterns more than others.

In general, if a method allocates an object and the reference to the object does not escape then the object can be allocated on the stack. One way to escape the reference is to return it from the method.

So if you have a function that returns a tuple (System.Tuple) in F# then stack allocation isn't an option.
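A small C# sketch of the distinction, with a hypothetical `Pair` class: the first method's object never leaves the frame and is a stack-allocation candidate; returning the reference, or storing it in a static field, makes it escape. This assumes the `Pair` constructor is inlined, so `this` is not passed to another method.

```csharp
// Hypothetical small class used to show escaping vs non-escaping allocations.
public sealed class Pair
{
    public int A;
    public int B;

    public Pair(int a, int b)
    {
        A = a;
        B = b;
    }
}

public static class EscapeExamples
{
    private static Pair s_last;

    public static Pair Last => s_last;

    // `p` is only read locally: it does not escape, so it could live on the stack.
    public static int NoEscape(int a, int b)
    {
        var p = new Pair(a, b);
        return p.A + p.B;
    }

    // The reference is returned, so it escapes to the caller.
    public static Pair EscapesByReturn(int a, int b) => new Pair(a, b);

    // Storing the reference in a static field also makes it escape.
    public static int EscapesByStatic(int a, int b)
    {
        var p = new Pair(a, b);
        s_last = p;
        return p.A + p.B;
    }
}
```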

ygc369 commented 6 years ago

@AndyAyersMS Do you know whether the .NET Native compiler supports escape analysis? It's an AOT compiler and can see the actual implementations of all methods.

AndyAyersMS commented 6 years ago

I don't believe it does escape analysis. @davidwrighton might know more though.

wanton7 commented 6 years ago

Is there something that could be done for F# to help ease GC pressure? Like converting small Tuples and Option type to a struct. Is that something JIT or IL optimizer (if it can be used with F#) could do?

benaadams commented 6 years ago

Is there something that could be done for F# to help ease GC pressure? Like converting small Tuples and Option type to a struct

In F# 4.1 there are: Struct Tuples

// Creating a new struct tuple.
let origin = struct (0,0)

// Take struct tuples as arguments to a function and generate a new struct tuple.
let getPointFromOffset ((x,y): struct (int*int)) ((dx,dy): struct (int*int)) = 
    struct (x + dx, y + dy)

// Pattern match on a struct tuple.
let doAMatch (input: struct (int*int)) =
    match input with
    | struct (0,0) -> sprintf "The tuple is the origin!"
    | struct (_,_) -> sprintf "The tuple is NOT the origin!"

Struct Records

// Regular record type
type Vector3 = { X: float; Y: float; Z:float }

// Same record type, but now it's a struct
[<Struct>]
type StructVector3 = { X: float; Y: float; Z:float }

Struct Unions (Single Case)

// Regular Single Case Union
type EmailAddress = EmailAddress of string

// Struct version of the above
[<Struct>]
type StructEmailAddress = EmailAddress of string

wanton7 commented 6 years ago

@benaadams Maybe automatic conversion of small classes that meet certain criteria to structs is just a pipe dream :) FSharp.Core's Option type is still a class, and changing that would be a breaking change. Some F# compiler optimizations don't work for struct tuples; the F# compiler can sometimes optimize normal tuples away. So the advice is to use normal tuples and optimize allocation hotspots with the struct keyword. You would also have to use the struct keyword everywhere in your code, and F# uses tuples a lot. Maybe this is something the F# compiler could fix by applying the same optimizations to struct tuples, plus an fsproj setting that would make struct tuples the project default.

I also think even a simple escape analysis similar to Go's would help allocate some of the tuple parameters on the stack if you don't return that same tuple.

But escape analysis wouldn't help if the Option type is used in something like record type fields. The Option type's None case is represented by null, so the current situation isn't that bad, because an Option is only allocated when it contains a value. Maybe this is something F# will fix with a breaking change in the future; we shall see.

realvictorprm commented 6 years ago

@wanton7 One step towards "automatic" conversion would definitely be the escape analysis.

Considering this would be accepted, what would be the first step towards an implementation? Large specs?

AndyAyersMS commented 6 years ago

Hmm, Egor's fork seems to be dead. Trying to see where the code went. @echesakov ?

echesakov commented 6 years ago

@AndyAyersMS This should work now https://github.com/echesakovMSFT/coreclr/tree/StackAllocation

AndyAyersMS commented 6 years ago

Thanks!

ygc369 commented 6 years ago

@AndyAyersMS @echesakovMSFT Do you know when we would have official escape analysis?

ygc369 commented 6 years ago

@echesakovMSFT Your stack allocation branch seems dead, why not continue working on it? So many people want it.

dotnet/runtime issue #4584