dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.04k stars 4.03k forks source link

Compress size of generated async state machines #42617

Open Horusiath opened 4 years ago

Horusiath commented 4 years ago

Proposal

The idea here is to reduce the size of generated async state machine by utilizing explicit struct layout to remove fields that are no longer in use by consecutive steps of the state machine. This idea came from Rust's approach to building async state machines (see related blog post).

Example

Consider following simple example:

public class User {
    public string FirstName { get; set; }
}
interface IDatabase {
    Task<User> GetUser(int userId);   
    Task SaveUser(User user);
}

public class C {
    private readonly IDatabase db;
    public async Task<User> UpdateUser(int userId, string name) {
        var user = await db.GetUser(userId);  // from here on, `userId` is no longer used
        user.FirstName = name; // from here on, `name` is no longer used
        await db.SaveUser(user);
        return user;        
    }
}

As you can see in SharpLab, compiler will capture all of the parameters and fields used as part of the state machine for the entire duration needed by the async method to complete. The more await steps and more variables are reused between them, the bigger memory footprint of that state machine will be.

With liveness analysis it would be possible to determine which identifiers are used in which steps and reuse memory occupied by the captured variables, which liveness boudaries do not overlap. With this the total memory footprint of a state machine could be reduces from sum of all captured variables to a max of all variables used at any given time. This would bring the biggest benefit in long async methods consisting of many await steps.

Speaking in terms of example above it should be possible to change the current struct:

[StructLayout(LayoutKind.Auto)]
private struct <UpdateUser>d__1 : IAsyncStateMachine {
    public int <>1__state;
    public string name;
    public int userId;
    private User <user>5__2;
    // other state machine fields
}

into:

[StructLayout(LayoutKind.Explicit)]
private struct <UpdateUser>d__1 : IAsyncStateMachine {
    [FieldOffset(0)] public int <>1__state;
    [FieldOffset(4)] public string name;
    // live boundaries of userId and user do not overlap
    [FieldOffset(12)] public int userId;
    [FieldOffset(12)] private User <user>5__2; 
    // other state machine fields
}

Any thoughts on that?

CyrusNajmabadi commented 4 years ago
    [FieldOffset(12)] public int userId;
    [FieldOffset(12)] private User <user>5__2; 

Is this verifiable? I thought the runtime didn't allow this sort of thing. When you try the above, what happens?

YairHalberstadt commented 4 years ago

@CyrusNajmabadi It's used in dotnet/runtime. For example: https://github.com/dotnet/runtime/blob/33b3b9a3103a23b049038fc0c5b6fdef0c4c3848/src/libraries/System.Data.OleDb/src/SafeHandles.cs#L295 https://github.com/dotnet/runtime/blob/91c1d7cab87653fc2214d9effbee54d3d9072c65/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Guid.cs#L198

CyrusNajmabadi commented 4 years ago

Aren't those cases of other value types? I didn't think you could overlap value and reference types in this manner

YairHalberstadt commented 4 years ago

I think overlapping value and reference types would be a bad idea here, if only because you can't know what size the value types will have at runtime. But you could overlap reference types.

CyrusNajmabadi commented 4 years ago

But you could overlap reference types.

Does the runtime allow this?

davidfowl commented 4 years ago

cc @jkotas

jkotas commented 4 years ago

Note that overlapping fields like this would break debugging, so it is not just a simple local compiler change.

We know that there is a lot of opportunity to make async perform better, and it has been a topic of our internal discussions. Your suggestion is equivalent to stack packing optimization, but it would be nice to enable all other usual optimizations for async methods too: inlining, data-flow optimizations, ... .

cc @davidwrighton