dotnet / linker


Branch removal can't remove all branches in compiler generated state machine #3087

Open vitek-karas opened 1 year ago

vitek-karas commented 1 year ago

Take for example this code:

static IEnumerable<int> TestBranchWithYieldBefore ()
{
    if (AlwaysFalse) { // Property which returns constant false
        yield return 1;
        RemovedMethod ();  // This is unreachable code and should be removed, but it's not
    } else {
        yield return 1;
        UsedMethod ();
    }
}

The problem is that in the compiler-generated code, the body of the if (AlwaysFalse) branch contains only state machine setup, because the yield comes first. So even if that branch is removed, the call to RemovedMethod is still kept, because it belongs to a different state of the state machine.

private bool MoveNext()
{
    switch (<>1__state)
    {
    default:
        return false;
    case 0:
        <>1__state = -1;
        if (AlwaysFalse) // This is the branch which can be optimized
        {
            <>2__current = 1; // This will be removed...
            <>1__state = 1;
            return true;
        }
        <>2__current = 1;
        <>1__state = 2;
        return true;
    case 1:
        <>1__state = -1;
        RemovedMethod();  // But this is still kept, since there's no constant branch around it
        goto IL_007a;

To implement this correctly, the analysis would have to understand at least the basics of state machines. Detecting this in the general case is really complicated because the state is stored in a field, so we would have to prove that the field can only hold a certain set of values, which is really hard in multi-threaded environments.

So in theory doable, but pretty complex.

The downside of this issue is that it's hard for the user to diagnose what's happening. If this is used in combination with feature switches, it can lead to unexpected warnings, since code which is technically hidden behind a feature switch is still marked as reachable by the linker.
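A minimal sketch of how the feature switch case might look. Everything here is hypothetical and for illustration only: the switch name "MyLib.FeatureEnabled", the IsFeatureEnabled property, and the TrimIncompatible method. It assumes the linker has been told (for example through a substitution) that IsFeatureEnabled is constant false.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

public static class FeatureSwitchExample
{
    // Hypothetical feature switch. Assumption: the linker is configured to
    // treat this getter as returning constant false.
    public static bool IsFeatureEnabled =>
        AppContext.TryGetSwitch("MyLib.FeatureEnabled", out bool enabled) && enabled;

    // Hypothetical trim-incompatible method; calling it from reachable code
    // is what produces the warning.
    [RequiresUnreferencedCode("Illustrative only: pretend this uses reflection.")]
    public static void TrimIncompatible() { }

    public static IEnumerable<int> Values()
    {
        if (IsFeatureEnabled) {
            yield return 1;
            // After the yield, this call is compiled into a separate state of
            // MoveNext, so it is no longer guarded by the constant branch.
            TrimIncompatible();
        } else {
            yield return 2;
        }
    }
}
```

With the feature disabled, the author would expect TrimIncompatible to be removed and no warning to be reported, but since the call sits in its own state it may still be treated as reachable.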

marek-safar commented 1 year ago

I think it might not be that complicated to implement for C# state machines, because the compiler usually assigns the state field in a single method body only.

Did you find that in some real code somewhere?

vitek-karas commented 1 year ago

I haven't run into this in real code yet, but it's effectively hidden behind https://github.com/dotnet/linker/pull/3088. I agree that if we special-case this for C# state machines, this is definitely solvable.
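To make the special-casing idea concrete, here is a rough sketch of the reachability part only. It is a toy model, not the linker's actual data structures or API: StateMachineSketch, ReachableStates, and the tuple layout are all hypothetical. It assumes each case of the switch in MoveNext has already been reduced to the constants it stores into <>1__state, with each store flagged if it sits inside a branch that constant propagation proved dead, and that no other code can put the field into a state that is not listed.

```csharp
using System;
using System.Collections.Generic;

public static class StateMachineSketch
{
    // Returns the state values whose switch cases can still be entered.
    // storesByCase maps a case label to the constants that case may write
    // into the state field (hypothetical pre-computed input).
    public static HashSet<int> ReachableStates(
        IReadOnlyDictionary<int, (int Value, bool InDeadBranch)[]> storesByCase,
        int initialState)
    {
        var reachable = new HashSet<int> { initialState };
        var pending = new Queue<int>();
        pending.Enqueue(initialState);

        while (pending.Count > 0) {
            int current = pending.Dequeue();
            if (!storesByCase.TryGetValue(current, out var stores))
                continue; // no case for this value, e.g. -1 goes to default

            foreach (var store in stores) {
                if (store.InDeadBranch)
                    continue; // the store itself is removed with the branch
                if (reachable.Add(store.Value))
                    pending.Enqueue(store.Value);
            }
        }
        return reachable;
    }

    // Models MoveNext from this issue. The stores for case 2 are an
    // assumption, since the decompiled listing above is truncated.
    public static HashSet<int> IssueExample()
    {
        var storesByCase = new Dictionary<int, (int Value, bool InDeadBranch)[]> {
            [0] = new[] { (-1, false), (1, true), (2, false) },
            [1] = new[] { (-1, false) },
            [2] = new[] { (-1, false) },
        };
        return ReachableStates(storesByCase, initialState: 0);
    }
}
```

In this model state 1 is never stored from live code, so its case (the one calling RemovedMethod) could be dropped. The hard part noted above remains: the sketch simply trusts that the listed stores are the only writers of the field.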