dotnet / linker

388 stars 126 forks source link

Move all "Should method be marked in the current analysis state" logic to one place #3090

Open jtschuster opened 1 year ago

jtschuster commented 1 year ago

In https://github.com/dotnet/linker/pull/3073, we coalesced a lot of logic around "Should this method be marked because it's needed by a (non-interface) base method" into one method, and enumerated the state that influences if a method is marked (i.e. the method's type is marked, the method's type is instantiated, etc.) so that we can make sure the method is called whenever the relevant state changes. We should expand the scope of that method so that we can call a single method any time the relevant state for the method changes. This will hopefully help us reason about what's going on and help debugging, but also will enable us to use the DependencyAnalysisFramework once we know everything that influences if a method should be marked.

My initial thought is that the highest level function (ex. ShouldMethodBeMarked()) would take a method and all the state information that influences whether a method is marked, with default null values. Then the caller provides only the state information that has changed and ShouldMethodBeMarked will only call the other methods representing reasons why a method would be marked (i.e. ShouldBeMarkedForBaseMethod, ShouldBeMarkedForInterfaceMethod, ShouldBeMarkedDueToX) if they depend on the state provided by the caller. For example:

public ShouldMethodBeMarked(
  MethodDefinition method,
  bool? declaringTypeIsMarkedInstatiated = null,
  bool? declaringTypeIsMarkedRelevantToVariantCasting = null,
  InterfaceImplementation? newlyMarkedInterfaceImplementationOnDeclaringType = null,
  MethodDefinition? newlyMarkedBaseType = null
  ...)
{
    if (newlyMarkedBaseType is not null) {
        if (ShouldMarkOverrideDueToBaseType(method, newlyMarkedBaseType) {
            MarkMethod (method, reason);
        }
    }
    ...
}

Tasks:

vitek-karas commented 1 year ago

Honestly this might be the time to bite the bullet and start using the dependency analysis from ilc/r2r. Lot of the problems we faced in this code recently are because:

In the dependency analysis this is basically a relatively simple conditional dependency. The code is written following the logical rules since they directly map to the dependencies. And then the dependency analysis framework takes care of "Waiting" for the final information state, without rerunning the algorithms over and over.

That said, I don't know how much work it would be and if the complexity goes down:

jtschuster commented 1 year ago

Honestly this might be the time to bite the bullet and start using the dependency analysis from ilc/r2r.

I feel the same way, I think we've reached a point where performance and maintainability has become an issue that is impacting customers, and the dependency analysis seems like the best solution.

sbomer commented 1 year ago

Agreed, and I think @jtschuster's changes have been moving us in the right direction. I would personally do it cleaning up as much of the logic as we can with the current system, then starting to introduce our own "engine". It would match the structure of NativeAot but can probably be a lot simpler - this would make it easier to integrate it at first.

I think all we really need is:

Trying this will force us to turn all of the MarkStep logic into something more like NativeAot, and I believe it shouldn't require us to use new node wrappers or anything like that. Once that all works we could consider actually reusing the NativeAot code, but I think the bulk of the work will be in fixing up the MarkStep logic to work with such a framework.

vitek-karas commented 1 year ago

There are two ways to go about this:

I'm not sure about pros/cons of either yet...

sbomer commented 1 year ago

I'm leaning towards the second option. Reasoning:

But I can also see advantages to the first - using an existing framework means less potential for bugs, and might get us sharing code sooner.

Ultimately I think either approach is fine, and the hard part will be mapping our MarkStep logic to some kind of conditional dependencies.

vitek-karas commented 1 year ago

My preference of the first approach is based on the fact that currently all of the cases which would map onto conditional dependencies are in effect one of additions to the MarkStep. They do not participate in the normal operations of the step, instead we add a new queue/hash/list for every such case and then hook it up to get "revisited" every time we're done with the method queue. My thinking was that we could take one such case (for example the overrides, but we can start with simpler things if we wanted), remove the custom hash/list and replace it with the dependency framework. We would still hookup the dependency framework at the end of method queue processing. But it would allow incremental changes to migrate over to the dependency framework.

Personally I don't see much value in writing our own simplified version of the conditional dependencies framework - especially if we all agree that ideally we would move to the framework used by ilc/r2r.

But I'm not hardcore about this in any way :wink:

jtschuster commented 1 year ago

I'm leaning towards the first option as well. I think regardless we'll probably have a period where we're using the dependency analysis framework for part of the code and not for another part. I'm not very familiar with the framework, but it seems like we might be able to transition isolated analysis pieces one at a time (like the interface overrides), without it being too awkward. I also worry we might prepare the code for the transition is a way that has a small incompatibility with the dependency analysis framework that adds more work compared to starting with it and learning how to best use the framework in the linker as we are actively using it.

But again, I've never really interacted with the dependency analysis framework, so the assumptions I'm basing this off of might not be valid.

vitek-karas commented 1 year ago

After discussion with @marek-safar on this topic I'm even more undecided. But I'm also more confident in what/why we need to change things.

Currently almost all dependencies between things are expressed in code - mostly by method calls (MarkMethod calls into MarkType to mark the declaring type of the method). It's not always direct calls as we delay processing of some of these (the method queue, the overrides and so on), but ultimately it's all code dependencies.

This makes it really difficult to reason about dependencies, since there's no unified way to represent them. This shows for example in the complexity of passing around the DependencyInfo which tries to attach some data to the dependencies.

If the dependencies were expressed as data they would be easier to reason about in general. So things like DependencyInfo could be applied globally. We could also more easily reason about the size/perf of things.

But I think the most enticing reason to represent dependencies as data is that's how I think about these. When I think about how marking works I actually think as if MarkMethod creates a dependency on MarkType - turning the methods into "entities". Having these as data structures feels more natural.

So I think we should change toward that model - represent the marking in terms of data structures (most likely a directed graph, but it could be something else as well). The code we have today then turns into event handlers which act on the data when the state of the data changes. We can do this with or without the dependency analysis framework from AOT. The framework would force certain structure on us, but we get working implementation. Or we could choose our own structure and implement it ourselves.

Still thinking it through...