dotnet / arcade

Tools that provide common build infrastructure for multiple .NET Foundation projects.

Implement PGO infrastructure #99

Closed. tmat closed this issue 6 years ago.

tmat commented 6 years ago

The current process is mostly manual, tedious, and error-prone. We need automated infrastructure that builds PGO data regularly.

Proposal

Create a new repo (dotnet/pgo) that will host training scenarios across Roslyn, VS, ASP.NET, and .NET Core; its Jenkins machine will run them regularly. This repo would be co-owned by all teams who want to contribute training scenarios. The training server would run the latest public VS, on which the scenarios can be trained.

The orchestrated build triggers a training run in this repo regularly (fire and forget); this could be every build, once a day, etc. Running more often is better, since it surfaces potential failures early.

Once a training run is finished, a NuGet package containing the PGO data is published automatically. The package doesn't need to be published after every build; perhaps once a week would suffice.

CoreFX would pick up the PGO data from the feed where the training server publishes packages. Picking up new PGO data should be as easy as updating the package version (since we want reproducible builds, we wouldn't pick up the latest automatically).
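For illustration, consumption could look like an ordinary pinned package reference in the consuming repo's build; the package name and version below are hypothetical placeholders, not an existing feed:

```xml
<!-- Hypothetical sketch only: package name and version are placeholders.
     Pinning an explicit version keeps the build reproducible; picking up
     newer PGO data is just a matter of bumping the version. -->
<ItemGroup>
  <PackageReference Include="optimization.IBC.CoreFx" Version="1.0.0-training-20180321.1" />
</ItemGroup>
```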

tmat commented 6 years ago

@KevinH-MS @jasonmalinowski @weshaggard @mmitche @markwilkie

weshaggard commented 6 years ago

cc @valenis

jasonmalinowski commented 6 years ago

@tmat The division already has a system for doing what you describe (at least up to the creation of the NuGet package). We were planning on just using that. Any reason we can't just do that?

(Our bigger hope was that once we have that, we can also get rid of having to embed PGO data into assemblies, which they're looking at allowing too; that would mean for VS RPS tests you wouldn't have to do a thing.)

jasonmalinowski commented 6 years ago

I guess one obvious question I'm missing: for VS scenarios we want to train with the VS system. Is there training that needs to happen for other systems too that would then need to be merged?

tmat commented 6 years ago

> Any reason we can't just do that?

Where is the system?

@jkotas is telling me that PGO data must be embedded into assemblies for Desktop FX. Does VS have a different mechanism than NGEN?

tmat commented 6 years ago

Chatted with @jasonmalinowski offline:

markwilkie commented 6 years ago

@maririos - what would you like done w/ this issue?

tmat commented 6 years ago

@markwilkie Some work is in progress in VS. This will take a while to complete.

maririos commented 6 years ago

We actually have the dotnet/optimization repo, which hosts the training scenarios for CoreCLR and CoreFX, and the idea is to keep building on top of that.

@tmat with the recent reorg we now have perf champs for .NET. Contacts are @brianrob and @billwert.

FYI @adiaaida @DrewScoggins @shawnro

billwert commented 6 years ago

@tmat is this about PGO (C++ compiler) or IBC (managed optimization) or both?

tmat commented 6 years ago

@billwert IBC.

Training for Core CLR and Core FX is not good enough. Roslyn and some CoreFX assemblies (System.Reflection.Metadata and System.Collections.Immutable) need to be trained for Visual Studio scenarios.

The biggest problem right now is that the training data are embedded into the assemblies as resources. That means a single assembly can't use different training data in different scenarios and also all training needs to be centralized, which makes the system very complex.

I believe we have two options:

  1. If the Desktop CLR supports reading IBC data from a separate file (not just from resources), we need to switch to that feature.
  2. If the Desktop CLR does not support that, then we need a process where the VS insertion takes System.Reflection.Metadata and System.Collections.Immutable without IBC data, runs the VS training scenarios, rewrites the assemblies to embed the resources, and inserts the updated versions (a rough sketch follows below). That means the binaries shipping with VS would be different from the ones shipping in .NET Core, but I think that's ok.
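A minimal sketch of what the rewrite step in option (2) might look like as a build target; the scenario runner, the `IbcEmbed.exe` tool, and all properties and switches here are hypothetical placeholders rather than existing tools:

```xml
<!-- Hypothetical sketch of option (2): train the untrained assemblies on VS scenarios,
     then rewrite them to embed the resulting IBC data before VS insertion. -->
<Target Name="EmbedVsTrainedIbcData" Condition="'$(ApplyVsIbcData)' == 'true'">
  <ItemGroup>
    <UntrainedAssembly Include="$(InsertionInputDir)\System.Reflection.Metadata.dll" />
    <UntrainedAssembly Include="$(InsertionInputDir)\System.Collections.Immutable.dll" />
  </ItemGroup>

  <!-- Run the VS training scenarios to produce per-assembly IBC data (placeholder command). -->
  <Exec Command="$(TrainingToolsDir)\RunVsScenarios.cmd /assemblies $(InsertionInputDir) /out $(IbcDataDir)" />

  <!-- Rewrite each assembly to embed its IBC data as a resource (placeholder tool and switches). -->
  <Exec Command="$(TrainingToolsDir)\IbcEmbed.exe /in %(UntrainedAssembly.Identity) /ibc $(IbcDataDir)\%(UntrainedAssembly.Filename).ibc /out $(InsertionOutputDir)\%(UntrainedAssembly.Filename)%(UntrainedAssembly.Extension)" />
</Target>
```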

This system needs to be completely automated and part of VS build.

How is this related to Arcade? Multiple repos using Arcade insert into VS (they build VSIXes), so this should be part of the VSIX-building infrastructure that Arcade already provides. It would be an opt-in feature, since it definitely does not apply to repos that have nothing to do with VS.

tmat commented 6 years ago

/cc @jmarolf

brianrob commented 6 years ago

I read two different requests in this issue:

  1. An automated workflow for generating data.
  2. The ability to use different data for different scenarios.

I believe that both of these can be done with technologies that already exist, but it would be good to make sure I'm not missing any details.

As @maririos points out above, we have the dotnet/optimization repo, which implements the training infrastructure that we use for .NET Core. It does daily training runs, produces data that gets packaged into NuGet packages, and those packages then get consumed by the repos via auto-PRs generated by Maestro. If you want to automate training of Roslyn on .NET Core, I recommend that you use this infrastructure to do so. If there are open-source/closed-source issues with VS, we can figure out the right way forward here.

In terms of the ability to use different optimization data for different scenarios - this is possible today. The key is to decide when to embed the data into the IL image. I believe that System.Reflection.Metadata and System.Collections.Immutable do this as part of the CoreFX build, and so you don't get an opportunity to decide after that which data you want (Desktop vs. Core). The decision has already been made.

If there is a desire to have separate data, it's just a matter of applying the right data to the right copy of the assembly and then keeping track of which is which. This would be a CoreFX-specific implementation run during the CoreFX build. Folks like @weshaggard can probably comment on how this can be done.
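As a hedged illustration of applying the right data to the right copy, the selection could be as simple as conditioning the optimization-data package on the target framework during the CoreFX build; the package names, the `IbcDataVersion` property, and the target framework monikers below are all placeholders:

```xml
<!-- Hypothetical sketch: choose the optimization-data package per target framework,
     so the Desktop and .NET Core copies of an assembly each get their own IBC data. -->
<ItemGroup>
  <PackageReference Include="optimization.IBC.Desktop" Version="$(IbcDataVersion)"
                    Condition="'$(TargetFramework)' == 'net46'" />
  <PackageReference Include="optimization.IBC.NetCore" Version="$(IbcDataVersion)"
                    Condition="'$(TargetFramework)' == 'netcoreapp2.0'" />
</ItemGroup>
```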

jkotas commented 6 years ago

> Desktop CLR supports reading IBC data from a separate file (not just from resources)

The Desktop CLR does not support this. (This support can be enabled by an undocumented, unsupported environment variable, but that variable does not kick in when the compilation happens in the background, so it is not really usable for this.)

tmat commented 6 years ago

@jkotas Thanks for checking. Then we need to go with option [2].

brianrob commented 6 years ago

Instead of option [2], I think that we should figure out how to ship the binaries out of CoreFX with the proper set of optimization data. This is an issue for all assemblies that can be used against the desktop framework and against .NET Core, and so if we can solve this at the CoreFX level then it gets solved for all of them, not just the ones that get inserted into VS.

tmat commented 6 years ago

@brianrob I don't think that's actually desirable. We need to train these assemblies with VS-specific scenarios. We might actually want to optimize them for two different sets of scenarios and produce two different binaries, one for the compiler server and the other for the IDE, instead of merging all the training data together.

brianrob commented 6 years ago

Have we shown that this makes a material difference vs. merging the data? Doing something like this means that you effectively have two binaries and that the test cost for such a thing doubles.

I can see why you'd want to have different data for binaries that target different platforms, but targeting different scenarios on the same platform with different data is effectively building two separate products that need to be tested separately.

tmat commented 6 years ago

@brianrob Having two different training data sets for VS is just an option we might want to explore.

What we definitely need is a process within the VS build that automatically produces training data for the Core FX binaries we insert into VS. We can't depend on the Core FX build to embed IBC data that we produce in the VS build; that's too complicated. We also can't take Core FX binaries with IBC data that does not include VS scenarios; that for sure causes significant perf regressions.

tmat commented 6 years ago

Re testing cost: I don't think the testing cost doubles. The IL of the binaries is the same, so we can assume that all unit tests will produce the same results, modulo NGEN bugs. The likelihood of hitting an NGEN bug is pretty low compared to all other bugs in VS :). And we would still run our integration and manual tests on the real assemblies.

brianrob commented 6 years ago

@tmat, before we get too far into the implementation details, I'd like to re-ask one of my questions from above. Do we have data that shows that we'll see a material difference when separating data for the targeted Roslyn scenarios vs. merging all of the data together and applying to just one copy of the binaries? Also, assuming that we do see a difference, how much of a difference do we see? We should use this data to drive discussions around what we want to do and how to prioritize it relative to other things.

I still have my concerns around multiple copies of a single binary, but there are options that we can consider to mitigate those concerns assuming the business value is high enough (e.g. rename the binaries for each scenario so that there are no collisions).

tmat commented 6 years ago

@brianrob Feel free to s+ a meeting. I think it would be better discussed F2F.

markwilkie commented 6 years ago

@maririos - any chance you can triage this?

maririos commented 6 years ago

@markwilkie sure

brianrob commented 6 years ago

@tmat, apologies for the delay. I've sent you an S+ for this.

In the meantime, I am going to close this issue, since the question of how we partition training data across assemblies isn't something that belongs in the arcade repo.