dotnet / source-build

A repository to track efforts to produce a source tarball of the .NET Core SDK and all its components

Optimization training data (PGO/IBC) in source-build #247

Open dagood opened 6 years ago

dagood commented 6 years ago

PGO/IBC optimization training data is used in CoreCLR and CoreFX (maybe more that I don't know about). There are two issues with using that in source-build:

  1. The tooling to generate/use the data isn't buildable from source (not OSS).
  2. Optimization data generation requires an existing product.
    • This forces the build to be (at least) two-staged: first to generate an unoptimized product so that we can generate the training data, then again to build the product using that training data.
    • This resembles the pre-built binary problem, but I don't think we need to seed the process with anything built beforehand.
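The two-stage flow described in (2) can be sketched as follows. This is a minimal illustration, not the actual source-build logic: `Product`, `two_stage_build`, `fake_build`, and `fake_train` are hypothetical names, and the callbacks stand in for the real build and for the training tooling, which is not open source.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Product:
    """A built product; `optimized` records whether training data was used."""
    optimized: bool


def two_stage_build(
    build: Callable[[Optional[bytes]], Product],
    train: Callable[[Product], bytes],
) -> Product:
    # Stage 1: build from source with no training data (unoptimized product).
    stage1 = build(None)
    # Run training scenarios against the stage-1 product to collect data.
    training_data = train(stage1)
    # Stage 2: rebuild from the same source, feeding the training data back in.
    return build(training_data)


# Hypothetical stand-ins for illustration only; real callbacks would invoke
# the product build and the training tooling.
def fake_build(data: Optional[bytes]) -> Product:
    return Product(optimized=data is not None)


def fake_train(product: Product) -> bytes:
    # Training is expected to run against the unoptimized stage-1 product.
    assert not product.optimized
    return b"profile-counts"


final = two_stage_build(fake_build, fake_train)
print(final.optimized)
```

Nothing built beforehand is passed in: the only inputs are the source (represented here by the `build` callback) and the data produced from the stage-1 output.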

Is it possible to fix (1)?

If we can't accomplish this, what impact does it have on source-build users?

Currently the optimization data packages are dummied out (for the offline build):

https://github.com/dotnet/source-build/blob/a680f3640cbbbb69795874b70370bba18748e186/tools-local/init-build.proj#L149-L152

@dotnet/source-build-contrib

dleeapho commented 6 years ago

/cc @brianrob @adiaaida

omajid commented 6 years ago

Thanks for getting this conversation started.

Is it correct to say this is about optimization performed by the native compiler? Is there something on the managed side as well?

Do you have any idea of the performance impact of optimization training data? Is it 1% to 2% or significantly more?

> This forces the build to be (at least) two-staged: first to generate an unoptimized product so that we can generate the training data, then again to build the product using that training data.

Is this data platform, architecture, or compiler dependent? If not, could the generation process be documented, run once, and the resulting data added to the repository for each release?

> This forces the build to be (at least) two-staged: first to generate an unoptimized product so that we can generate the training data, then again to build the product using that training data.

That shouldn't be a problem as long as both builds are source-build based with the additional data file as a new input for the second build.

dseefeld commented 6 years ago

Adding to General Prebuilt Removal. We can't have this binary data as a prebuilt, but we need to determine whether and how to deliver it with source-build.

dagood commented 4 years ago

Spotted this comment about open sourcing dotnet-pgo, envisioned as a replacement for ibcmerge and IBC data sometime post-5.0: https://github.com/dotnet/runtime/issues/34422#issuecomment-607507440

Text-based and/or reproducible training data seem like the next missing parts to get this into source-build.
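To illustrate what "text-based and reproducible" could mean in practice, here is a minimal sketch. It assumes a hypothetical profile of per-method hit counts and a made-up tab-separated format; it is not the dotnet-pgo or IBC format, and `serialize_profile` and `profile_digest` are illustrative names.

```python
import hashlib
from typing import Dict


def serialize_profile(counts: Dict[str, int]) -> str:
    # Sort by method name so the text does not depend on collection order.
    return "".join(f"{name}\t{count}\n" for name, count in sorted(counts.items()))


def profile_digest(counts: Dict[str, int]) -> str:
    # Hash the canonical text; equal digests mean byte-identical data files.
    return hashlib.sha256(serialize_profile(counts).encode("utf-8")).hexdigest()


# Two hypothetical training runs that recorded the same counts in a
# different order.
run_a = {"Foo.Bar": 120, "Baz.Qux": 7}
run_b = {"Baz.Qux": 7, "Foo.Bar": 120}

same = profile_digest(run_a) == profile_digest(run_b)
print(same)
```

A canonical text form like this can be reviewed in a diff and checked against a recorded digest, which an opaque binary blob does not allow. It only helps if the counts themselves are stable from run to run.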

MichaelSimons commented 2 weeks ago

FSharp has enabled PGO and would benefit from this as well - see https://github.com/dotnet/fsharp/pull/17513