dotnet / runtime

Let roslyn generate reference assemblies from source projects and remove reference projects #58163

Open ViktorHofer opened 3 years ago

ViktorHofer commented 3 years ago

Packages produced by dotnet/runtime that target .NET 6 and later no longer contain reference assemblies. The existing targeting packs Microsoft.NETCore.App.Ref, Microsoft.AspNetCore.App.Ref and Microsoft.WindowsDesktop.App.Ref are the only shipping assets which still contain reference assemblies (by design). That said, reference assemblies are general goodness, as leveraging them improves incremental build performance, which is why the SDK has emitted them by default since .NET 5.

Problem statement

Reference assemblies in dotnet/runtime are currently generated via dedicated reference projects, which have several downsides:

  1. Having dedicated reference projects confuses first-time contributors, as such a concept rarely exists in any other .NET solution or repository.
  2. The reference and source project files need to be kept in sync, i.e. TargetFrameworks, references, public APIs and any other input that impacts the project's output. This makes versioning and revving to a new major version harder.
  3. Since .NET 6, out-of-band reference assemblies like Microsoft.Extensions.* don't ship to customers anymore, as they aren't part of the generated packages. Their sole purpose today is to serve infrastructure-only needs (ApiCompat and GenFacades).
  4. As reference projects need to be restored (like any other Microsoft.NET.Sdk project), they add to the overall restore time. NuGet creates intermediate files per project and TFM: i.e. project.assets.json, $(MSBuildProjectName).nuget.g.props, $(MSBuildProjectName).nuget.g.targets. Removing reference projects will result in a faster repo and per-project restore, as less computation (traversing the graph) and fewer IO operations are necessary.
  5. Reference projects enlarge the dependency graph that tools like msbuild or Visual Studio need to care about. Every msbuild project needs to be parsed and evaluated, which adds to the overall build and tool time. They also bloat the solution files which are checked into the repository.
  6. Reference assemblies are an implementation detail and aren't expected to be surfaced as a primary output. With dedicated projects to generate reference assemblies, a contract needs to be established between the source and the reference project to make other projects that reference the source project use the reference assembly instead of the implementation assembly (for build perf reasons as explained above). This is very customized infrastructure in dotnet/runtime and requires sequencing into a specific point in the build to make this happen. It's not guaranteed that this will continue to work, as it's far from the common path.

Solution

At the time of writing, there are 227 libraries under src/libraries, and 63 of those are partial or full façade assemblies. Presumably, the 164 libraries which don't use GenFacade's <IsPartialFacadeAssembly /> feature could leverage Roslyn's RefOut feature to generate reference assemblies as a secondary build output of the source project.

To make this work, the following work needs to be done (in no particular order):

Partial façade assemblies

For the remaining 63 assemblies, GenFacades is used to add type forward attributes to the project's compilation input (via a generated Compile item) for public API that is missing from the compilation sources, but which is available in any of the referenced assemblies (ReferencePath). Missing is defined as an API that is available in the reference assembly but not in the implementation compilation sources. As an example, the System.AppContext type is exposed in the System.Runtime.dll reference assembly but its implementation is found in System.Private.CoreLib.dll and not in the System.Runtime.dll implementation assembly. The implementation assembly instead contains a type forward attribute that indicates that the type is exposed in another assembly.
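The "missing API" rule described above can be pictured as a set difference. The following is a minimal Python sketch, not GenFacades' actual implementation: the function name, the input shapes and the `System.Foo` type are hypothetical, and real tooling works on assembly metadata rather than on sets of strings.

```python
def missing_type_forwards(reference_types, source_types, referenced_assemblies):
    """Return C# type-forward lines for types that are part of the reference
    API surface, are not defined in the implementation sources, and are
    defined in at least one referenced assembly."""
    forwards = []
    for type_name in sorted(reference_types - source_types):
        # Only forward a type if some referenced assembly actually defines it.
        if any(type_name in types for types in referenced_assemblies.values()):
            forwards.append(
                "[assembly: System.Runtime.CompilerServices."
                f"TypeForwardedTo(typeof({type_name}))]"
            )
    return forwards


# Hypothetical, heavily reduced inputs modelled on the System.Runtime example.
reference_api = {"System.AppContext", "System.Foo"}
implementation_sources = {"System.Foo"}
references = {"System.Private.CoreLib": {"System.AppContext", "System.Object"}}

lines = missing_type_forwards(reference_api, implementation_sources, references)
print("\n".join(lines))
```

The sketch makes the dependency visible: without `reference_api` as an input there is nothing to subtract the compilation sources from.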

It's not yet clear how to generate these type forwards without a reference assembly that represents the public API surface area. A possible solution would be to check in the type forwards, add them to the compilation input and make sure that they are kept up to date. The problem is that Roslyn keeps passed-in type forward attributes in reference assemblies instead of following the type and replacing the attribute with the actual type. We should sync up with the Roslyn team to see if supporting this scenario is feasible.

PlatformNotSupportedException assemblies (PNSEs)

Some libraries generate PlatformNotSupportedException assemblies (via one of these two switches). The PNSE logic takes the reference project's source files as an input and transforms them to throw PNSEs instead of returning null.
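As a rough illustration of that transformation, here is a minimal Python sketch. It assumes that stub members in the reference source have the body `{ throw null; }`; the actual shape of the reference sources and the real PNSE tooling may differ, and the `Sample.Widget` type is made up.

```python
import re

# Assumption: every stub member body in the reference source is `{ throw null; }`.
STUB_BODY = re.compile(r"\{\s*throw\s+null\s*;\s*\}")
PNSE_BODY = "{ throw new System.PlatformNotSupportedException(); }"


def to_pnse_source(reference_source: str) -> str:
    """Replace every stub body with one that throws PlatformNotSupportedException."""
    return STUB_BODY.sub(PNSE_BODY, reference_source)


# Hypothetical reference source for a made-up type.
ref_source = """
namespace Sample
{
    public class Widget
    {
        public int Count { get { throw null; } }
        public void Reset() { throw null; }
    }
}
"""

pnse_source = to_pnse_source(ref_source)
print(pnse_source)
```

A purely textual rewrite like this only works while a reference source file exists to feed in.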

It's not yet clear which input would be used without a reference project. One option would be to feed in the generated reference source (which otherwise only acts as an output) as an input to keep the existing logic as-is. The obvious downside is that the PNSE logic would depend on the reference source file being up to date, which only other inner builds (which run in parallel) ensure.

Alternatively, the TFM that is responsible for generating the PNSE assembly could leverage Roslyn's RefOnly switch to generate only the reference assembly, generate the reference source file from it, and then invoke the existing PNSE logic and feed in that just-generated reference source file. Ideally, we would not depend on the reference source file at all, since it is considered an output, and would instead explore whether it's possible to use the linker to transform the reference assembly into a PNSE one.

Public API differences

There is a list of public APIs that intentionally differ between the reference and the runtime assemblies. The major difference for all libraries is that the following set of attributes is not exposed in the reference assemblies: https://github.com/dotnet/runtime/blob/main/eng/DefaultGenApiDocIds.txt. In addition, some libraries differ in other public APIs as well, e.g. System.Runtime: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Runtime/src/MatchingRefApiCompatBaseline.txt.

To be able to leverage Roslyn to generate the reference assemblies, we would need a way to exclude these APIs from the reference assembly.
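Conceptually, the exclusion amounts to filtering the implementation's public API by a checked-in list. The Python sketch below only illustrates that idea and does not describe an existing Roslyn feature: the function name is hypothetical, the `T:`/`M:` identifiers are assumed to follow the documentation ID style, and the sample entries are invented.

```python
def reference_api_surface(implementation_api, excluded_doc_ids):
    """Return the APIs that should appear in the reference assembly:
    everything public in the implementation except intentionally hidden entries."""
    return sorted(api for api in implementation_api if api not in excluded_doc_ids)


# Hypothetical inputs; real lists would come from the implementation assembly
# and from a checked-in exclusion file.
implementation_api = {
    "T:Sample.Widget",
    "M:Sample.Widget.Reset",
    "T:Sample.InternalOnlyAttribute",
}
excluded = {"T:Sample.InternalOnlyAttribute"}

surface = reference_api_surface(implementation_api, excluded)
print(surface)
```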

cc @danmoseley @jaredpar @ericstj @stephentoub @eerhardt

ghost commented 3 years ago

Tagging subscribers to this area: @Anipik, @safern, @ViktorHofer. See info in area-owners.md if you want to be subscribed.

Issue Details
Packages produced in .NET 6 and onwards don't contain reference assemblies anymore. The existing targeting packs `Microsoft.NETCore.App.Ref`, `Microsoft.AspNetCore.App.Ref` and `Microsoft.WindowsDesktop.App.Ref` are the only shipping assets which still contain reference assemblies (by design). That said, [reference assemblies are general goodness as leveraging them improves incremental build performance](https://github.com/dotnet/sdk/issues/2521) which is why they are emitted by the SDK by default since .NET 5.

## Problem statement

Reference assemblies in dotnet/runtime are currently generated via dedicated reference projects, which have several downsides:

1. Having dedicated reference projects confuses first-time contributors, as such a concept rarely exists in any other .NET solution or repository.
2. The reference and source project files need to be kept in sync, i.e. TargetFrameworks, references, **public APIs** and any other input that impacts the project's output. This makes versioning and revving to a new major version harder.
3. As reference projects need to be restored (like any other `Microsoft.NET.Sdk` project), they add to the overall restore time. NuGet creates intermediate files per project and TFM: i.e. `project.assets.json`, `$(MSBuildProjectName).nuget.g.props`, `$(MSBuildProjectName).nuget.g.targets`. Removing reference projects will result in a faster repo and per-project restore, as less computation (traversing the graph) and fewer IO operations are necessary.
4. Reference projects enlarge the dependency graph that tools like msbuild or Visual Studio need to care about. Every msbuild project needs to be parsed and evaluated, which adds to the overall build and tool time. They also bloat the solution files which are checked into the repository.
5. Reference assemblies are [_an implementation detail_](https://github.com/dotnet/msbuild/issues/6543#issuecomment-861612099) and aren't expected to be surfaced as a primary output. With dedicated projects to generate reference assemblies, a contract needs to be established between the source and the reference project to make other projects that reference the source project use the reference assembly instead of the implementation assembly (for build perf reasons as explained above). This is very customized infrastructure in dotnet/runtime and requires sequencing into a specific point in the build to make this happen. It's not guaranteed that this will continue to work, as it's far from the common path.

## Solution

At the time of writing, there are 227 libraries under `src/libraries`, and 63 of those are partial or full façade assemblies. Presumably, the 164 libraries which don't use [GenFacade's `<IsPartialFacadeAssembly />` feature](https://github.com/dotnet/arcade/tree/main/src/Microsoft.DotNet.GenFacades) could leverage [Roslyn's RefOut feature](https://github.com/dotnet/roslyn/blob/main/docs/features/refout.md) to generate reference assemblies as a secondary build output of the source project.

To make this work, the following work needs to be done (in no particular order):

- [ ] ApiCompat needs to compare the emitted reference assembly against the implementation assemblies to make sure that the public API surface matches between RIDs of the same base TFM. This is already handled by package validation, but we want this coverage for libraries which aren't packable (i.e. most of what's inside the shared framework) and we want this validation to fail the build for inner loop development.
- [x] Use PackageValidation to make sure the API surface matches between compatible TFMs (i.e. netstandard2.0 and net6.0).
- [ ] Leverage a tool like GenAPI to emit the public API surface area of all TFMs combined in one C# file. Put that file into `src\$(MSBuildProjectName).ref.cs` and exclude that file from being imported when the `` or `` properties are set to true. Update the to that warns about public API changes to also watch for changes to that file.
- [ ] Diff the produced reference assemblies (Roslyn vs runtime) and teach Roslyn to emit as beautiful reference assemblies as the ones runtime produces via dedicated reference projects.

### Partial façade assemblies

For the remaining 63 assemblies, `GenFacades` is used to add type forward attributes to the project's compilation input (via a generated `Compile` item) for public API that is missing from the compilation sources but which is available in any of the referenced assemblies (`ReferencePath`). Missing is defined as an API that is available in the reference assembly but not in the implementation compilation sources. As an example, the `System.AppContext` type is exposed in the System.Runtime.dll reference assembly but its implementation is found in System.Private.CoreLib.dll and not in the System.Runtime.dll implementation assembly. The implementation assembly instead contains a type forward attribute that indicates that the type is exposed in another assembly.

It's not yet clear how to generate these type forwards without a reference assembly that represents the public API surface area. A possible solution would be to check in the type forwards, add them to the compilation input and make sure that they are kept up to date. The problem is that Roslyn keeps passed-in type forward attributes in reference assemblies instead of following the type and replacing the attribute with the actual type. We should sync up with the Roslyn team to see if supporting this scenario is feasible.

cc @danmoseley @jaredpar @ericstj @stephentoub @eerhardt

Author: ViktorHofer
Assignees: -
Labels: `area-Infrastructure-libraries`
Milestone: 7.0.0