dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.07k stars 4.69k forks source link

Do not lock assembly to run `STARTUP_HOOKS` #78153

Closed MarcoRossignoli closed 1 year ago

MarcoRossignoli commented 1 year ago

Current behavior

Today if we use the STARTUP_HOOKS extension point to run code before the main of a console application the assembly containing the Main is loaded and kept in memory preventing from rewriting the assembly. Here a simple repro: ClassLibrary1 project

using System;
internal class StartupHook
{
    public static void Initialize()
    {
        Console.WriteLine("I'm the StartupHook");
        Console.WriteLine("");
        foreach (var item in AppDomain.CurrentDomain.GetAssemblies())
        {
            Console.WriteLine(item.FullName);
        }
        Console.WriteLine("");
    }
}

Startuphooks console applicatication

using System;
public class Program
{
    public static void Main(string[] args)
    {
        Console.WriteLine("I'm the main");
    }
}

Startuphooks.csproj

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <ProjectReference Include="..\ClassLibrary1\ClassLibrary1.csproj" />
  </ItemGroup>
  <ItemGroup>
    <RuntimeHostConfigurationOption Include="STARTUP_HOOKS" Value="ClassLibrary1" />
  </ItemGroup>
</Project>

The output at runtime is:

I'm the StartupHook

System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e
startuphooks, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
System.Runtime, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
Microsoft.Extensions.DotNetDeltaApplier, Version=17.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.IO.Pipes, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Linq, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Collections, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
ClassLibrary1, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
netstandard, Version=2.1.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51
System.Console, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Collections.Concurrent, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Threading, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Text.Encoding.Extensions, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Runtime.InteropServices, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a

I'm the main

As we can notice the startuphooks.dll is loaded.

Expected behavior

We'd like to be able to rewrite the main assembly while we're inside the STARTUP_HOOKS code and we should be able to do that because assembly containing the STARTUP_HOOKS code is unrelated to the main assembly.

cc: @vitek-karas @elinor-fung @nohwnd @Evangelink @pavelhorak @jakubch1 @fhnaseer @cvpoienaru @drognanar @engyebrahim

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 1 year ago

Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov See info in area-owners.md if you want to be subscribed.

Issue Details
### Current behavior Today if we use the `STARTUP_HOOKS` extension point to run code before the main of a console application the assembly containing the `Main` is loaded and kept in memory preventing us to "rewrite" the assembly. Here a simple repro: ClassLibrary1 project ```cs using System; internal class StartupHook { public static void Initialize() { Console.WriteLine("I'm the StartupHook"); Console.WriteLine(""); foreach (var item in AppDomain.CurrentDomain.GetAssemblies()) { Console.WriteLine(item.FullName); } Console.WriteLine(""); } } ``` Startuphooks console applicatication ```cs using System; public class Program { public static void Main(string[] args) { Console.WriteLine("I'm the main"); } } ``` Startuphooks.csproj ```xml Exe net7.0 ``` The output at runtime is: ``` I'm the StartupHook System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e startuphooks, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null System.Runtime, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a Microsoft.Extensions.DotNetDeltaApplier, Version=17.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a System.IO.Pipes, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a System.Linq, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a System.Collections, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a ClassLibrary1, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null netstandard, Version=2.1.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51 System.Console, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a System.Collections.Concurrent, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a System.Threading, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a System.Text.Encoding.Extensions, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a System.Runtime.InteropServices, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a I'm the main ``` As we can notice the startuphooks.dll is loaded. ### Expected behavior We'd like to be able to rewrite the main assembly while we're inside the `STARTUP_HOOKS` code and we should be able to do that because assembly containing the `STARTUP_HOOKS` code is unrelated to the main assembly. cc: @vitek-karas @elinor-fung @nohwnd @Evangelink @pavelhorak @jakubch1 @fhnaseer @cvpoienaru @drognanar @engyebrahim
Author: MarcoRossignoli
Assignees: -
Labels: `area-Host`, `untriaged`
Milestone: -
jkotas commented 1 year ago

The original startup hook design called for loading the main assembly only after startup hooks are executed. IIRC, this design detail was changed before the feature shipped because:

We'd like to be able to rewrite the main assembly while we're inside the STARTUP_HOOKS code

Why can't you do that during build? Self-modifying application during startup sounds like a recipe for more problems.

elinor-fung commented 1 year ago

@MarcoRossignoli can provide more/better context, but the desired scenario is to improve the test platform experience such that tests can be run like a console app. Part of the test platform is allowing extensions to rewrite the main assembly - for example, static instrumentation, code coverage. For a user to be able to run a test executable, let extensions modify the main assembly, and then actually run the tests in the main assembly, startup hooks seemed like a reasonable mechanism.

MarcoRossignoli commented 1 year ago

@elinor-fung we're discussing off-line with @jkotas on it, one possible concern is the assumption done by current code and maybe we could make it an opt-in mechanism through runtimeconfig.json/msbuild prop. (I'm preparing the document with the high level design for the "whole" platform that I'll share with you all so we can clarify better the reasons).

jkotas commented 1 year ago

Yes, we have been discussing offline. I am not convinced that self-modifying tests are a good design. For example, it means that re-running the tests is not guaranteed to run the same binary. I think there should be a clear separation between building the tests (that writes the test binaries to disk) and running the test (that does not modify the test binaries on disk).

MarcoRossignoli commented 1 year ago

For example, it means that re-running the tests is not guaranteed to run the same binary.

Today is "guarantee" by the tool that is doing the instrumentation. By design coverlet and our tool in case of static instrumentation is "restoring" the binaries at the end and a user can disable it and usually is the default choice of users because they want to discard the test build result to save test time.

alexrp commented 1 year ago

Is there a particular reason that the test platform couldn't just include a source generator that generates Main()?

MarcoRossignoli commented 1 year ago

@alexrp we're working in that direction. Anyway code generators are not enough, they're good only for C#, keeping out VB and F# but that's not a problem we're able to do that without the need of code generators. The Main() code to inject is very simple.

jkotas commented 1 year ago

By design coverlet and our tool in case of static instrumentation is "restoring" the binaries at the end and a user can disable it

Why is it that hard to run the static instrumentation before running the test? You need to set the environment variable to enable the startup hook. Can you just run the static instrumentation before running the test instead setting the environment variable before running the test?

elinor-fung commented 1 year ago

You need to set the environment variable to enable the startup hook

The test SDK could also put it in the runtime config via a .targets/.props.

jkotas commented 1 year ago

It means editing runtime config. Instead of editing runtime config, can SDK run the instrumentation process at that point?

vitek-karas commented 1 year ago

The runtime config can be "Edited" in MSbuild. Adding RuntimeHostConfigurationOption will put that property into runtime config. That's how we do all the other runtime properties like GC settings, feature switches and so on.

vitek-karas commented 1 year ago

The main idea behind this effort is to build tests as normal apps, so the result is a normal executable which you can run directly without any special environment. If executed like that, it runs the tests and prints out results. There are many advantages to this approach:

MarcoRossignoli commented 1 year ago

It means editing runtime config. Instead of editing runtime config, can SDK run the instrumentation process at that point?

As @vitek-karas described I want to put emphasis on the fact that thanks to the new architecture the SDK become not mandatory. We can dotnet build/publish copy/paste and run. So we can run also where the SDK/machine wide runtime is missing. For this reason we need an architecture that let us to run and load in-process and out-of-process extensions without "infra support", runner/host will be folded. Clearly we'll keep the dotnet test mechanics, but we'll also enable a lot of new scenario impossible to support today, beside other benefits(like to be faster in release new features and bug fix due to the fact that we don't need SDK/VS insertions).

am11 commented 1 year ago

Anyway code generators are not enough, they're good only for C#, keeping out VB and F#

The startup hooks also have limitations in a different dimension; today, only corehost with coreclr provide startup hooks support. That leaves mono (all form factors) and coreclr-nativeaot.

The runtime config can be "Edited" in MSbuild

So can <StartupObject>, <Exec> task to run custom program/command, custom msbuild task, source generators etc.

In theory, coreclr can set startup_hook assembly as 'root' under a (yet another) flag for this kind of use-cases. Currently it is setting assembly-with-Main-method as 'root', so in the startup hook assembly, APIs like GetExecutingAssembly() return main assembly's info, instead of startup hook assembly.

MarcoRossignoli commented 1 year ago

The startup hooks also have limitations in a different dimension; today, only corehost with coreclr provide startup hooks support. That leaves mono (all form factors) and coreclr-nativeaot.

Startup hooks are needed to start out-of-proc extension and have a chance to rewrite the main assembly to support the vast majority of the use case we have today that are running dotnet test with corehost/coreclr. Inside the main we'll check if out-of-proc extension are started and if not we'll start these having 99% parity with the runtimes are supporting the startup hooks. The only limitation in this case is that you'll fail to rewrite the main assembly so if that's needed the rewrite needs to happen "before" the start of the host(tools, build time). But having the needs of a tool/build time code is not easy and standard approach today and to have a smooth move to the new model we should try to adhere much as possible to the current architecture if possible, to make current extensions easy to port. I don't deny that a "tools" driven test world is something that could happen in future, but it's too early to understand it.

So can <StartupObject>, <Exec> task to run custom program/command, custom msbuild task, source generators etc.

My experience with is not super good if not for build where I know everything and that are not too extendible, I feel custom msbuild task better for integration like that, start new processes during the build sometimes ends in race condition issue and hard to investigate issue.

Can <StartupObject> point to a different assembly and not to the assembly build by current csproj?In that case could work and we'll end to load test sdk keeping free the user dll, but my feeling is that it should not be supported, but I never used that msbuild prop. And I think that also if supported won't be good for aot/trimmability because user dll won't be statically referenced.

jkotas commented 1 year ago

The main idea behind this effort is to build tests as normal apps

Yes, I do like and support this idea.

Startup hooks are not compatible with trimmed or aot compiled apps. Editing of the application binaries on disk is not supported by some app models (e.g. mobile apps).

Startup hooks are needed to start out-of-proc extension and have a chance to rewrite the main assembly to support the vast majority of the use case we have today that are running dotnet test with corehost/coreclr.

If these out-of-proc extensions are important to have, I would suggest you keep the existing flow for them. The tests that depend on these out-of-proc extensions that self-modify the binaries are not normal apps and never going to be.

MarcoRossignoli commented 1 year ago

The tests that depend on these out-of-proc extensions that self-modify the binaries are not normal apps and never going to be.

Code coverage is pretty common during testing(coverlet is extensively used for code coverage and it rewrites assemblies) and static managed one is lot more stable/easy to implement than profiler based one, profiler are not cooperative written in native code and hard to build for all OS flavors(today we rely on CLRIE https://github.com/microsoft/CLRInstrumentationEngine to make them work with other profilers and we have so some dependency and degree of indirection, it would be great have a completely new managed api to make IL modification on-the-fly). Static managed coverage is usually written in .NET(i.e. Cecil) and supported by design for all OS flavors(we find out that are also faster than profiler one in some scenario because we benefit of the by design jitter optimization without the need to handle for instance the inlining), cooperative by design(the rewrite is serialized between all extensions).

If these out-of-proc extensions are important to have, I would suggest you keep the existing flow for them

The main problem is that it means drop the idea of dotnet build/publish and copy/paste because as soon as the test module starts main is locked. It means change completely the architecture and ship our custom runner compiled for every platform, drop it in the bin, make sure we're called correctly to be able to rewrite before start the real "test console app". It means we're no more able to simple run with dotnet run or simply starting a MyTest.exe with full features support.

The only other clean and simple solution is to drop the support for "rewrite the main assembly from out-of-proc extension" and tools like static coverage needs to be rewritten to make the instrumentation at build time and change completely the architecture of current coverage tooling, with the minimal risk that users will publish in prod instrumented libraries if they've not different workflow for release, today the the workflow is usually "rewrite(); try {run();} finally {restore();}".

jkotas commented 1 year ago

The main problem is that it means drop the idea of dotnet build/publish and copy/paste because as soon as the test module starts main is locked.

I disagree. The build/publish pipeline is extensible. You just need to insert the binary instrumentation in the right spot into this pipeline.

MarcoRossignoli commented 1 year ago

The build/publish pipeline is extensible

Yep, as I've said above the other solution is to change how the static instrumentation is working compared to the current available tooling. It's a little bit different than today. The workflow would be: 1) dotnet publish\build it produces instrumented libs plus it needs to save the original libraries and some information somewhere inside the bin(folder, zipped file etc...). 2) move the bin to a different server and start the app and the out-of-process extension will communicated with the instrumented part(usually shared memory). Usually here the out-of-proc starts before the main module load and so can open the shared memory/pipes, now we cannot anymore and the shared memory/pipes needs to be handled in the host I mean we have to revert the client/server role. 3) Here there's the main difference...now the libs are instrumented...so you have to throw it away or have some other way to "restore" the original libs if you want to move to prod or use for something different than testing. And if you want to run with and without instrumentation you have to build 2 time.

cc: @jakubch1 @fhnaseer

Or do you have in mind a possible different workflow?

jkotas commented 1 year ago

Yes, it is the roughly workflow that I have in mind.

The part that I do not understand is why you are trying to hide the instrumentation by swapping the binaries back and forth.

I would treat instrumentation the same way as (for example) PublishTrimmed. If you run the build with PublishTrimmed=true, the output directory is going to have the trimmed binaries. Nothing tries to save the original binaries to make it easy to quickly switch back to what you would get with PublishTrimmed=false. You have to rebuild the app if you want to get non-trimmed flavor. If you would live to have both trimmed and non-trimmed published apps in the build output at the same time, the best way to do that is to configure a different directory publish directory for each one so that they do not collide.

MarcoRossignoli commented 1 year ago

The part that I do not understand is why you are trying to hide the instrumentation by swapping the binaries back and forth.

Because coverage is a bit "magic", I mean users expect to have the coverage at the end of the test, but they don't know the difference between static and dynamic and the fact that's transparent my/our concern is that if we don't restore the libs at the end like all tools are doing today there's a risk that users will ship in production the libs produced by test. Imagine a pipeline where user does dotnet publish -> run tests -> ship to prod using some script libraries tested because "that's what I tested and I trust that code", if we don't revert user will ship instrumented libraries without knowing that because anyway it won't fail at runtime(static coverage is done in such a way that an instrumented library won't fail if not under "coverage" but the code is "dirty").

Trimming is a bit different imo, it's what I want and expect to find inside the bin so I can ship that, I've trimmed on purpose, coverage is "completely transparent". At least till today static coverage tools works like that because we're not in control of the "deploy" workflow of users and we try to be transparent as possible.

Also in case of managed instrumentation done at build time means that we cannot "fail" the run if we find out that the module is instrumented(for instance in production) because is expected at test time and at some time someone will connect/collect information from it.

jkotas commented 1 year ago

that's transparent my/our concern is that if we don't restore the libs at the end like all tools are doing today there's a risk that users will ship in production the libs produced by test.

You can address this concern by dropping the instrumented copy of the tests into a different directory. You can even delete the directory with the instrumented bits at the end of the coverage run if you are worried about somebody using them accidentally.

I think that mutating build output is just a source of problems and confusion. For example, somebody may think "the build finished and it is just running tests now, so I can start uploading the bits to server", not realizing that the bits are mutated temporarily. Or somebody may think "the coverage crashed but I do not care about that, I can upload the bits to the server anyway", not realizing that the bits are mutated.

MarcoRossignoli commented 1 year ago

For example, somebody may think "the build finished and it is just running tests now, so I can start uploading the bits to server", not realizing that the bits are mutated temporarily. Or somebody may think "the coverage crashed but I do not care about that, I can upload the bits to the server anyway", not realizing that the bits are mutated.

I agree that can happen, FWIK till today what I've seen is that users wait the "successful test result" to "move on" the pipeline, but yep that can happen also today.

MarcoRossignoli commented 1 year ago

Discussed off-line, we will solve with a different strategy that doesn't need anymore this runtime update. Thanks to all for the interesting discussion.