[API Proposal]: Introduce isolation in AssemblyLoadContext

msedi commented 2 weeks ago

Background and motivation

Coming from these two discussions: #69899 and #102981.

Currently, an AssemblyLoadContext (ALC) is not completely isolated from the default ALC. Specifically, if an assembly loaded in the default ALC has static fields (e.g. in a logging library), a user-created ALC "inherits" those already existing static fields.

To get around this, you have to explicitly load the assembly that contains the static field into the user-created ALC. The problem is that it is mostly unknown which assemblies have static fields, so one has to keep loading more assemblies by trial and error until it works.
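A minimal sketch of that workaround, assuming the assemblies to isolate are already known by name and sit as `.dll` files in one directory. `PrivateCopyLoadContext` and its parameters are hypothetical names for illustration, not part of the runtime. Returning `null` from `Load` lets everything else resolve from the default ALC, so only the listed assemblies get their own copy of their statics.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical helper: loads a chosen set of assemblies privately so that
// their statics are not shared with the default ALC.
public sealed class PrivateCopyLoadContext : AssemblyLoadContext
{
    private readonly string _baseDirectory;
    private readonly HashSet<string> _privateNames;

    public PrivateCopyLoadContext(string baseDirectory, IEnumerable<string> privateNames)
        : base(name: "PrivateCopy", isCollectible: true)
    {
        // LoadFromAssemblyPath requires an absolute path.
        _baseDirectory = Path.GetFullPath(baseDirectory);
        _privateNames = new HashSet<string>(privateNames, StringComparer.OrdinalIgnoreCase);
    }

    protected override Assembly? Load(AssemblyName assemblyName)
    {
        string? name = assemblyName.Name;

        // Not on the list: return null so the default ALC resolves it (shared statics).
        if (name is null || !_privateNames.Contains(name))
            return null;

        // On the list: load a private copy, which gets its own statics.
        string path = Path.Combine(_baseDirectory, name + ".dll");
        return File.Exists(path) ? LoadFromAssemblyPath(path) : null;
    }
}
```

Usage would look like `new PrivateCopyLoadContext(AppContext.BaseDirectory, new[] { "My.Logging" })`, where `My.Logging` is a placeholder. The sketch does not solve the discovery problem described above: the list of names still has to be found by hand.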

As far as I remember, AppDomains provided better isolation. So my idea would be to introduce an IsolationLevel and have the ALC itself take care of isolating things depending on the level. A bool might be sufficient, but maybe there is a need for more fine-grained control of the isolation.

API Proposal

public enum IsolationLevel
{
  None,
  Full
}

class AssemblyLoadContext
{
  public AssemblyLoadContext(IsolationLevel isolationLevel)
  {
  }
}

API Usage

var alc = new AssemblyLoadContext(IsolationLevel.Full);

Alternative Designs

I currently don't know a better design.

Risks

I do not know about the side-effects.

dotnet-policy-service[bot] commented 2 weeks ago

Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov See info in area-owners.md if you want to be subscribed.

jkotas commented 2 weeks ago

As far as I remember AppDomains provided a better isolation

Each AppDomain had a full copy of all statics (including CoreLib statics). Is that what you are asking for?

Once you create a full copy of all statics, exchanging types between the different domains becomes impossible. All calls between the different domains have to be marshalled. Once you have to marshal all calls between the different domains, it is much easier to just use processes as the isolation boundary.

msedi commented 2 weeks ago

@jkotas:

Once you create a full copy of all statics, exchanging types between the different domains becomes impossible

I think in the end this might be the consequence. It's not that I want AppDomains back ;-) I just mentioned that there was at least an isolation which I'm not able to get back without efforts using ALCs.

My problem is twofold (as we spoke a little in #102981).

It is hard to find out which static variables are there and which assemblies I have to load to get them isolated.
Starting external processes (which is what we currently do) is a big problem, since I have no good way to automatically attach a debugger to the secondary processes. As I said, we are currently doing this using COM interop with the Visual Studio SDK, but it turns out to be slow and very buggy. The VS team does not seem to have much interest in improving it, since it's maybe a niche.

Originally, the secondary processes were not processes at all but normal tasks/jobs that were started in the host. This was a problem for logging: we wanted each job's logging to be isolated, but the logging framework had statics in it, which made that hard. Additionally, we wanted to isolate the jobs for resource management reasons. Each job has a total RAM consumption of around 50TB during its lifetime, using a lot of unmanaged interop code and CUDA/GPU resources. Each job should start as clean as possible and have no leftovers from the previous job. So we came up with using processes to really isolate the jobs, which makes a lot of sense in terms of resource management. The host and the job communicate via gRPC.

The problem now is that debugging is not as easy as before. Visual Studio can attach to another process, but that's too manual.

So in the end, we need some way to get back to the original state where debugging was easy. We thought the ALC might be the best solution, but it is obviously missing some AppDomain features we didn't think about. In production we still use the external processes because we don't need to debug there.

jkotas commented 2 weeks ago

I just mentioned that there was at least an isolation which I'm not able to get back without efforts using ALCs.

Yes, that's expected. ALCs are cooperative unloading. If there are components that do not cooperate in the scheme, ALCs are not going to work. I do not see how this can be fixed without bringing full AppDomains back.

If the logging framework does not cooperate with ALCs, it needs to be fixed in the logging framework.

msedi commented 2 weeks ago

If the logging framework does not cooperate with ALCs, it needs to be fixed in the logging framework.

The logging framework was just an example. There are many libraries we use that have static things that are only evaluated once and then stored in a static readonly field.

It is hard to discover which libraries use it and the only chance is to run the program and check for errors.

How do you deal with external processes that you need to debug? Maybe there's a better way I don't know of.

jkotas commented 2 weeks ago

Have you seen https://marketplace.visualstudio.com/items?itemName=vsdbgplat.MicrosoftChildProcessDebuggingPowerTool ?

teo-tsirpanis commented 2 weeks ago

There are many libraries we use that have static things that are only evaluated once and then stored in a static readonly field.

You can load these libraries multiple times in separate ALCs.

msedi commented 2 weeks ago

You can load these libraries multiple times in separate ALCs.

Right, but it is hard to find out which libraries can cause problems and which ones I need to load. In the end, because I don't know, I may need to load the whole dependency tree of the libraries to work around this.

msedi commented 2 weeks ago

Have you seen https://marketplace.visualstudio.com/items?itemName=vsdbgplat.MicrosoftChildProcessDebuggingPowerTool ?

Yes, I have tried it already, but it's a bit hard to configure and to roll out in an enterprise environment. I can check again. Currently people are able to start working directly after cloning, without having to set up the environment.

With this extension (which I cannot even force people to install), there will be a lot of tickets for our support team from people reporting issues with the extension ;-)

Addendum: I have tried the child process debugger again, and there now seems to be a way to store the config in a separate settings file instead of the suo file. So I will try again.

teo-tsirpanis commented 2 weeks ago

What I have been doing is to hold the ALC in a weak reference and run the GC a couple of times after calling Unload(); if the weak reference is still alive, I emit a warning. You could take it further and capture a memory dump of the process to help you diagnose the unloadability problems.
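A rough sketch of that check, assuming a collectible ALC and a caller-supplied delegate that does the actual work. `UnloadCheck`, `TryUnload`, and the `maxAttempts` default are made-up names and values for illustration. The ALC is created in a separate non-inlined method so that no local variable in the checking method keeps it alive.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Loader;

public static class UnloadCheck
{
    // Non-inlined so the strong reference to the ALC dies with this frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference RunAndUnload(Action<AssemblyLoadContext> work)
    {
        var alc = new AssemblyLoadContext("job", isCollectible: true);
        work(alc);     // caller loads assemblies and runs code here
        alc.Unload();  // only requests the unload; it completes once nothing references the ALC
        return new WeakReference(alc);
    }

    // Returns true if the ALC was collected within maxAttempts GC passes.
    public static bool TryUnload(Action<AssemblyLoadContext> work, int maxAttempts = 10)
    {
        WeakReference weak = RunAndUnload(work);

        for (int i = 0; weak.IsAlive && i < maxAttempts; i++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }

        if (weak.IsAlive)
            Console.Error.WriteLine("warning: AssemblyLoadContext is still alive after Unload()");

        return !weak.IsAlive;
    }
}
```

If the warning fires, something outside the ALC (a static field, a running thread, a GC handle) is still holding on to an object from it, and a memory dump taken at that point is where to look.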

In the end, because I don't know, I may need to load the whole dependency tree of the libraries to work around this.

I believe sharing the minimum necessary set of assemblies is indeed the best way to go and that's what I am doing.
