AaronRobinsonMSFT / DNNE

Prototype native exports for a .NET Assembly.
MIT License

Assembly resolution differences #154

Closed: mterwoord closed this issue 1 year ago

mterwoord commented 1 year ago

I have a legacy Delphi 6 application that is extended by some .NET Framework 4.8 modules using the NuGet package DllExport. That project works essentially the same way as this one.

I'm trying to move these .NET modules to .NET 7 using DNNE, and I'm really, really close. I'm running into behavioral differences in Assembly.Load. The .NET modules use DevExpress with WPF for the UI. At some point, WPF tries to load a given fully qualified type. This fails when the code runs through DNNE, but succeeds in a standalone .NET 7 project. I recompiled WPF, put a breakpoint where it fails, and faked the load using the debugger; after that, everything works like a charm. This suggests to me that it's not a problem of missing dependencies.

Are there any known differences between the host this project creates and the default executable host?

I'm not sure if the above makes sense or not. Please let me know what I can do to get this issue diagnosed.

AaronRobinsonMSFT commented 1 year ago

I'm not sure if the above makes sense or not. Please let me know what I can do to get this issue diagnosed.

I understand that there are behavioral differences, but I find the precise issue difficult to fully understand. My first instinct would be to use fuslogvw.exe to narrow down how the assemblies are loaded on .NET Framework, and then use dotnet-trace to understand what is going on in .NET 7.

Are there any known differences between the host this project creates and the default executable host?

I assume this is referring to a process where .NET 7 is hosted. Yes, DNNE loads the assembly into a new AssemblyLoadContext (ALC). This is the current design of the .NET hosting API, although a new option is being added in .NET 8.
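A quick way to see this from inside the exported code is to ask which context loaded a given assembly. The sketch below uses only the public `System.Runtime.Loader` API; `AlcDiagnostics`, `DescribeContext` and `IsInDefaultContext` are hypothetical helper names, not part of DNNE.

```csharp
using System.Reflection;
using System.Runtime.Loader;

public static class AlcDiagnostics
{
    // Describes the ALC that loaded 'assembly', e.g. "System.Private.CoreLib -> Default".
    public static string DescribeContext(Assembly assembly)
    {
        AssemblyLoadContext? alc = AssemblyLoadContext.GetLoadContext(assembly);
        string contextName = alc?.Name ?? "<unnamed>";
        return $"{assembly.GetName().Name} -> {contextName}";
    }

    // True when 'assembly' was loaded into the Default ALC.
    public static bool IsInDefaultContext(Assembly assembly) =>
        AssemblyLoadContext.GetLoadContext(assembly) == AssemblyLoadContext.Default;
}
```

Called with `typeof(SomeExportedType).Assembly` (a placeholder type) from inside a DNNE export, one would expect a context other than `Default`; in a standalone WPF app the same call should report `Default`.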

mterwoord commented 1 year ago

OK, here's the code I can reproduce the issue with:

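```csharp
using System.Reflection;
// Fully qualified name of Dictionary<string, DevExpress.Xpf.Core.WpfSvgPalette>, the type WPF tries to load.
var s = "System.Collections.Generic.Dictionary`2[[System.String, System.Runtime, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a],[DevExpress.Xpf.Core.WpfSvgPalette, DevExpress.Xpf.Core.v22.2, Version=22.2.4.0, Culture=neutral, PublicKeyToken=b88d1754d700e49a]]";
var a4 = Assembly.Load("System.Collections, Version=6.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a");
var t4 = a4.GetType(s); // fails when hosted through DNNE; succeeds in a standalone .NET 7 app
```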

Yes, this references .NET 6 assemblies, but that's because the DevExpress assemblies target .NET 6.

The differences I mean are the settings of a "normal" .NET 7 WPF application versus the situation where the Delphi 6 process loads .NET 7 with DNNE.

AaronRobinsonMSFT commented 1 year ago

The differences I mean are the settings of a "normal" .NET 7 WPF application versus the situation where the Delphi 6 process loads .NET 7 with DNNE.

Ah. Yes, there are fundamental differences between the two. The first is that I don't think the WPF workload is going to be included in the TPA (Trusted Platform Assemblies list), so referencing any WPF assemblies will be tough without a custom AssemblyLoadContext to help find them. That context will also need to find the DevExpress assemblies unless they are adjacent to the exporting managed assembly.
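A minimal sketch of such a context, assuming the DevExpress DLLs sit in a single known directory: it loads anything found there and returns `null` otherwise, which makes the runtime fall back to the Default ALC (and therefore the TPA) for framework assemblies such as WPF's. `ProbingLoadContext` is a hypothetical name; for dependencies described by a `.deps.json`, `System.Runtime.Loader.AssemblyDependencyResolver` is the more complete option.

```csharp
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical context that probes one directory (e.g. where the DevExpress
// DLLs are deployed) and otherwise defers to the Default ALC.
public sealed class ProbingLoadContext : AssemblyLoadContext
{
    private readonly string _probeDirectory;

    public ProbingLoadContext(string probeDirectory)
        : base("ProbingLoadContext", isCollectible: false)
    {
        _probeDirectory = probeDirectory;
    }

    protected override Assembly? Load(AssemblyName assemblyName)
    {
        // Look for "<SimpleName>.dll" in the probe directory.
        string candidate = Path.Combine(_probeDirectory, assemblyName.Name + ".dll");
        if (File.Exists(candidate))
            return LoadFromAssemblyPath(candidate);

        // Returning null lets the runtime fall back to the Default ALC (TPA).
        return null;
    }
}
```

Something like `new ProbingLoadContext(devExpressDir).LoadFromAssemblyName(new AssemblyName("DevExpress.Xpf.Core.v22.2"))` would then pull the DevExpress assembly into that context; `devExpressDir` is a placeholder.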

The dotnet-trace tool mentioned above will help sort out where assemblies are being looked for and what the custom AssemblyLoadContext will need to do in order to find the other assemblies.
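If running the `dotnet-trace` CLI against the Delphi host process is awkward, the same runtime provider can be consumed in-process with an `EventListener`. This sketch rests on one stated assumption: that keyword `0x4` on `Microsoft-Windows-DotNETRuntime` selects the assembly-loader (binder) events in .NET 5 and later, which is my understanding of that keyword. `LoaderEventListener` is a hypothetical name, and it prints every payload field generically instead of relying on specific field names.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Linq;

// Hypothetical in-process listener for the runtime's loader events.
public sealed class LoaderEventListener : EventListener
{
    // Assumption: 0x4 is the assembly-loader keyword of the runtime provider (.NET 5+).
    private const EventKeywords AssemblyLoaderKeyword = (EventKeywords)0x4;

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // The runtime's native events surface in-process under this provider name.
        if (eventSource.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(eventSource, EventLevel.Verbose, AssemblyLoaderKeyword);
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.PayloadNames is null || eventData.Payload is null)
            return;

        // Print every payload field rather than assuming particular field names.
        var fields = eventData.PayloadNames.Zip(eventData.Payload, (name, value) => $"{name}={value}");
        Console.WriteLine($"{eventData.EventName}: {string.Join(", ", fields)}");
    }
}
```

Creating it early, e.g. `using var listener = new LoaderEventListener();` at the start of the exported method, and keeping it alive while the UI runs should print the load requests along with whatever context information the runtime attaches. Events may be delivered on a background thread, so output can lag slightly behind the loads.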

mterwoord commented 1 year ago

I will do further digging with dotnet-trace, although it seems all DLLs are loaded fine.

mterwoord commented 1 year ago

@AaronRobinsonMSFT I know this might very well be a needle/haystack problem, but can you give me any pointers where to start looking for host configuration differences?

I spent some time looking through the CoreCLR sources, but I cannot seem to find where the host is being bootstrapped.

Any suggestions are greatly appreciated.

AaronRobinsonMSFT commented 1 year ago

but can you give me any pointers where to start looking for host configuration differences?

@mterwoord I'm relatively sure the logic I mentioned above is the culprit: the lack of WPF assemblies in the TPA. The entry point for CoreCLR initialization is here, but in DNNE it starts from here. For a normal WPF application, many more properties are filled in by reading the .runtimeconfig.json file and passing them down.
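One way to test the TPA theory directly is to read the list the host handed to the runtime. The sketch below relies on the `TRUSTED_PLATFORM_ASSEMBLIES` runtime property exposed through `AppContext.GetData`; `TpaInspector` is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Linq;

public static class TpaInspector
{
    // File names on the Trusted Platform Assemblies list that the host passed
    // to the runtime (a Path.PathSeparator-delimited list of full paths).
    public static string[] GetTpaFileNames()
    {
        string tpa = AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES") as string ?? string.Empty;
        return tpa.Split(Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries)
                  .Select(p => Path.GetFileName(p))
                  .ToArray();
    }

    // e.g. Contains("PresentationFramework.dll") to check for WPF.
    public static bool Contains(string fileName) =>
        GetTpaFileNames().Contains(fileName, StringComparer.OrdinalIgnoreCase);
}
```

`TpaInspector.Contains("PresentationFramework.dll")` returning `false` inside the DNNE-hosted process, but `true` in a standalone WPF app, would point to the TPA difference described here.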

Were you able to collect a trace? Stepping through the apphost startup for a WPF application is interesting as an academic exercise, but I'm not convinced it is worth the effort. Please let me know if you are having trouble collecting the traces. Recall that DNNE loads its assembly in a separate ALC, so handling ALCs for DNNE is likely to be a requirement regardless of how the runtime is initialized.

Perhaps @elinor-fung has another avenue of investigation or thoughts.

elinor-fung commented 1 year ago

the lack of WPF assemblies in the TPA

If your library references the .NET Desktop SDK, the generated .runtimeconfig.json should include the corresponding framework (Microsoft.WindowsDesktop.App) such that WPF assemblies will be in the TPA.

A trace would help show what is actually going on. Like @AaronRobinsonMSFT, I expect this is ALC-related: even though the DLLs are loaded fine, they may not be in the ALC your code is expecting.
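To check whether the same assembly ended up in more than one ALC, the loaded assemblies can be grouped per context. A sketch using `AssemblyLoadContext.All` and `AssemblyLoadContext.Assemblies`; `AlcInventory` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Loader;

public static class AlcInventory
{
    // Maps each loaded assembly's simple name to the names of the ALC(s) holding it.
    public static Dictionary<string, List<string>> LoadedBySimpleName()
    {
        var map = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (AssemblyLoadContext alc in AssemblyLoadContext.All)
        {
            string contextName = alc.Name ?? "<unnamed>";
            foreach (var assembly in alc.Assemblies)
            {
                string name = assembly.GetName().Name ?? "<unnamed>";
                if (!map.TryGetValue(name, out var contexts))
                    map[name] = contexts = new List<string>();
                contexts.Add(contextName);
            }
        }
        return map;
    }

    // Simple names present in more than one ALC: a common source of
    // "loaded fine, but not the copy my code expects" failures.
    public static IEnumerable<string> Duplicates() =>
        LoadedBySimpleName().Where(kv => kv.Value.Count > 1).Select(kv => kv.Key);
}
```

A DevExpress or application assembly listed under both `Default` and the isolated context would match the "loaded fine, but in the wrong ALC" scenario.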

Other thoughts:

mterwoord commented 1 year ago

Did some digging. The .runtimeconfig.json references the Windows Desktop framework. A dotnet-trace run confirms what I had found before: my code (the method exported by DNNE) starts off in an "IsolatedComponentLoadContext", but at some point it also uses the Default context. In the failing case, the code is somehow running from the Default context instead of the IsolatedComponentLoadContext.
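If the goal is for name-based loads made by framework code (such as WPF resolving a type name) to see the isolated context, .NET Core 3.0 and later offer contextual reflection. The sketch below shows the mechanism on a single `Type.GetType` call; `ContextualResolve` is a hypothetical helper, and whether this covers the specific WPF and DevExpress code paths in this issue is not verified.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

public static class ContextualResolve
{
    // Resolves an assembly-qualified type name while contextual reflection points
    // at the ALC that loaded 'anchor'. Name-based APIs such as Type.GetType(string)
    // and Assembly.Load(string) consult this context inside the scope.
    public static Type? ResolveIn(Assembly anchor, string assemblyQualifiedName)
    {
        using (AssemblyLoadContext.EnterContextualReflection(anchor))
        {
            return Type.GetType(assemblyQualifiedName, throwOnError: false);
        }
    }
}
```

In practice one would wrap the UI start-up itself, e.g. `using (AssemblyLoadContext.EnterContextualReflection(typeof(SomeExportedType).Assembly)) { /* start WPF */ }` with a placeholder type. This only affects APIs that consult `AssemblyLoadContext.CurrentContextualReflectionContext`; static references compiled into WPF still bind from the Default ALC.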

Is there any way to get stack traces for the loader events I'm looking for, i.e., to find out where the transition is being made?
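Besides `dotnet-trace`, the `AppDomain.AssemblyLoad` event offers a rough in-process answer: it is raised on the thread performing the load, so capturing `Environment.StackTrace` in the handler shows the managed call path that led to it. A sketch; `LoadTracer`, its `Entries` queue and the name-prefix filter are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Runtime.Loader;

public static class LoadTracer
{
    // Hypothetical sink: "SimpleName -> ContextName" followed by the stack trace.
    public static ConcurrentQueue<string> Entries { get; } = new();

    // Records every load whose simple name starts with 'prefix' (e.g. "DevExpress").
    public static void Install(string prefix)
    {
        AppDomain.CurrentDomain.AssemblyLoad += (_, args) =>
        {
            Assembly loaded = args.LoadedAssembly;
            string name = loaded.GetName().Name ?? string.Empty;
            if (!name.StartsWith(prefix, StringComparison.Ordinal))
                return;

            string context = AssemblyLoadContext.GetLoadContext(loaded)?.Name ?? "<unnamed>";
            // Raised on the loading thread, so this stack shows the frames that led to the load.
            string entry = $"{name} -> {context}{Environment.NewLine}{Environment.StackTrace}";
            Entries.Enqueue(entry);
            Console.WriteLine(entry);
        };
    }
}
```

Calling `LoadTracer.Install("DevExpress")` at the top of the exported method and then looking for the first entry whose context is `Default` should point at the code that crosses over.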

Another option would be to somehow get rid of the IsolatedComponentLoadContext and have only one ALC, the default one.

mterwoord commented 1 year ago

I did some further digging. It seems that a DevExpress module initializer causes code to run in the Default ALC. I have now tricked my main entry point into handing control to the Default ALC, and everything works. I will close this ticket now. I'm happy to give more insights if anyone needs or wants them.
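The thread doesn't show how the entry point was redirected, so the following is only a hypothetical sketch of one way to do it: the method the DNNE export calls checks which ALC it is running in and, if it is not the Default ALC, loads a second copy of its own assembly into `AssemblyLoadContext.Default` and forwards the call there. `Exports`, `Start` and `Run` are placeholder names, and the approach assumes the assembly has a file on disk (`Assembly.Location` is empty for single-file or in-memory assemblies).

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

public static class Exports
{
    private static Func<int, int>? s_forwarded;

    // Hypothetical method called by the DNNE export.
    public static int Start(int arg)
    {
        Assembly self = typeof(Exports).Assembly;
        if (AssemblyLoadContext.GetLoadContext(self) == AssemblyLoadContext.Default)
            return Run(arg);

        // Not in Default: forward to a copy of this assembly loaded into Default.
        s_forwarded ??= CreateForwarder(self);
        return s_forwarded(arg);
    }

    private static Func<int, int> CreateForwarder(Assembly self)
    {
        // Assumes the assembly has a file on disk (not single-file or in-memory).
        Assembly inDefault = AssemblyLoadContext.Default.LoadFromAssemblyPath(self.Location);
        MethodInfo run = inDefault.GetType(typeof(Exports).FullName!, throwOnError: true)!
            .GetMethod(nameof(Run), BindingFlags.Public | BindingFlags.Static)!;
        return run.CreateDelegate<Func<int, int>>();
    }

    // Hypothetical real work: WPF/DevExpress start-up would go here.
    public static int Run(int arg) => arg + 1;
}
```

Caching the forwarding delegate avoids reloading the assembly on every call. One trade-off is that the isolated copy and the Default copy have separate static state, so anything initialized before the hand-off is not visible after it.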