Closed brthor closed 8 years ago
@rainersigwald @brthor @cdmihai I've been asked to help out with these MSBuild issues, and I've already picked up #658. This issue looks to be related (at least to the extent that it has to do with assembly loading).
Would it be possible for one of you to share the repro with me?
To repro:
feature/msbuild
branchbuild
from the root of the repo@eerhardt Thanks!
@eerhardt I'm not able to reproduce the issue using the provided steps. With or without the <Message>
element in build.proj, the build succeeds.
@tmeschter If the entire MSBuildWorkaroundTarget
and references are removed do you see the behavior?
Also comment out the Exec call on line 79:
https://github.com/dotnet/cli/blob/feature/msbuild/build.proj#L79
The issue seems to occur when the root project doesn't use any built-in tasks (like Message or Exec), and it imports a separate .targets file that uses a built-in task (like Message or Exec).
@brthor @eerhardt Still can't repro this, even after removing the MSBuildWorkaroundTarget
and all references to it, and removing the Exec
call on line 79. Of course, without the Exec
call the build fails with an unrelated problem...
Moving this to build/Microsoft.DotNet.Cli.Prepare.targets
instead of deleting it might help get the repro.
https://github.com/dotnet/cli/blob/feature/msbuild/build.proj#L75-L80
@brthor Can you please make the repro inducing edits in a separate branch and link it here? Would make it easier for multiple people to repro the issue by just checking out a branch and building.
@cdmihai @tmeschter
Can you please make the repro inducing edits in a separate branch and link it here? Would make it easier for multiple people to repro the issue by just checking out a branch and building.
That was the purpose of the offline shared repro. Is that repro not usable for debugging?
@brthor Unfortunately VS crashes when trying to load symbols for the copy of Microsoft.Build.dll used by that offline repro. And when I drop in locally-built MSBuild binaries I end up with assembly version mismatches. So at the moment I'm trying to construct a simpler repro with newer bits.
It sounded like @anurse had a repro of this bug as well. Maybe he has an separate branch that you can checkout.
I don't have a very convenient repro yet, it's tied into our build system which isn't a great standalone example yet.
@tmeschter I've pushed a branch exhibiting the repro.
git checkout brthor/repro750
build.cmd /t:BuildDotnetCliBuildFramework
build.cmd /t:InitializeCommonProps
I had some trouble getting a solid repro, a target in a file that only called built in tasks didn't seem to trigger it. A target calling a custom task followed by the built in task when the target isn't in the top level project did trigger it. I don't know if that info will be useful :smile:
@brthor Great; that does reproduce the issue.
The exception occurs while running this command:
E:\Projects\cli\.dotnet_stage0\x64\dotnet.exe E:\Projects\cli\.dotnet_stage0\x64\sdk\1.0.0-featmsbuild-003411\MSBuild.exe /p:Architecture=x64 /t:InitializeCommonProps build.proj
However, if I replace the Microsoft.Build*.dll binaries under 1.0.0-featmsbuild-003411 with locally-built bits and run that command again it works.
I notice that the original Microsoft.Build.* binaries under E:\Projects\cli.dotnet_stage0\x64\sdk\1.0.0-featmsbuild-003411 are about twice as large as my locally-built bits. Are these actually NGEN'd bits (despite the lack of a .ni.dll suffix)? If so, how were they produced?
Note that I know almost nothing about the dotnet CLI and its use, so I'll need a certain amount of hand-holding to understand how this is all supposed to work.
Indeed @tmeschter We do crossgen these bits.
They're produced here https://github.com/dotnet/cli/blob/feature/msbuild/build/Microsoft.DotNet.Cli.Compile.targets#L193
Feel free to ping me with any questions on this :smile:
@brthor Does the CrossGen task produce .pdbs for these bits? Actually, any suggestions you have on how to debug through these bits would be helpful.
/cc @eerhardt
@tmeschter In the past I haven't needed to do anything special for crossgen'd binaries besides pointing at the pdb in the VS debugger.
Do you have the Pdbs for the binary which is already crossgend? It comes from the msbuild nuget package, but I'm not sure where symbols are published.
/cc @gkhanna79 @johnchen0 Can you provide some guidance here?
The error we're attempting to debug is
Native image cannot be loaded multiple times
Can someone summarize the problem :) ?
@gkhanna79
The problem appears to be that ordering targets which invoke custom/builtin tasks in msbuild in a certain way produces an error that can only be reproduced when the MSBuild binaries are crossgend.
The error we're seeing is posted above but the interesting part is:
Native image cannot be loaded multiple times
@tmeschter Is working to debug the crossgend MSBuild bits and looking for any guidance on how to do the debugging there.
@brthor This issue doesn't repro if the MSBuild binaries are not crossgend--is that an option for you?
I did some debugging and found what happened: MSBuild first loaded Microsoft.Build.Utilities.Core.dll using AssemblyLoadContext.LoadFromAssemblyPath
. It then tried to load a type from Microsoft.Build.Tasks.Core.dll, which depends on Microsoft.Build.Utilities.Core.dll. While resolving this dependency, the runtime decides to reload Microsoft.Build.Utilities.Core.dll, probably because the previous load was in the wrong context. When Microsoft.Build.Utilities.Core.dll had been compiled by CrossGen, the second load was disallowed, since we don't allow multiple loads of a DLL containing native code, thus generating an error.
@gkhanna79: How is AssemblyLoadContext.LoadFromAssemblyPath
expected to interact with other types of assembly loads?
@tmeschter This would reduce the performance of dotnet build. Hopefully we could instead drive the bugfix in crossgen or msbuild (wherever it exists). Disabling it temporarily may be an option.
Thanks for the great info @JohnChen0
Based on @JohnChen0's analysis, this seems like an issue with CrossGen or the runtime. At least, it isn't clear to me what MSBuild could or should be doing differently
We will investigate what can be done from the runtime to prevent this issue, but this will take some time, and we don't yet know what are the possibilities. In the mean time, there are a couple of ways to unblock CLI build:
AssemblyLoadContext.Resolve
, so it returns null if the target assembly is on the TPA list, instead of calling AssemblyLoadContext.LoadFromAssemblyPath
. This should cause the runtime to load the assembly from TPA.@JohnChen0 What if MSBuild simply used the default AssemblyLoadContext
rather than creating a derived type? It seems like that would keep all of the loads in the same context, and effectively avoid the issue.
@AndyGerlicher @rainersigwald Do you know why we're creating our own AssemblyLoadContext
rather than simply using the default? We could handle loading dependencies by hooking the AssemblyLoadContext.Resolve
event.
The default AssemblyLoadContext
is an abstract class, and thus can't be used directly. If MSBuild only loads assemblies from TPA, then it can rely on runtime's own assembly probing mechanism, and doesn't need to use AssemblyLoadContext
. However, I suspect MSBuild needs to load assemblies from arbitrary paths specified by build projects, in which case it needs a derived AssemblyLoadContext
. The recommendation is for its implementation of AssemblyLoadContext
to distinguish between assemblies on the TPA and those not on the TPA, and only explicitly load assemblies not on the TPA.
@JohnChen0 I meant could/should we just use the object returned by AssemblyLoadContext.Default
.
And yes, MSBuild needs to load assemblies from arbitrary paths, as well as locate those assemblies' dependencies.
Also, could you define "TPA"?
Modify MSBuild's implementation of AssemblyLoadContext.Resolve
Not Resolve
event but Load
override of custom AssemblyLoadContext should return null to allow the TPA (DefaultContext) binder to perform the bind.
@tmeschter Think of TPA (Trusted Platform Assembly) list as the set of binaries that constitute the app (and thus, are referred to by their respective package entries in app.deps.json).
If all the assemblies that need to be loaded are known ahead of time, then ensure that they are part of the app and listed in app.deps.json. This will make them always be loaded from a single (Default) loadcontext.
If you do not know their path ahead of time BUT know that the ones that need to be loaded from a single context, then in your custom AssemblyLoadContext's Load override, check for the assembly and load it in the default Context using AssemblyLoadContext.Default.LoadFromAssemblyPath.
@gkhanna79 Thanks; that is good to know.
Let me ask a more general question: under what circumstances would it make sense to derive from AssemblyLoadContext
rather than simply use AssemblyLoadContext.Default
and hook the Resolve
event to help find assemblies?
AssemblyLoadContext
is about providing isolation to load same or different versions of a given assembly at runtime. Typically, developers do not have to worry about this. However, when dealing with scenarios like Plugin infrasturcture, the plugin infra needs to think about such issues and use AssemblyLoadContext to provide isolation for such loads.
Resolving event is a mechanism to enable a given load context have a fallback mechanism to point to assembly if one is not already loaded within the load context.
The specific issue you are running into is about attempting to load a given SharedFX assembly twice - this should not be required to be done since SharedFX is already part of the DefaultContext. Thus, you should consider returning null from the Load override.
@gkhanna79 Can you define "SharedFX"? Is it the same as "TPA"?
How do I know if a given assembly is part of the SharedFX and/or TPA?
Yes, it is the same as TPA.
You can determine the path to the SharedFX and check if the assembly exists there or not. If not, then this is an app specific assembly and the Load override should know how to load it.
Is there an API to get the path to the SharedFX?
Is there an API to get the path to the SharedFX?
The way we get it in the CLI is using this code:
https://github.com/dotnet/cli/blob/feature/msbuild/src/Microsoft.DotNet.Cli.Utils/Muxer.cs#L45
public static string GetDataFromAppDomain(string propertyName)
{
var appDomainType = typeof(object).GetTypeInfo().Assembly?.GetType("System.AppDomain");
var currentDomain = appDomainType?.GetProperty("CurrentDomain")?.GetValue(null);
var deps = appDomainType?.GetMethod("GetData")?.Invoke(currentDomain, new[] { propertyName });
return deps as string;
}
private bool TryResolveMuxerFromParentDirectories()
{
var fxDepsFile = GetDataFromAppDomain("FX_DEPS_FILE");
if (string.IsNullOrEmpty(fxDepsFile))
{
return false;
}
var fxDir = new FileInfo(fxDepsFile).Directory
Worth noting that @eerhardt 's posted solution requires you to be using the dotnet.exe host, and not corerun.
and not corerun.
I didn't think corerun supports the Shared FX at all. So yes, it can't can't be used with the code above.
My take away from all this is that there's basically no way to correctly implement an AssemblyLoadContext
without some rather deep knowledge of the runtime environment.
If an AssemblyLoadContext
should always/generally avoid loading an assembly that is part of the "SharedFX" and instead let the default context handle it, then that should be the default behavior. My AssemblyLoadContext
shouldn't even be asked about it unless it opts out of that default.
Further, since the application gets to decide whether or not to crossgen assemblies, my library shouldn't have to think about them at all when loading an assembly.
At this point I think I'm probably going to remove MSBuild's AssemblyLoadContext
in favor of using whatever is returned by AssemblyLoadContext.Default
and hooking its Resolve
method.
Is there any way general way to probe the TPA @gkhanna79 @JohnChen0 ?
Only looking in the sharedfx directory seems like it could be fragile in the long term, also doesn't the host also place the assemblies in the app's deps.json on TPA?
Do you want to chat before you make the change @tmeschter? It will be good to have context on why MSbuild had one in the first place and figure out the right next steps.
Also, you may run into https://github.com/dotnet/coreclr/issues/5837 with Resolving event, so lets chat on how you can achieve your intended goal.
Is there any way general way to probe the TPA @gkhanna79 @JohnChen0 ?
Unless TRUSTED_PLATFORM_ASSEMBLIES property is published (like FX_DEPS_FILE), then no.
@gkhanna79 MSBuild more or less copied the one from Roslyn: https://github.com/dotnet/roslyn/blob/d6f8dc441a1a3b749b5082579f582892236b1584/src/Compilers/Helpers/CoreClrAnalyzerAssemblyLoader.cs
However, Roslyn no longer implements an AssemblyLoadContext
either. It now simply grabs the existing one and hooks the Resolving
event.
The MSBuild requirements are pretty simply. We need to be able to load assemblies from arbitrary file paths. We also need to be able to load their dependencies, which we assume are located immediately next to them (or are supplied by the app/system, in which case we shouldn't have to think about them at all). We don't care about isolating one "plug-in" from another.
@tmeschter Roslyn is actually creating its own LoadContext (see https://github.com/dotnet/roslyn/blob/56f605c41915317ccdb925f66974ee52282609e7/src/Scripting/Core/Hosting/AssemblyLoader/CoreAssemblyLoaderImpl.cs#L37) and using its Load override to override what the DefaultContext may return.
We need to be able to load assemblies from arbitrary file paths
Do you know these paths ahead of time? Also, are you going to run against .NET Core RTM or using the CoreCLR that comes from CoreCLR master (since that will determine whether you need to workaround the Resolving bug or not)?
@gkhanna79 That implementation of AssemblyLoadContext
is solely for scripting scenarios (e.g. csi.exe/C# Interactive Window); it is not used in the C# or VB compilers themselves.
The paths are not known ahead of time.
I don't know if we're going to run against .NET Core RTM or the CoreCLR from CoreCLR master.
Would this be another option: have our AssemblyLoadContext
forward all load requests the default context, and only try to resolve an assembly if the default context can't? I'm not sure how that would differ from simply hooking the Resolving
event on the default context, but maybe there's something I'm missing.
have our AssemblyLoadContext forward all load requests the default context, and only try to resolve an assembly if the default context can't?
This sounds reasonable approach, by doing something like AssemblyLoadContext.Default.LoadFrom* in the Load override of your LoadContext implementation.
@gkhanna79 One issue with that approach is that I need to catch all exceptions that arise from AssemblyLoadContext.Default.LoadFrom* and ignore them, and then move on to checking the paths specific to my loader. Doable, but a bit ugly.
I also tried simply hooking the Resolving
event of the AssemblyLoadContext.Default
instance, with the handler locating the assembly and then calling AssemblyLoadContext.Default.LoadFromAssemblyPath
. This works to the extent that I can find and load the assembly, but then immediately dies with an ExecutionEngineException
as soon as we try to execute any code in the new assembly. Any idea about that one?
@brthor Assuming I can produce MSBuild bits with a fix, how should I go about updating the repro to test those new bits?
Patching the msbuild binaries under stage0 that gets downloaded as a part of the build under .dotnet_stage0/sdk/...
should do it.
They'll also need to be crossgen'd which means you'll need to run that tool. Alternatively we can wait for cli to pick up the new version and the stage0 will get the latest bits.
This error based on offline discussions:
Creating a standalone repro of this is proving challenging but I have shared a repro offline with @rainersigwald and @cdmihai
Using this issue for tracking workarounds we are making for this issue.