dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.46k stars 4.76k forks source link

Do not use coreclr.{dll/so/dylib} name for Mono runtime #34202

Closed jkotas closed 4 months ago

jkotas commented 4 years ago

cc @leculver @dotnet/dotnet-diag

jkoritzinsky commented 4 years ago

I think this also affects the hosting layer. For hostpolicy to be able to load Mono when it's not named coreclr.dll/so/dylib, it will have to know how to identify the mono runtime and load it (preferably using the same APIs as CoreCLR provides).

leculver commented 4 years ago

We need to avoid naming Mono coreclr.dll at all costs.

Not only does it confuse a lot of diagnostic tools we've built, it will also imply work to go "fix" the same problems in partner teams. This would include places like WinDbg, Watson, and !analyze. We've already burned a lot of our goodwill recently with some of the features we've built and not informed partner teams, creating unexpected work for them long after they had a chance to give input into the process. (For example not building consensus around Portable PDBs with partner teams before it was built, among other examples. We shouldn't repeat that mistake here.)

Aside from diagnostic tools and partner teams, this also could potentially affect our ability to properly diagnose Watson failures. Watson uses a module's filename and version alone to bucket failures so we'd potentially collide with other products' coreclr.dll. Note this problem is why we went out of our way to re-version coreclr.dll to a seemingly nonsensical values (e.g. 4.700.20.6603) to avoid collisions with the previous Silverlight versions, we are going to cause even more problems if the file version of a Mono coreclr.dll happens to be even near any of the other versions of coreclr.dll we've shipped. It will effectively mix all failures together making it very, very difficult to tell what our actual failures are in the products where these version/filenames collide.

We've been down this road before and we know it's a bad idea that will cause a lot of problems for us and our partner teams. Let's just not do this and not have the headache (or revert it now if it's already been done).

jkoritzinsky commented 4 years ago

cc: @jeffschwMSFT @vitek-karas @swaroop-sridhar for possibly required hosting changes to enable discovering a different name for the mono runtime when running on desktop.

steveisok commented 4 years ago

I actually messed around trying to learn the host and got it to load libMono.

I may have some local changes that aren't pushed, but this is a (hacked) start... maybe :-).

https://github.com/steveisok/runtime/tree/dotnet-vm-switch

vitek-karas commented 4 years ago

@steveisok Couple of questions about the intent of the changes:

If we plan to support FDD with mono VM, then that's a whole other problem with probably large number of hosting changes required (depending on how the Mono VM's NETCore.App framework would be named, versioned and how it would be installed).

If we plan to support only SCD with mono VM, then I think we should take one of the two approaches:

I personally would prefer the first option - we share the hostpolicy - we only teach it to recognize both runtime names. If we have the ABIs exactly the same, that's the only change necessary. We could diverge the ABIs slightly if necessary (but not too much).

Note that in either case we need to modify hostfxr to recognize both runtime names and treat them as identical. This is unfortunate legacy behavior where hostfxr uses the presence of coreclr.dll in the app as one of the clues to detect self-contained apps (while .runtimconfig.json should be the only source of truth really). You already made this change in the above branch.

steveisok commented 4 years ago

I think the first option makes sense for our use case. I don’t see us needing to diverge at this point at all. Also, the reason why I had a separate folder was primarily for mono’s System.Private.Corlib.

This branch was just for my learning, so if it helps at all, great. Thankfully, we are deploying mono as stand alone only because I agree there would be challenges with alternating runtimes in FDD.

lambdageek commented 4 years ago

Why does the name of the DLL matter? Mono implements the same hosting API as libcoreclr.so. If other tools are looking for other undocumented APIs, those APIs should be documented and Mono should implement them.

The two runtimes should be drop in replacements for each other. Differentiating runtime capabilities based on the name of the DLL seems to move in the opposite direction.

vitek-karas commented 4 years ago

The host needs to be able to find and load the runtime dll. See this doc which currently explicitly says that the final step is to find coreclr.dll and load it. As mentioned above, it's not a good idea to call mono's runtime coreclr.dll, so we need to change the host to be able to find and load mono instead.

Note that none of the host configuration files (.runtimeconfig.json and .deps.json) doesn't exactly specify which dll is the runtime - it does list the runtime dll as one of the native dependencies, but it doesn't say which out of the many native dependencies is the runtime dll (the entrypoint).

lambdageek commented 4 years ago

My concern is that if we build code that makes decisions based on the name of the runtime, rather than its capabilities, we're adding future technical debt. (What happens when Mono converges closer to CoreCLR?) I appreciate that the host doesn't have this limitation.

Watson's bucketing based on the DLL name is unfortunate.

vitek-karas commented 4 years ago

I do agree that ideally the only "difference" would be the name of the dll.

leculver commented 4 years ago

Watson's bucketing based on the DLL name is unfortunate.

That's how all Watson bucketing works and has worked for decades. It would not be a small project to try to change this just for coreclr.dll. We require all products to "play nice" with Watson so that we can find bugs (especially security related) in products which build on top of us.

By making it harder to distinguish between coreclr and mono, we are not just making our lives harder but also those of our partner teams at Microsoft who use Watson.

My concern is that if we build code that makes decisions based on the name of the runtime, rather than its capabilities, we're adding future technical debt. The two runtimes should be drop in replacements for each other. Differentiating runtime capabilities based on the name of the DLL seems to move in the opposite direction.

Please note that while the runtimes are converging in terms of runtime apis and behaviors, there's no plan to unify the diagnostics story between the two. At least there's no plan as far as I've heard. This means:

  1. By naming them the same thing we are adding technical debt today, and it's debt that has to be mostly paid by our partner teams and external customers who use and build diagnostic tools.
  2. The diagnostics capabilities, diagnostic apis, and diagnostic implementation of the two runtimes are different. From a diagnostics standpoint the two runtimes aren't drop in replacements for each other even if they are at runtime when executing user code.
  3. We will always have to distinguish the two runtimes from each other in some way in order for debuggers to properly handle crashdumps/coredumps/live debugging. Today we distinguish them by loaded module name. We could do something different, but debuggers and diagnostics tools will have a separate codepath for each runtime for the forseeable future. Making this change forces every debugger/tool currently enlightened to coreclr.dll to make changes for no benefit.

@lambdageek Architecturally I understand what you are pushing for, but the reality of our diagnostics story is complicated. The work we would need to do to accommodate naming both runtimes coreclr.dll the same is a very high just to make everything work exactly the same as it does today, and it's not scoped to our teams. It's changing the ecosystem in a way that everyone has to react to just to stay in the same place.

leculver commented 4 years ago

One last thought I wanted to capture here. Even ignoring other scenarios, when naming these two libraries the same name it would be very difficult to disambiguate between the two of them in some crashdump scenarios. Minidumps might not contain any parts of the module itself, not even the first few bytes of the PE header. Instead, we sometimes (but rarely) have to use only the MINIDUMP_MODULE structure to know about an image.

In cases where we have a minidump without module data, and where we cannot obtain the original PE image (such as no connection to the symbol server or CLR forgot to index a binary), then a debugger developer is in a real bind on what to do in that case. It would have to have two different debugging engines for mono vs coreclr, but it would have to guess as to which one to use because there's not enough info to know whether mono is loaded or coreclr is loaded...and debugging those two things aren't compatible.

This scenario should be rare, and it's not something a debugger couldn't overcome, but it's another example illustrating the messy situations that arise from naming these two DLLs the same name.

noahfalk commented 4 years ago

Watson's bucketing based on the DLL name is unfortunate.

I'm not sure why this is viewed as a negative and I may not understand some background issue? I agree that some places where code is currently using the module name to discriminate should not be doing so, but IMO Watson isn't one of them. In the case of Watson an abstraction based on capabilities appears undesirable, the implementation is the thing I want to correlate issues with.

leculver commented 4 years ago

@noahfalk:

Watson's bucketing based on the DLL name is unfortunate.

I'm not sure why this is viewed as a negative and I may not understand some background issue?

I'm not sure about Aleksey's position, but keying on the module name alone has the unfortunate side effect of grouping like-named modules into the same bucket, for example, everything I mentioned in my topmost reply. In a perfect world Watson would understand that coreclr.dll can be separate products such as Silverlight, .Net Core, and Mono (though the last case is what we are debating). That would let us bucket the products' failures separately instead of getting one giant bucket mixed together.

I say in an ideal world because I don't know how we would accomplish that given the technical challenges of Watson dumps. I agree it's unfortunate, but I don't know of any better way to do it.

steveisok commented 4 years ago

@jkotas Since a lot of our scope was readjusted for net6, I think we should move this issue to the 6.0 milestone.

am11 commented 4 years ago

The two runtimes should be drop in replacements for each other.

@lambdageek, sorry for an off-topic question (looking for a quick pointer / pro tip 🙂): currently, when we build all components of dotnet/runtime repo, i.e. ./build.sh -c Release, we get tarballs produced in artifacts/packages/Release/Shipping directory. Extracting the artifacts/packages/Release/Shipping/dotnet-runtime-5.0.0-dev-illumos-x64.tar.gz does not gives us mono. I manually overwrote mono's libcoreclr.so and corelib (built from the same git rev) on openindiana to the extracted location, and dotnet(1) was able to execute program using mono runtime.

What is the official packaging plan for 5.0; will we have a separate dotnet installer for mono for itanium or would it be in the same installer as coreclr? (it would be nice to have an environment variable to select the default runtime on dev box, e.g. DOTNET_RUNTIME_FLAVOR=mono)

CoffeeFlux commented 4 years ago

@am11 Aleksey is currently on paternity leave. Maybe @steveisok or @akoeplinger has an idea?

steveisok commented 4 years ago

@am11 The only mono runtime packs we are generating at this time are for osx, linux x64, and linux arm64 (technically wasm too). As I understand it for net5, there is no plan beyond what we're already generating.

am11 commented 4 years ago

@steveisok, are those mono dedicated packs? and does ./build.sh -c Release give us those packs in artifacts/packages/Release/Shipping or is there a different switch used to produce them?

steveisok commented 4 years ago

If you're, for example, on osx, you'll get a mono specific microsoft.netcore.app runtime pack when you build like that. And if mono is all you're after, then you can skip building coreclr w/ ./build.sh -Subset mono+libs+installer -c Release. It will speed up your build considerably.

am11 commented 4 years ago

Thank you. I have found artifacts/bin/ref/microsoft.netcore.app/Release/ on macOS and linux. However, that is missing bunch of things to execute an app (.dll), such as Microsoft.NETCore.App.deps.json and libhostpolicy.so (which we can fix by hand, sure). imo:

akoeplinger commented 4 years ago

We don't plan to do this for 5.0 as the main usage of mono in 5.0 is WebAssembly, but we'll flush out the "mono on desktop" details for a later release.

CoffeeFlux commented 4 years ago

Removing blocking-release a this was moved to 6.0 and that label is being used for 5.0 tracking.

SamMonoRT commented 3 years ago

Moving to 7.0.0

SamMonoRT commented 3 years ago

@akoeplinger - are you actively working on this ? Please adjust the milestone accordingly if so.

steveisok commented 4 months ago

Since the mono desktop packages are no longer going to be public, I don't think it makes sense to do the work.

See https://github.com/dotnet/docs/issues/41366