aspnet / JitBench

Repo for testing JIT stuff
Other
15 stars 24 forks source link

Pin .Net version and fix misc issues #77

Closed noahfalk closed 6 years ago

noahfalk commented 6 years ago

I've been encountering a number of usability and reliability issues that I wanted to fix:

1) Pinning the .Net runtime and .Net SDK to netcoreapp2.0 / version 2.0.0 by default. A previous change rolled forward all the .Net references to 2.1 prerelease builds but given this is the rel/2.0.0 branch that didn't seem like the right thing to do. Certainly future .Net release builds should promise back-compat with ASP.Net 2.0 but interim development builds may have back-compatibility bugs. I doubt any of the folks doing perf work in this repo want to take on the responsibility of tracking such bugs down. Allowing floating .Net versions also doesn't provide as stable a baseline for performance comparisons vs. updating individual binaries.

2) There is a major perf issue in the released version of the dotnet store, filed as https://github.com/dotnet/sdk/issues/1682. I have included a workaround because even after it is fixed the branch is still pinned to the older 2.0.0 SDK that has the bug.

3) There is a secondary less serious perf issue with dotnet store that causes msbuild to recursively enumerate every file in the directory containing CreateStore.proj every time it invokes crossgen.exe. Effectively this meant every file in the JitBench repo + every file in the new .dotnet and .store directories was enumerated dozens of times. I mitigated the issue by putting the project in a sub-directory. Together this fix and the previous one cut dotnet store time down to about 45 seconds on my machine.

4) Due to an (unforetunate) design choice, the dotnet.exe in the newly installed .dotnet directory wasn't actually isolated. It would scan and load files from the global dotnet directory if the global directory had more recent versions. This behavior could easily contribute to non-deterministic issues when running the scenario on different machines that have different content installed in the global location. Thankfully there is an environment variable DOTNET_MULTILEVEL_LOOKUP that can be used to opt-out of that behavior.

5) For reasons I don't understand, the previous logging configuration in Program.cs of MusicStore did not actually suppress logs below the warning level. This meant the key perf numbers had a few console screens of unneeded spew interspersed. The alternate overload of the logging filter works as you would expect it to.

noahfalk commented 6 years ago

@rynowak @davmason @AndyAyersMS @jorive - PTAL

FYI: I've seen there is at least one more issue beyond this set where dotnet store is pulling clrjit.dll from the NuGet cache instead of the newly installed dotnet directory. I wanted to start getting some of these fixes in rather than building up too much of a backlog though.

noahfalk commented 6 years ago

@rynowak - Not sure whats up with the CI result at this point. I don't have a mac handy to try reproing it and its not clear how the failure relates to my changes other than I switched everything to 2.0 release builds. Were there known bugs in the RTM bits?

rynowak commented 6 years ago

A previous change rolled forward all the .Net references to 2.1 prerelease builds but given this is the rel/2.0.0 branch that didn't seem like the right thing to do. Certainly future .Net release builds should promise back-compat with ASP.Net 2.0 but interim development builds may have back-compatibility bugs. I doubt any of the folks doing perf work in this repo want to take on the responsibility of tracking such bugs dow

This was an intentional choice based on some previous discussions with folks like @JosephTremoulet and @lt72 - before we make any changes that will affect the workflow of those using this repo, I want to make sure we have a consensus on what's desirable. Do you have a suggestion about the right set of people gather feedback from?

I agree in general that it's been challenging to provide support on running with 'latest' dotnet - but often the questions I get are related to consuming new bits faster.

Also, we already have the ability to use any compatible version of dotnet easily, so this is largely a discussion about the defaults.


Anything to make CreateStore faster seems like a good idea to me.


Due to an (unforetunate) design choice, the dotnet.exe in the newly installed .dotnet directory wasn't actually isolated. It would scan and load files from the global dotnet directory if the global directory had more recent versions. This behavior could easily contribute to non-deterministic issues when running the scenario on different machines that have different content installed in the global location. Thankfully there is an environment variable DOTNET_MULTILEVEL_LOOKUP that can be used to opt-out of that behavior.

Oh wow, I didn't know about this. Yeah we definitely need that fix.

JosephTremoulet commented 6 years ago

based on some previous discussions with folks like @JosephTremoulet and @lt72 ... Do you have a suggestion about the right set of people gather feedback from?

I was about to suggest @AndyAyersMS and @jorive, but I see they're already @mentioned on this thread. Maybe also @russkeldorph.

AndyAyersMS commented 6 years ago

@noahfalk these all seem like reasonable changes to me. I had noticed that the store creation took forever but didn't have the sense to go look deeper.

The CoreCLR perf automation is pulling from the dev branch and it seems like some of these changes should be added there too.

Or do you think we should switch the CoreCLR automation to use release/2.0 so there are fewer moving parts?

@jorive it might be a good idea to validate from ETL that the CoreCLR automation tests are loading the DLLs we expect to be testing and not some other versions (eg clrjit, coreclr). Given how frequently we churn the jit interface I would not expect mismatched part runs to get very far, but it would be better to know for sure.

jorive commented 6 years ago

@AndyAyersMS Yes, I can add a validation for it.

lt72 commented 6 years ago

In 2.0 we had a lot of issue keeping JitBench running because of temporary inconsistencies between our stack and ASP.Net. Obviously that is still happening, so I am in favor of rolling back to 2.0.

noahfalk commented 6 years ago

This was an intentional choice based on some previous discussions with folks like @JosephTremoulet and @lt72

@rynowak - I am surprised to hear that some folks wanted newer BCL/SDK bits faster given that I think everyone using the repo is working on coreclr or clrjit, but if that is the case then I would suggest we parameterize the .Net version in the same way ASP.Net's version is parameterized. You mentioned @lt72 and @JosephTremoulet. Looking above @lt72 agreed to roll back to 2.0.0 and @JosephTremoulet I wasn't sure what your position was?

Or do you think we should switch the CoreCLR automation to use release/2.0 so there are fewer moving parts?

@AndyAyersMS - Yeah I do think we should do this. Our test script already grabs the most recent versions of the binaries that are built from CoreCLR's repo so allowing anything else to float seems like it is just noise. I'd propose we periodically move our pinned version forward, but we do it at a time of our choosing.

JosephTremoulet commented 6 years ago

We were specifically interested in coordinating work going on in the JIT (in coreclr) and in corefx. Maybe that's not important to anybody anymore, I don't know, I don't still have a pressing need for it. But of course it's useful to test against pinned versions, which is what I thought the branches were for - dev was "latest everything", which breaks relatively often but allows the closest coordination across repos when that's desired, other branches were for pinned versions. It was my impression that the version pinning applied to coreclr and corefx and asp.

noahfalk commented 6 years ago

Thanks @JosephTremoulet. It sounds like there are no objections to fully version pinning the rel/2.0.0 branch. I'm happy to leave the dev branch unpinned for those who want to brave daily builds of the full stack. With a bit more testing I did find 1 breaking change two weeks ago that prevents a current System.Private.CoreLib.dll from working properly with the 2.0.0 RTM CoreFX stack. I'm aiming to revert it (https://github.com/dotnet/coreclr/commit/9d228c85464de2bfe68d8d86a3794112d7bb54fb) but you can easily edit it locally if you need a temporary workaround.

I also filed a variety of relevant issues: https://github.com/dotnet/core-setup/issues/3360 - dotnet probes the global store location regardless of DOTNET_MULTILEVEL_LOOKUP https://github.com/dotnet/core-setup/issues/3361 - in general its hard to make a private install given the current design https://github.com/dotnet/sdk/issues/1699 - dotnet store resolves both coreclr.dll and clrjit.dll from the NuGet cache, not from the private dotnet install

noahfalk commented 6 years ago

I followed up with Jan Kotas and he requested that I walk back my desires a little bit. He proposed CoreCLR + CoreFX (shared runtime) would advance together and that we can use dependencies.props in the root of the CoreCLR repo to determine the corresponding version of CoreFX that works with a given CoreCLR. This would still leave SDK, CLI, and Core-Setup pinned in addition to ASP.Net that is pinned in this branch now. I'll need to do a little further work to figure out what that would look like.

noahfalk commented 6 years ago

Just fyi, I had to put this work on pause because of a desktop servicing issue that needs to take priority, if anyone else wants to run with it in the meantime feel free.

rynowak commented 6 years ago

@noahfalk - no worries. Send me a ping when you get back to it

noahfalk commented 6 years ago

Heads up, this PR is being replaced by #79. It addresses the same issues + goes a bit farther trying to make running the benchmark streamlined, reproducable and allowing a little flexibility on versioning without defaulting anything to 'latest'.