Update optimization flags/levels in jit interface

JosephTremoulet commented 7 years ago

With AOT scenarios now including ngen and crossgen and ready-to-run, desktop and coreclr and corert, and with work in flight on IBC and tiered jitting, it's a good time to take another look at the flags used to control which optimizations the jit performs, and try to get a set in place that will give us the right interface going forward to evolve our codegen/policies in the different scenarios.

category:design theme:jit-ee-interface skill-level:beginner cost:small impact:small

JosephTremoulet commented 7 years ago

The current setup works like so:

When the EE asks the JIT to compile a method, it passes CorJitFlags, which include independently-settable:
- CORJIT_FLAG_SPEED_OPT
- CORJIT_FLAG_SIZE_OPT
- CORJIT_FLAG_MIN_OPT "disable optimization"
- CORJIT_FLAG_DEBUG_CODE "no code mangling"
The JIT translates these to JitFlags; the mapping is 1:1
The JIT then translates these into three independent things:
- "debuggable code" vs not
- Which optimizations to run. We set this to CLFLG_MINOPT, which is defined as CLFLG_TREETRANS, if MIN_OPT was requested or debuggable code was requested or we're jitting a cctor and the cctor isn't an inlinee; otherwise, we set this to CLFLG_MAXOPT, which is defined as all of them:
  - CLFLG_REGVAR
  - CLFLG_RNGCHKOPT
  - CLFLG_DEADASGN
  - CLFLG_CODEMOTION
  - CLFLG_QMARK
  - CLFLG_TREETRANS
  - CLFLG_INLINING
  - CLFLG_CONSTANTFOLD
  - CLFLG_STRUCTPROMOTE
- Whether to favor size, speed, or "blended" (setting both flags would just result in optimizing for size, and cctors are forced to always optimize for size)
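
As a rough illustration of the translation described above (a sketch, not the actual JIT source), the two decisions could be written like this. The names `RequestFlags`, `CodeOptKind`, `selectOptFlags`, and `selectCodeOptKind` are made up for the example, and the bit values are arbitrary placeholders; only the flag names come from the lists above. The sketch also assumes that the size preference for cctors applies whether or not the cctor is an inlinee, which the description does not spell out.

```cpp
#include <cstdint>

// Bit values are arbitrary placeholders for illustration; only the names
// are taken from the list above.
enum : std::uint32_t {
    CLFLG_REGVAR        = 1u << 0,
    CLFLG_RNGCHKOPT     = 1u << 1,
    CLFLG_DEADASGN      = 1u << 2,
    CLFLG_CODEMOTION    = 1u << 3,
    CLFLG_QMARK         = 1u << 4,
    CLFLG_TREETRANS     = 1u << 5,
    CLFLG_INLINING      = 1u << 6,
    CLFLG_CONSTANTFOLD  = 1u << 7,
    CLFLG_STRUCTPROMOTE = 1u << 8,

    // Minopts keeps only tree transformations; maxopts enables everything listed.
    CLFLG_MINOPT = CLFLG_TREETRANS,
    CLFLG_MAXOPT = CLFLG_REGVAR | CLFLG_RNGCHKOPT | CLFLG_DEADASGN |
                   CLFLG_CODEMOTION | CLFLG_QMARK | CLFLG_TREETRANS |
                   CLFLG_INLINING | CLFLG_CONSTANTFOLD | CLFLG_STRUCTPROMOTE
};

// Hypothetical stand-in for what the JIT knows about one compilation request.
struct RequestFlags {
    bool speedOpt;   // CORJIT_FLAG_SPEED_OPT
    bool sizeOpt;    // CORJIT_FLAG_SIZE_OPT
    bool minOpt;     // CORJIT_FLAG_MIN_OPT
    bool debugCode;  // CORJIT_FLAG_DEBUG_CODE
    bool isCctor;    // the method is a class constructor
    bool isInlinee;  // the method is being compiled as an inlinee
};

enum class CodeOptKind { Blended, Size, Speed };

// Which optimizations to run: minopts if MIN_OPT or debuggable code was
// requested, or for a cctor that is not an inlinee; otherwise maxopts.
constexpr std::uint32_t selectOptFlags(const RequestFlags& f) {
    return (f.minOpt || f.debugCode || (f.isCctor && !f.isInlinee))
               ? CLFLG_MINOPT
               : CLFLG_MAXOPT;
}

// Size vs. speed vs. blended: size wins when both flags are set, and
// cctors always favor size (assumed here to hold for inlinees too).
constexpr CodeOptKind selectCodeOptKind(const RequestFlags& f) {
    return (f.sizeOpt || f.isCctor) ? CodeOptKind::Size
         : f.speedOpt               ? CodeOptKind::Speed
                                    : CodeOptKind::Blended;
}
```

Writing it this way makes the overlap visible: MIN_OPT, DEBUG_CODE, and "is a cctor" all collapse to the same optimization set, even though they arrive as independent inputs.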

JosephTremoulet commented 7 years ago

Pulling from the list of flags today and some of the discussion around tiered jitting, we know that the following are some of the goals that we may have for code to be compiled:

- Make sure stepping through code in the debugger will work well (we have this today in the "debuggable code" flag)
- Make sure the jit is using the most straightforward/bulletproof lowering (we have this today in minopts)
- Make sure the jit is going to spend as little time as possible jitting (we don't have this today, and will want this for fjit)
- Make sure the jit generates code that will run as fast as possible (we don't have this today, and will want this for re-jitting hot kernels and/or build-lab and/or IBC)
- Make the jit try to strike the best balance it can for the single-shot scenario (this is the main scenario today)
- Make sure the jit is sensitive to code size (we have this today in favor size / favor speed / blended)

Questions to consider: Have I left some out? Will the picture change if we get to the point that tiered compilation allows on-stack replacement and we can perform speculative optimizations, or would the above goals just combine orthogonally with a set of allowable speculative assumptions?

JosephTremoulet commented 7 years ago

If the above list is sufficient, it seems to me like it could be represented with eight distinct states:

1: Use straightforward lowering that is bulletproof and facilitates stepping through in the debugger
2: Optimize for high throughput
3 - 8: The six combinations of:
- Optimization "level" (default / high-CQ)
- Size/speed sensitivity (size-sensitive / blend / size-insensitive)
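
One way to picture the eight states is as two standalone modes plus a 2 x 3 grid. The sketch below is only an illustration of that counting; the type names (`CompileMode`, `OptLevel`, `SizeSpeed`, `CompileState`) and the ordering of states 3 to 8 are assumptions made for the example, not a proposed interface.

```cpp
// Hypothetical encoding of the eight states; all names are illustrative.
enum class OptLevel { Default, HighCQ };
enum class SizeSpeed { SizeSensitive, Blend, SizeInsensitive };

enum class CompileMode {
    Debuggable,      // state 1: straightforward, debugger-friendly lowering
    HighThroughput,  // state 2: optimize for high throughput
    Optimized        // states 3-8: refined by OptLevel x SizeSpeed
};

struct CompileState {
    CompileMode mode;
    OptLevel    level;      // only meaningful when mode == Optimized
    SizeSpeed   sizeSpeed;  // only meaningful when mode == Optimized
};

constexpr int kOptLevels      = 2;  // default, high-CQ
constexpr int kSizeSpeedModes = 3;  // size-sensitive, blend, size-insensitive
constexpr int kStateCount     = 2 + kOptLevels * kSizeSpeedModes;

// Map a state to its position in the 1..8 list. The order within 3..8
// (level-major, then size/speed) is an arbitrary choice for this sketch.
constexpr int stateNumber(const CompileState& s) {
    return s.mode == CompileMode::Debuggable     ? 1
         : s.mode == CompileMode::HighThroughput ? 2
         : 3 + static_cast<int>(s.level) * kSizeSpeedModes
             + static_cast<int>(s.sizeSpeed);
}
```

Keeping the first two states outside the grid reflects that size/speed sensitivity and optimization level only apply once the jit is actually optimizing.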

BruceForstall commented 7 years ago

Another design goal is generally simplicity: reduce the number of combinations that must be tested.

nit: "debuggable" is more than just "facilitates stepping through in the debugger", it also includes accurate variable value access (extend variable lifetimes, make sure debugger knows where the variables live). We also should improve debuggability in the presence of optimization to support live-site attach or dump debugging.

JosephTremoulet commented 7 years ago

We also should improve debuggability in the presence of optimization to support live-site attach or dump debugging.

Good point; sometimes we'll want to change what optimizations we run to improve debuggability (item 1 in my list above), but sometimes we'll want the same optimizations but also to generate debug info as well as we can, so ISTM there should also be a "generate debug info" flag in the interface that's orthogonal to the optimization flags.

mikedn commented 7 years ago

Make sure the jit is using the most straightforward/bulletproof lowering (we have this today in minopts)

Small observation - there's stuff going in lowering that is certainly not required but is done without regard to minopts, magic division for example. It behaved like this before I moved the code around and I always wondered why it doesn't bail out in minopts mode.

noahfalk commented 7 years ago

there should also be a "generate debug info" flag in the interface that's orthogonal to the optimization flags.

Is there any scenario in which we don't want debug info? I assume that nearly every scenario we have is of the form "I want to perform 0 or more optimizations that might degrade the debugging experience, but aside from forced losses in optimization, give me the best debugging experience that remains possible"

I'd guess that a flag for 'generate debug info' will always be true, in which case we could save some complexity/test time by eliminating it as a free variable.

jkotas commented 7 years ago

The flag for 'generate debug info' exists already (CORJIT_FLAG_DEBUG_INFO), and it is always set by the VM for the reasons that @noahfalk mentioned.

JosephTremoulet commented 7 years ago

Is there any scenario in which we don't want debug info? ... I'd guess that a flag for 'generate debug info' will always be true

The flag for 'generate debug info' exists already (CORJIT_FLAG_DEBUG_INFO), and it is always set by the VM

Interesting... I was mainly assuming we'd want this because we added it already, and (IIUC) both cl and csc make generation of PDBs (which can be quite large, especially with optimized code) optional. But if we don't think it's worth it for e.g. footprint-constrained deployments, or carry-over from .Net Native (what does it do in this regard?), then "we could save some complexity/test time by eliminating it as a free variable" sounds fine to me.

JosephTremoulet commented 7 years ago

I'm also wondering if we should have a dimension for signalling tiered jitting vs single-shot jitting vs "build lab" (by which I mean AOT compilation with some sort of opt-in for more throughput-intensive optimizations)... it's entirely conceivable, for example, that the 2nd round of tiered jitting would want to be a bit more aggressive than single-shot jitting is, or that some AOT scenarios want to be more aggressive still but not to the point of wanting to "turn the dial to 11" like tiered jitting of very hot kernels may want... my inclination is to go with @BruceForstall's point that "Another design goal is generally simplicity: reduce the number of combinations that must be tested" and avoid adding a dimension like that until/unless we have concrete plans to use it for something, so have the "first stab" use the matrix outlined above, but I'm curious what others think.

jkotas commented 7 years ago

We had knobs for controlling whether or not to generate debug info in .NET Framework 1.0. It made things like attaching a debugger to an existing process or dump debugging work poorly. We turned it on by default in .NET Framework 2.0, together with some work to minimize the debug info overhead, so that these things just work. The key value prop of .NET is that things work pretty well by default (and you are willing to pay a bit for it) and not having to think about tough choices like not being able to debug vs. having big native PDB around.

I think it would be fine to have a switch that disables generation of debug info for measurements and experiments; but the mainstream .NET experience should be debuggable by default.

.Net Native (what does it do in this regard?)

.Net Native is on the big PDB plan only today. We got feedback that it is not what people expect from .NET. There is work to make it better, e.g. make Environment.StackTrace work well even without the big PDB around.

JosephTremoulet commented 7 years ago

Per discussion in dotnet/coreclr#10580, we'll adopt the goal of consolidating the discretionary policy decision-making on the JIT side of the interface, which means that the interface is really about communicating context to the jit.

To that end, dotnet/coreclr#10580 will add flags to identify tier 0 and tier 1 compilation requests (and distinguish them from single-shot compilation requests).

Currently, the VM sets the MINOPT flag if the NoOptimization flag is set in the method's metadata, or if a COMPLUS variable requests it. Presumably the JIT could make those checks, but on the other hand it's probably good to leave a way for the VM to force "bulletproof low-risk" compilation in the interface.

AFAICT, we aren't really using the SIZE_OPT and SPEED_OPT flags (they seem to only be set in response to COMPLUS variables), and could remove them.

Presumably we'll want to add a "compile the code as quickly as possible" flag on the jit side, using it for tier 0 and maybe cctors.
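
To make the "interface communicates context, JIT decides policy" split concrete, here is a minimal sketch under stated assumptions. Every name in it (`Tier`, `CompileContext`, `JitPolicy`, `choosePolicy`) is hypothetical and not taken from the actual interface, and the `fastJitForCctors` parameter stands in for the open "maybe cctors" question. The sketch treats tier 1 and single-shot requests the same, since nothing above commits to a difference between them.

```cpp
// Hypothetical context the VM hands to the JIT for one method.
enum class Tier { SingleShot, Tier0, Tier1 };

struct CompileContext {
    Tier tier;         // which kind of compilation request this is
    bool forceMinOpt;  // VM-forced "bulletproof low-risk" compilation, e.g.
                       // NoOptimization metadata or a COMPLUS variable
    bool isCctor;      // the method is a class constructor
};

// Hypothetical JIT-internal policy; the VM never picks one of these directly.
enum class JitPolicy {
    MinOpts,  // straightforward lowering, as forced by the VM
    FastJit,  // compile the code as quickly as possible
    FullOpt   // the JIT's normal optimizing behavior
};

// The JIT maps context to policy. A VM-forced MINOPT always wins; tier 0
// (and, optionally, cctors) take the fast path; everything else optimizes.
constexpr JitPolicy choosePolicy(const CompileContext& c, bool fastJitForCctors) {
    return c.forceMinOpt                                          ? JitPolicy::MinOpts
         : (c.tier == Tier::Tier0 || (c.isCctor && fastJitForCctors)) ? JitPolicy::FastJit
                                                                  : JitPolicy::FullOpt;
}
```

With this shape, dropping SIZE_OPT and SPEED_OPT removes inputs without touching the policy function, and a later decision to make tier 1 more aggressive would be a JIT-side change only.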

dotnet / runtime

Update optimization flags/levels in jit interface #7751