ashmind / SharpLab

.NET language playground
https://sharplab.io
BSD 2-Clause "Simplified" License
2.69k stars 197 forks

Add notice to JIT Asm tab until Tiered Compilation (#425) is resolved #1251

Open OwnageIsMagic opened 1 year ago

OwnageIsMagic commented 1 year ago

It's not the only possible representation (Tier0/Tier1/R2R/AOT), and it is quite misleading about what code would really be executing in real-world scenarios.

EgorBo commented 1 year ago

@OwnageIsMagic this is optimized codegen; the nop padding is there to align the loop to a specific boundary (e.g. 16 bytes)

but still emits ok branch on always throw case

The JIT doesn't tend to optimize obviously faulty code, such as accessing arr[6] when the array has only 5 elements

tannergooding commented 1 year ago

Tier0 output is garbage

Tier 0 isn't "garbage", it defaults to debug quality code which means most optimizations are not performed. There is work being done to allow minimal optimizations in some contexts, particularly where that improves JIT throughput, but that will in general still be "worse" than properly optimized code.

but some people could reason about performance of .NET platform based on this output.

You cannot and should not reason about performance simply based on assembly output. CPUs are incredibly complex and sometimes non-obvious things are fast. For example, inserting nop padding to align the hot path of your code will typically improve performance.

If you want to know how fast a piece of code is, you have to measure it. Simple looking code can be slow; complex looking code can be fast; and vice versa as well. It's always case by case and scenario by scenario (e.g. sometimes vectorization slows things down, other times it speeds it up, really depends on many complex factors).
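As a rough illustration of "measure it", here is a minimal timing sketch using only `System.Diagnostics.Stopwatch`. The `Fill` workload, the `Measure` helper, and the warm-up and iteration counts are all hypothetical choices made for this example; they are not taken from the issue. The warm-up loop is there so the timed loop is less likely to include JIT time.

```csharp
using System;
using System.Diagnostics;

static class MeasureSketch
{
    // Hypothetical workload for illustration: fills an array with ones.
    static void Fill(int[] arr)
    {
        for (int i = 0; i < arr.Length; i++)
            arr[i] = 1;
    }

    // Runs the action `warmup` times untimed, then `iterations` times timed,
    // and returns the average time per call in nanoseconds.
    public static double Measure(Action action, int warmup, int iterations)
    {
        for (int i = 0; i < warmup; i++)
            action();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        // Milliseconds to nanoseconds is a factor of 1,000,000.
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    static void Main()
    {
        var arr = new int[1024];
        double ns = Measure(() => Fill(arr), warmup: 10_000, iterations: 100_000);
        Console.WriteLine($"~{ns:F1} ns per call");
    }
}
```

A hand-rolled loop like this is noisy and easy to get wrong; for anything beyond a quick sanity check, a dedicated harness such as BenchmarkDotNet is the more reliable choice.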

;; dead stores

The memory model is quite complex and not everything is strictly allowed to be optimized out: https://github.com/dotnet/runtime/blob/main/docs/design/specs/Memory-model.md

While the JIT could prove that there is single-threaded consistency here and potentially elide the unnecessary stores, the code itself is questionable and unlikely to be encountered in the real world, so it's likely not worth the additional complexity in the JIT to optimize it.

;; funny that if we write arr[4]; Tier0 eliminates this throw and comparison,

Exceptions are considered exceptional and they are not typically used to optimize out other code.

Likewise since an exception will be thrown here, the arr[i] = 1 is not necessarily dead. In a more real world/representative scenario where the array is external to the method, the write would be visible, particularly if the exception was caught.
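The point about the write being visible can be sketched as follows. This is an illustrative example, not the code from the original report: the `Fill` and `Run` names and the array sizes are assumptions. The array is created by the caller, the loop runs one index too far, and the caller catches the exception.

```csharp
using System;

static class VisibleWriteSketch
{
    // Writes 1 to indices 0..count-1; throws once `i` reaches arr.Length.
    public static void Fill(int[] arr, int count)
    {
        for (int i = 0; i < count; i++)
            arr[i] = 1;
    }

    // The array is external to Fill, so its contents can be observed
    // after the exception has been caught.
    public static int[] Run()
    {
        var arr = new int[5];
        try
        {
            Fill(arr, 6); // index 5 is out of range for a 5-element array
        }
        catch (IndexOutOfRangeException)
        {
            // Swallowed on purpose: the earlier writes have already happened.
        }
        return arr;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // prints 1,1,1,1,1
    }
}
```

Because the five in-range stores are observable after the catch, they cannot be treated as dead just because the method is guaranteed to throw.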


In general, the JIT optimizes well-formed and expected code. The JIT typically doesn't add optimizations to accelerate failure cases, which cover code that is unlikely to be encountered, or to make micro-benchmarks "look good".

Instead, optimizations in the JIT focus on scenarios that are likely to be encountered via other optimizations such as inlining or constant folding and those that have been shown to have measurable real world impact.

OwnageIsMagic commented 1 year ago

Hello, @EgorBo @tannergooding, you are taking this out of context. This issue is filed against ashmind/SharpLab, not dotnet/runtime, and it's about the fact that what you see in the JIT Asm panel probably isn't what you wanted to see. I can't draw any insight from this asm listing, besides that it is far from optimal. It's not the code that runs in my benchmarks, and it's not the code that I (and probably anyone) should care about (for one-shot executions the difference between opt/non-opt is below the thread scheduler jitter margin). And it's not obvious at all. The only thing I can check here is correctness, but in that case I should also check Tier 1 output, and it's not available.

align loop

It does not seem profitable here, since the whole loop fits in the first 32 bytes of the function and functions are aligned. And if I replace arr[i] = 1 with arr[i] = 0 or arr[i] = i, there is no padding.

;; dead stores

sorry for this, spent too much time recently debugging UB.

EgorBo commented 1 year ago

but in that case I should also check Tier 1 output and it's not available.

Well, that is Tier1 (FullOpts to be precise) codegen, and not Tier0 like you stated in the issue.

It does not seem profitable here, since the whole loop fits in the first 32 bytes of the function and functions are aligned. And if I replace arr[i] = 1 with arr[i] = 0 or arr[i] = i, there is no padding.

Loop alignment is based on a set of heuristics; feel free to file CQ issues in dotnet/runtime if you notice a case where they do the wrong thing.

and functions are aligned

I don't think we align functions to a 32-byte boundary; AFAIR it's 16 bytes, but I could be wrong

OwnageIsMagic commented 1 year ago

I don't think we align functions to a 32-byte boundary; AFAIR it's 16 bytes, but I could be wrong

In .NET 5, we started aligning methods at 32B boundary.

I doubt it's actually FullOpts, or maybe I'm still assuming too much of the JIT compiler https://sharplab.io/#v2:C4LghgzgtgPgAgJgAQA...

class P
{
    readonly static int Test;
                              // P.M()
    static int M()            //     L0000: mov ecx, 0x18a1c610
    {                         //     L0005: xor edx, edx
        _ = Test;             //     L0007: call 0x725c8aa0  ;; call to class initializer
        return 1;             //     L000c: mov eax, [eax+4] ;; dead store
    }                         //     L000f: mov eax, 1
}                             //     L0014: ret

I reworded OP, sorry if it was too harsh.

ashmind commented 1 year ago

Thanks for reporting -- this certainly makes sense, I'll look to add a notice (no timeline promise).

EgorBo commented 1 year ago

I doubt it's actually FullOpts, or maybe I'm still assuming too much of the JIT compiler https://sharplab.io/#v2:C4LghgzgtgPgAgJgAQA...

@OwnageIsMagic It is FullOpts. AFAIR, SharpLab uses DynamicMethod to produce codegen, hence it needs a lazy static initializer before first use.

Unfortunately, we don't support tiered compilation for dynamic contexts, so you can't rely on Tier 1 to drop it. Tiering also needs actual execution plus warm-up, which is something you don't want on SharpLab because it would increase the time it takes to see codegen.

As a result, all static field accesses lead to such calls in codegen on SharpLab, e.g.: https://sharplab.io/#v2:C4LghgzgtgPgxAOwK4BsVgEYoKYAIAmAlhJjgLABQlAAgIwBsu1ATLgAqUDeluvTDuQgmC4AKtgjAA3JR586jISICyACgCUuALwA+MROmUAvkA==
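The language-level behavior behind this can be sketched with a small example. The `Holder` and `LazyInitSketch` types are hypothetical and exist only for illustration. A class with an explicit static constructor is initialized lazily, on first access to one of its static members, so code that reads a static field of a not-yet-initialized class has to make sure the initializer has run first.

```csharp
using System;

static class Holder
{
    public static readonly int Value;

    // An explicit static constructor runs at most once, triggered by the
    // first access to a static member of this class.
    static Holder()
    {
        Console.WriteLine("class initializer ran");
        Value = 42;
    }
}

static class LazyInitSketch
{
    static void Main()
    {
        Console.WriteLine("before first access");
        int v = Holder.Value; // first access: the initializer runs here
        Console.WriteLine(v);
        Console.WriteLine(Holder.Value); // already initialized, no second run
    }
}
```

In a long-running application the class has usually been initialized by the time a hot method is rejitted at Tier 1, which is why the helper call seen on SharpLab would typically not appear there.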