ashmind / SharpLab

.NET language playground
https://sharplab.io
BSD 2-Clause "Simplified" License

Add support for JIT Dasm Code generation #39

Closed davidfowl closed 7 years ago

davidfowl commented 7 years ago

This would be pretty epic https://github.com/dotnet/jitutils

/cc @benaadams @jamesqo

jamesqo commented 7 years ago

Oh yeah, this would definitely be a nice feature if it was implemented. It may require actually running the app though (currently AFAIK all the site does is compile to IL). Also there would be things to consider like which JIT should be used (.NET Core JIT, .NET Framework one, etc.)

benaadams commented 7 years ago

Very epic...

Needs a checked build of clrjit.dll; then you can use the environment flag

set COMPlus_JitDisasm=* to output it (or a more specific namespace than wildcard)
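As a rough illustration of the environment-flag approach, the sketch below starts a child process with COMPlus_JitDisasm set only for that child. The class and method names (JitDisasmLauncher, BuildStartInfo) are hypothetical, the executable path is whatever the caller passes in, and it assumes the child runs on a checked JIT and that the listing is written to standard output.

```csharp
using System;
using System.Diagnostics;

public static class JitDisasmLauncher
{
    // Hypothetical helper: builds start info for a child process that has
    // COMPlus_JitDisasm set. The variable is assumed to have an effect only
    // when the child runs on a checked/debug build of the JIT.
    public static ProcessStartInfo BuildStartInfo(string exePath, string methodFilter)
    {
        var info = new ProcessStartInfo(exePath)
        {
            UseShellExecute = false,       // needed to pass per-process environment variables
            RedirectStandardOutput = true  // assumption: the listing goes to stdout
        };
        info.Environment["COMPlus_JitDisasm"] = methodFilter;
        return info;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: launcher <path-to-exe> [method filter]");
            return;
        }

        ProcessStartInfo info = BuildStartInfo(args[0], args.Length > 1 ? args[1] : "*");
        using (Process process = Process.Start(info))
        {
            // Read everything the child prints, then wait for it to exit.
            Console.Write(process.StandardOutput.ReadToEnd());
            process.WaitForExit();
        }
    }
}
```

Setting the variable on the child's start info rather than on the current process keeps the flag away from the host itself.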

ashmind commented 7 years ago

Thanks for raising this! I'm certainly interested, but I'm not sure how much time I will have for research in the near future. Answers to the following would help:

  1. What are the dlls for each JIT implementation (Core, .NET old, .NET RyuJIT)?
  2. Where can I find documentation on each API, or maybe a .NET wrapper?

jamesqo commented 7 years ago

@ashmind Sorry for the delayed response, I had forgotten about this issue until recently.

  1. What are the dlls for each JIT implementation (Core, .NET old, .NET RyuJIT)?
  2. Where can I find documentation on each API, or maybe a .NET wrapper?

The runtime is all implemented in native code, so I don't think there is a .NET wrapper. I normally get the disassembly by setting the environment variable COMPlus_JitDisasm=<methods to dump> and running the app. You can find more documentation on how to configure CLR behavior here.

Note that I'm not sure if there is a way to get the dasm without actually executing the app. That may be problematic because you don't want people running I/O functions on your site.

ashmind commented 7 years ago

@jamesqo Thanks a lot! I'll start looking at it once I merge the mirrorsharp branch.

ashmind commented 7 years ago

OK some unsorted research in the meantime.

ashmind commented 7 years ago

@jamesqo What's ZAP in the context of coreclr terminology?

jamesqo commented 7 years ago

@ashmind I am not quite sure myself, since I'm not too knowledgeable about the inner workings of the runtime. However, I searched this file with Ctrl+F for "zap" and the first result says

Assert if an assembly succeeds in binding to a native image

So I would guess it refers to the process of converting plain IL to a native image assembly.

davidfowl commented 7 years ago

/cc @jkotas @JosephTremoulet @russellhadley

JosephTremoulet commented 7 years ago

Right, I believe "zap" == crossgen/ngen in this context (crossgen is the utility that does pre-compilation to native for Core, NGen for desktop). The jit-diff utility in https://github.com/dotnet/jitutils uses crossgen to be able to generate asm for assemblies without needing to execute them. I'd wager that trying to use the jit APIs directly would amount to re-implementing crossgen, so you'd likely want to invoke jit-diff (or invoke crossgen like jit-diff does) to get the compilation to happen.

ashmind commented 7 years ago

I've researched this for a while, and I feel there is no reasonable solution for this (considering web environment and my capabilities).

Some things I considered:

  1. Explicitly forcing JIT by calling ICorJitCompiler.compileMethod. Unfortunately, this API seems to be complex, undocumented under full framework and unstable between .NET versions. So even if I implement one (high-effort) solution, there is no guarantee it would not break later.

  2. Patching the crossgen sources so that they can execute in-process. With my limited knowledge of cmake and duration of the build process, it's unlikely I can implement this in reasonable time (under a week).

  3. Hosting a separate instance of CLR within my process. I haven't found any examples of hosting .NET from .NET, and even if I did I can't think of a simple way of having COMPlus_JitDisasm enabled for hosted CLR but not the main one.

  4. Hosting a separate CLR in a separate process. This might be possible, but I'm not sure Azure Web Apps are able to handle a secondary background process -- and it would definitely add major complexity to deployment and uptime.

What I would like to see (to implement this ticket) is an approach that would either allow me to load assemblies into a "jit-dump-enabled" hosted CLR instance, or call the JIT directly using a fixed, well-defined interface.

Please let me know if you have some better ideas.

(I considered just running the command line app as suggested, however I don't feel that would be a scalable approach and I have no idea what the cost would be from Azure POV)

JosephTremoulet commented 7 years ago

Hoping that the profiler infrastructure could help, I asked @noahfalk about this, and, with the caveat that it would be taking advantage of implementation details that are subject to change at any time, he had this to say:

there might be some roundabout ways to do it.

If the method code could be re-emitted as a DynamicMethod with restrictedSkipVisibility = true https://msdn.microsoft.com/en-us/library/bb348332(v=vs.110).aspx

And then you call CreateDelegate(), a side-effect of that is that the runtime will eagerly JIT compile the code.

Then to actually get the code...

1) If you have admin permissions you could listen for ETW events

2) If you can set an environment variable you could profile your own process with ICorProfiler APIs

3) If you can use private reflection there is a function on DynamicMethod: internal unsafe RuntimeMethodHandle GetMethodDescriptor() and on RuntimeMethodHandle: public IntPtr GetFunctionPointer()

That approach is dangerous as it might break the lifetime management of the DynamicMethod, and if you are going to use private reflection you may as well use directly:

System.Runtime.CompilerServices.RuntimeHelpers._CompileMethod

4) If he can run a custom build of coreclr he could of course hit pretty much any limitation with a hammer

I don't know if you'd consider that workable, but figured I'd pass it along...
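To make the DynamicMethod idea a little more concrete, here is a minimal sketch that emits a trivial method and calls CreateDelegate() on it. It only covers the emit-and-create step; re-emitting an existing method's IL and retrieving the native code pointer are not shown. The class name DynamicMethodSketch and the method name "Twenty" are placeholders chosen for illustration.

```csharp
using System;
using System.Reflection.Emit;

public static class DynamicMethodSketch
{
    // Emits the equivalent of "static int Twenty() => 20;" as a DynamicMethod.
    public static Func<int> BuildTwenty()
    {
        var method = new DynamicMethod(
            "Twenty",
            typeof(int),
            Type.EmptyTypes,
            restrictedSkipVisibility: true);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4, 20); // push the constant 20
        il.Emit(OpCodes.Ret);        // return it

        // Per the suggestion above, creating the delegate is what is expected
        // to trigger eager JIT compilation as a side effect.
        return (Func<int>)method.CreateDelegate(typeof(Func<int>));
    }

    public static void Main()
    {
        Func<int> twenty = BuildTwenty();
        Console.WriteLine(twenty()); // prints 20
    }
}
```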

ashmind commented 7 years ago

@JosephTremoulet Thanks a lot!

OK I think I'm getting somewhere, now I have those two questions:

  1. GetFunctionPointer() -- if the method is JIT-compiled, does this point to the actual compiled method or some interim jump/stub? Let's say it's a standard method from an assembly, not DynamicMethod.
  2. Where can I find the method byte size, so that I know how much to read from GetFunctionPointer()?

JosephTremoulet commented 7 years ago

Where can I find the method byte size, so that I know how much to read from GetFunctionPointer()?

I don't know, will think about it. Are you planning to use some library to disassemble the machine code? Maybe there are libraries that do code discovery. Or maybe you could key off the int 3 padding that I believe we emit between methods...

does this point to the actual compiled method or some interim jump/stub? Let's say it's a standard method from an assembly

I won't claim to know definitively, but I was curious enough to try an example:

// Note: the unsafe block below requires compiling with /unsafe.
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

namespace N
{
    public static class C
    {
        // The method whose native code we want to inspect.
        static int Twenty() => 20;

        public static int Main(string[] args)
        {
            MethodInfo methodInfo = typeof(C).GetMethod("Twenty", BindingFlags.NonPublic | BindingFlags.Static);
            RuntimeMethodHandle handle = methodInfo.MethodHandle;

            // Resolve the same method again through its metadata token.
            int token = methodInfo.MetadataToken;
            Module module = methodInfo.Module;
            MethodBase resolvedMethod = module.ResolveMethod(token);

            // _CompileMethod is a private implementation detail of RuntimeHelpers,
            // reached here through private reflection to force JIT compilation.
            MethodInfo compileInfo = typeof(RuntimeHelpers).GetMethod("_CompileMethod", BindingFlags.NonPublic | BindingFlags.Static);
            compileInfo.Invoke(null, new object[] { resolvedMethod });

            IntPtr addr = handle.GetFunctionPointer();
            Console.WriteLine("addr: {0}", addr);

            // Dump the first 6 bytes at the returned address (6 is hard-coded for this method).
            Console.Write("bytes:");
            for (int i = 0; i < 6; ++i)
            {
                unsafe { Console.Write("{0:X2}.", ((byte*)addr)[i]); }
            }
            Console.WriteLine();
            return 0;
        }
    }
}

Running that (without the loop at the end) with a debug jit and COMPlus_JitDisasm set to Twenty, I could see that this particular method is 6 bytes long, so I hard-coded the "6" and compared the results:

; Assembly listing for method N.C:Twenty():int
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;# V00 OutArgs      [V00    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0

G_M27994_IG01:

G_M27994_IG02:
       B814000000           mov      eax, 20

G_M27994_IG03:
       C3                   ret

; Total bytes of code 6, prolog size 0 for method N.C:Twenty():int
; ============================================================
addr: 140715165222432
bytes:B8.14.00.00.00.C3.

So at least in this case, GetFunctionPointer() returned the JITTed native code.

noahfalk commented 7 years ago

If you ensure that the method is jitted in advance, then current implementations will always give you a pointer directly to the jitted code (as Joe's experiment showed). However, in the future that implementation might change so that you receive a pointer to (one or more) jmp instructions. If you want to be resilient to those changes, it might be worthwhile to add some logic that can disassemble a jmp instruction and follow it to its target, potentially more than once.

If you are curious the relevant code should be: https://github.com/dotnet/coreclr/blob/master/src/vm/runtimehandles.cpp#L2122 This drops you into MethodDesc::GetMultiCallableAddrOfCode which does a bit of work to determine what code pointer is most appropriate.
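A minimal sketch of the "follow the jmp" logic might look like the following. It works on a byte array standing in for process memory, recognizes only the two relative x86/x64 jmp encodings (EB rel8 and E9 rel32), and caps the number of hops. The names JmpFollower and Resolve and the hop limit are assumptions made for illustration, not part of any runtime API.

```csharp
using System;

public static class JmpFollower
{
    // Follows relative jmp instructions starting at `address` until a
    // non-jmp instruction is reached or `maxHops` is exhausted.
    // `memory` holds the bytes that live at `baseAddress` and onwards.
    public static ulong Resolve(byte[] memory, ulong baseAddress, ulong address, int maxHops = 8)
    {
        for (int hop = 0; hop < maxHops; hop++)
        {
            if (address < baseAddress) break;
            ulong offset = address - baseAddress;
            if (offset >= (ulong)memory.Length) break;

            int i = (int)offset;
            byte opcode = memory[i];

            if (opcode == 0xE9 && i + 5 <= memory.Length)
            {
                // jmp rel32: target = address of next instruction + signed 32-bit displacement.
                // BitConverter follows machine byte order; x86/x64 are little-endian.
                int rel = BitConverter.ToInt32(memory, i + 1);
                address = unchecked(address + 5 + (ulong)(long)rel);
            }
            else if (opcode == 0xEB && i + 2 <= memory.Length)
            {
                // jmp rel8: same idea with a signed 8-bit displacement.
                sbyte rel = unchecked((sbyte)memory[i + 1]);
                address = unchecked(address + 2 + (ulong)(long)rel);
            }
            else
            {
                break; // not a jmp we recognize, so assume this is the real code
            }
        }
        return address;
    }

    public static void Main()
    {
        byte[] memory =
        {
            0xEB, 0x03,                         // 0x1000: jmp 0x1005
            0xCC, 0xCC, 0xCC,                   // padding
            0xE9, 0x02, 0x00, 0x00, 0x00,       // 0x1005: jmp 0x100C
            0xCC, 0xCC,                         // padding
            0xB8, 0x14, 0x00, 0x00, 0x00, 0xC3  // 0x100C: mov eax, 20; ret
        };
        Console.WriteLine("0x{0:X}", Resolve(memory, 0x1000, 0x1000)); // prints 0x100C
    }
}
```

This cannot tell a stub apart from a method body that legitimately begins with a jmp, so it is a heuristic at best.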

svick commented 7 years ago

@noahfalk Wouldn't that code also somehow need to handle the situation when the actual method starts with a jmp? For example consider this method:

static void f(int i)
{
    goto l1;

    l2:

    i++;

    l1:

    i++;

    goto l2;
}

Its disassembly is:

00007FFB733504B0  jmp         00007FFB733504B4  
00007FFB733504B2  inc         ecx  
00007FFB733504B4  inc         ecx  
00007FFB733504B6  jmp         00007FFB733504B2

noahfalk commented 7 years ago

I don't think you are going to find code produced by the JIT which starts with a jmp, but Joe probably knows that answer better than I. I also think part of using a solution like this one will be acknowledging it's a bit of a hack :) The managed API surface area in .Net isn't designed to handle this scenario.

ashmind commented 7 years ago

Thanks all for the info, and here's a prototype: https://tryroslyn.azurewebsites.net/#f:>asmr/M4FwhiCWDGAE0BszGLAwrA3gWAFCwNlAhlkgDsRYBZACgqoA8BKWAXgD5ZHYBqWAAyMAjACYAzABYArAG48AXyA=

I did my own end-of-code guessing, so please let me know of any issues. Note: this will not be available in other branches (non-NuGet) until tomorrow or later.

noahfalk commented 7 years ago

Very slick! To take it to the next level we'd probably need to add annotations to the jitted code, such as symbolic names for the various addresses which show up. If it's something you are interested in I can ponder whether there is a good way to do it.

davidfowl commented 7 years ago

@ashmind yes!!! This is great!

@noahfalk that would be amazing. I can see us using this tool ourselves 😄

davidfowl commented 7 years ago

@ashmind Found a bug:

using System;
using System.Threading.Tasks;
using System.Runtime.CompilerServices;

static class C {
    static int M(int x) 
    { 
        return Foo(x + 0x12345).Result;
    }

    // [MethodImpl(MethodImplOptions.NoInlining)]
    static async Task<int> Foo(int x)
    {
        return x;
    }
}

Exception:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at SharpDisasm.Udis86.Decode.decode_vex(ud& u)
   at SharpDisasm.Udis86.Decode.decode_ext(ud& u, UInt16 ptr)
   at SharpDisasm.Udis86.Decode.decode_opcode(ud& u)
   at SharpDisasm.Udis86.Decode.ud_decode(ud& u)
   at SharpDisasm.Udis86.udis86.ud_disassemble(ud& u)
   at SharpDisasm.Disassembler.NextInstruction()
   at SharpDisasm.Disassembler.<Disassemble>d__0.MoveNext()
   at TryRoslyn.Server.Decompilation.JitAsmDecompiler.DisassembleAndWrite(JitCompiledMethod method, Translator translator, TextWriter writer)
   at TryRoslyn.Server.Decompilation.JitAsmDecompiler.Decompile(Stream assemblyStream, TextWriter codeWriter)
   at TryRoslyn.Server.MirrorSharp.SlowUpdate.<ProcessAsync>d__3.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MirrorSharp.Internal.Handlers.SlowUpdateHandler.<ExecuteAsync>d__4.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at MirrorSharp.Internal.Connection.<ReceiveAndProcessInternalAsync>d__13.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MirrorSharp.Internal.Connection.<ReceiveAndProcessAsync>d__12.MoveNext()

ashmind commented 7 years ago

@noahfalk

symbolic names for the various addresses which show up

That sounds good -- should be easy within assembly, but let me know if you can think of an easy way to do that with .NET Framework calls.

@davidfowl Thanks for the report! I'll take a look, but can't promise a quick fix if it's inside SharpDisasm.

davidfowl commented 7 years ago

NP. I'm just excited I can play with this.

Also, this looks bizarre. @JosephTremoulet can you confirm that this looks correct?

https://tryroslyn.azurewebsites.net/#f:>asmr/K4Zwlgdg5gBAygTxAFwKYFsDcBYAUKSWRFDAOgGEB7AG2tQGNkxKIRSBxVCVAJzHpz5w0eEjTpSAFQAWPVAEMAJoSnyQAaxCCCI4uNIAlYBCbpUFSugAOYOjzi8Abv1Ra8eFPKb0Y9ampAYchgAbzwYCJhPbxhIZBgAWQAKOIBtAF0YAA8AShhwyJD83EjSmDlkYB4IbNSABnTBUoBfPGagA

ashmind commented 7 years ago

@davidfowl I think I'm not detecting end-of-code correctly there -- looking at int3 mentioned by @JosephTremoulet. I would like to understand what call 0x5ed030b0 does and why there is no ret after it. Some kind of tail call?

Anyway I think I can upgrade my detection to stop on int3 as well.

davidfowl commented 7 years ago

https://tryroslyn.azurewebsites.net/#f:>asmr/K4Zwlgdg5gBAygTxAFwKYFsDcBYAUKSWRFDAOgGEB7AG2tQGNkxKIRSBxVCVAJzHpz5w0eEjTpSAFQAWPVAEMAJoSnyQAaxCCCI4uNIAlYBCbpUFSugAOYOjzi8Abv1Ra8eFPKb0Y9ampAYchgAbzwYCJhPbxhIZBgAWQAKABkwFAAeOIA+GAAPAEoYcMiQ4txIypg5ZGAeCHyAbQAGAF1BSoBfPE6gA

This one has a ret and an int3 and it seems like both are ignored.

ashmind commented 7 years ago

@davidfowl Yep -- I ignored ret by design as jbe 0x28 can bypass it (otherwise multi-ret methods don't work), but I'll fix int3 soon.
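The end-of-code guessing discussed here could be sketched roughly as below. This is not the actual TryRoslyn implementation: it is a variation that, instead of ignoring ret entirely, stops at a ret only when no earlier branch targets an address beyond it, and stops at int3 under the same condition. The Instr type and InstrKind enum are hypothetical stand-ins for whatever a real disassembler would produce.

```csharp
using System;
using System.Collections.Generic;

public enum InstrKind { Other, Branch, Ret, Int3 }

// Hypothetical, already-decoded instruction; a real disassembler would supply these.
public sealed class Instr
{
    public int Offset;      // offset of the instruction from the method start
    public int Length;      // encoded length in bytes
    public InstrKind Kind;
    public int Target;      // branch target offset (only meaningful for Branch)
}

public static class EndOfCode
{
    // Returns the guessed method size in bytes.
    public static int FindEnd(IEnumerable<Instr> instructions)
    {
        int furthestTarget = 0; // furthest offset any branch seen so far jumps to
        int end = 0;

        foreach (Instr ins in instructions)
        {
            // int3 with no pending branch past it: treat it as the end of the method.
            if (ins.Kind == InstrKind.Int3 && ins.Offset >= furthestTarget)
                return ins.Offset;

            end = ins.Offset + ins.Length;

            if (ins.Kind == InstrKind.Branch)
                furthestTarget = Math.Max(furthestTarget, ins.Target);

            // A ret ends the method only if nothing branches beyond it.
            if (ins.Kind == InstrKind.Ret && furthestTarget < end)
                return end;
        }
        return end;
    }

    public static void Main()
    {
        var method = new List<Instr>
        {
            new Instr { Offset = 0,  Length = 4, Kind = InstrKind.Other },
            new Instr { Offset = 4,  Length = 2, Kind = InstrKind.Branch, Target = 11 },
            new Instr { Offset = 6,  Length = 4, Kind = InstrKind.Other },
            new Instr { Offset = 10, Length = 1, Kind = InstrKind.Ret },   // bypassed by the branch
            new Instr { Offset = 11, Length = 5, Kind = InstrKind.Other }, // e.g. call to a throw helper
            new Instr { Offset = 16, Length = 1, Kind = InstrKind.Int3 }
        };
        Console.WriteLine(FindEnd(method)); // prints 16
    }
}
```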

davidfowl commented 7 years ago

👍

nietras commented 7 years ago

This is awesome! 👍 I filed an issue for something similar in BenchmarkDotNet https://github.com/dotnet/BenchmarkDotNet/issues/437 so we can have a disassembly diagnoser there.

omariom commented 7 years ago

If it is still needed, RuntimeHelpers.PrepareMethod and RuntimeHelpers.PrepareDelegate force jit compilation.

ashmind commented 7 years ago

Just FYI I now get why int3 is there: it's basically a throw, and in the case of an array it's probably a bounds check.

@davidfowl Somehow I can't reproduce that IndexOutOfRangeException locally -- I'll split it into a separate ticket for now.

There are also some cases which generate pop es on my local machine, which is invalid under x64, so there is a bug either in my code or in SharpDisasm, but I can't get a minimal repro yet.

kumpera commented 7 years ago

The JitView cannot handle generics. I tried this function static T Id<T> (T t) => t; and it crashes with this:

System.ArgumentException: The given generic instantiation was invalid.

Server stack trace: 
   at System.Runtime.CompilerServices.RuntimeHelpers._PrepareMethod(IRuntimeMethodInfo method, IntPtr* pInstantiation, Int32 cInstantiation)
....

svick commented 7 years ago

@kumpera Well, what should it show? Different instantiations of a generic method/type can have different disassembly.

JosephTremoulet commented 7 years ago

It might make sense to define a custom attribute that could be applied to a generic method to specify what instantiations to generate disassembly for.
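One way such an attribute could look is sketched below. The attribute name JitInstantiationAttribute and its shape are purely hypothetical. The sketch closes the open generic method over each attribute's type arguments with MakeGenericMethod and then asks the runtime to prepare each closed method via RuntimeHelpers.PrepareMethod, on the assumption that this forces JIT compilation on the runtime in use.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical attribute; the name and shape are assumptions for illustration.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = true)]
public sealed class JitInstantiationAttribute : Attribute
{
    public JitInstantiationAttribute(params Type[] typeArguments)
    {
        TypeArguments = typeArguments;
    }

    public Type[] TypeArguments { get; }
}

public static class GenericJitSketch
{
    [JitInstantiation(typeof(int))]
    [JitInstantiation(typeof(long))]
    public static T Id<T>(T t) => t;

    // Produces one closed MethodInfo per attribute on the open generic method.
    public static MethodInfo[] CloseOverAttributes(MethodInfo openMethod)
    {
        return openMethod
            .GetCustomAttributes<JitInstantiationAttribute>()
            .Select(a => openMethod.MakeGenericMethod(a.TypeArguments))
            .ToArray();
    }

    public static void Main()
    {
        MethodInfo open = typeof(GenericJitSketch).GetMethod(nameof(Id));
        foreach (MethodInfo closed in CloseOverAttributes(open))
        {
            // Only closed methods can be prepared; the open definition cannot.
            RuntimeHelpers.PrepareMethod(closed.MethodHandle);
            Console.WriteLine(closed);
        }
    }
}
```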

noahfalk commented 7 years ago

@omariom -

If it is still needed, RuntimeHelpers.PrepareMethod and RuntimeHelpers.PrepareDelegate force jit compilation.

As best I can tell that is full CLR only behavior. On CoreCLR https://github.com/dotnet/coreclr/blob/68f72dd2587c3365a9fe74d1991f93612c3bc62a/src/mscorlib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs#L90

noahfalk commented 7 years ago

symbolic names for the various addresses which show up

That sounds good -- should be easy within assembly, but let me know if you can think of an easy way to do that with .NET Framework calls.

I'm not convinced it would be easy, but it might be doable. So far the only thing I've come up with would probably involve leveraging CLRMD: https://github.com/Microsoft/clrmd/blob/master/src/Microsoft.Diagnostics.Runtime/ClrRuntime.cs#L162

In order for CLRMD to work it needs to have access to some state of the process being inspected: https://github.com/Microsoft/clrmd/blob/master/src/Microsoft.Diagnostics.Runtime/datatarget.cs#L694

And you could probably implement that interface using p/invokes to OS APIs such as ReadProcessMemory https://msdn.microsoft.com/en-us/library/windows/desktop/ms680553(v=vs.85).aspx

I'm not sure anyone has ever attempted doing live reads from their own process, so there is a little bit of finger crossing involved that the memory necessary for the algorithm that maps IP -> method is stable across time as CLRMD is executing within the process. The mapping itself is stable (at least on current implementations), but for example the hash tables and range trees that record this mapping might be changing if CLRMD is jitting more code into the process as it runs. You might be able to call it once with some dummy data to get everything jitted and then hope that from that point onwards no further jitting will be necessary. If that doesn't work out you could get a little more aggressive and capture a dump or process snapshot of your own process, then read memory from there.

All of this probably gets simpler and more reliable if you can spin off a child process though. In that case you could either call crossgen and then examine the resulting binary+PDB, or you could do your jitting in a child .Net process and make the main process debug or profile the child process. It would seem a bit odd if your hosting environment let you p/invoke to APIs like ReadProcessMemory but denied you the ability to call CreateProcess.

ashmind commented 7 years ago

To simplify the process going forward, I'll split the remaining bug/enhancement requests into separate tickets and close this one.

  1. I fixed the crash on open generics; if you want an attribute to JIT them with specific type args, see #80.
  2. For symbolic names (@noahfalk) see #81.
  3. The crash reported by @davidfowl is at #79.

ashmind commented 7 years ago

@omariom, @noahfalk I'll use PrepareMethod for now as I'm on old-school .NET, but thanks for the note -- I'll update it once I move to Core.
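For readers who want to try the PrepareMethod route, a minimal sketch using only public APIs could look like this. The class name PrepareMethodSketch is a placeholder, and the 6-byte dump length is an assumption that only matches a trivial method compiled for x64. Whether PrepareMethod actually compiles the method, and whether GetFunctionPointer() returns the jitted code rather than a stub, depends on the runtime in use.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class PrepareMethodSketch
{
    // The method whose native code we want to look at.
    static int Twenty() => 20;

    // Asks the runtime to prepare the method, then returns its function pointer.
    public static IntPtr PrepareAndGetPointer(MethodInfo method)
    {
        RuntimeMethodHandle handle = method.MethodHandle;
        RuntimeHelpers.PrepareMethod(handle);
        return handle.GetFunctionPointer();
    }

    public static void Main()
    {
        MethodInfo method = typeof(PrepareMethodSketch).GetMethod(
            "Twenty", BindingFlags.NonPublic | BindingFlags.Static);

        IntPtr address = PrepareAndGetPointer(method);
        Console.WriteLine("addr: 0x{0:X}", address.ToInt64());

        // Dump a few bytes at the returned address. The count of 6 is an
        // assumption; the real size has to come from elsewhere.
        Console.Write("bytes:");
        for (int i = 0; i < 6; i++)
        {
            Console.Write("{0:X2}.", Marshal.ReadByte(address, i));
        }
        Console.WriteLine();
    }
}
```

Using Marshal.ReadByte avoids the need for an unsafe block, at the cost of one call per byte.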

ashmind commented 7 years ago

Thanks!

ashmind commented 7 years ago

One more thing in case someone wants to implement something similar -- you can actually use CLRMD (suggested by @noahfalk) with DataTarget.AttachToProcess(current process id, ..., AttachFlag.Passive) to get method size from ClrMethod.HotColdInfo.HotSize and avoid guessing based on ret/int3.

Still resolving a few small issues around that, but it seems way more reliable than guesswork.

JosephTremoulet commented 7 years ago

Oh, great. Looks like HotColdInfo has the start addresses too. If you're not already, you'll want to disassemble both hot and cold regions of each method.