dnSpyEx / dnSpy

Unofficial revival of the well known .NET debugger and assembly editor, dnSpy
GNU General Public License v3.0
7.01k stars 459 forks source link

Add JIT event signalling and --break arg for attach #303

Open mitchcapper opened 8 months ago

mitchcapper commented 8 months ago

At some point someone thought of using dnSpy as a JIT debugger based on the presence of clear CLI args for such purpose (with two not currently even used).

This PR improves the postmortem JIT debugging for dnSpy as a target. The primary thing is properly signalling the DebugEvent handle to let Windows know we are good to continue. It does this right when it attaches (well right before technically) which is the same behavior of x64dbg.

This also implements a --break CLI arg to allow automatic breaking on attach. This is to improve the JIT debugging experience for situations like where the user may have the Ignore Debugger.Break() / and break instructions option enabled (but also has other uses). While it is easy to understand why someone might want that option on, they likely do want to pause when dnSpy is acting as JIT and attaches. Another option would be to temporarily disable those break ignores when JIT debugging is started, attach (and then it will break) and then re-enable them for the user.

Now the first reason this is done as a draft, I am not happy with how I implemented this. dnSpy is toooo fast at executing the suspend and breaks before it even has a chance to load the modules/callstack/etc. I currently use a double flagged event driven approach for when to break which is reliable (aka not too early) but has this problem. I am not sure if there is a better event to hang off. I would prefer to not just add a Task.Delay to be able to break as soon as possible.

The second reason is the singular dnSpy instance default preference. Reusing the dnSpy window is great to allow, BUT clearly we have no easy way to signal the DebugEvent handle when we only received the attach info by windows message. Now the experience is currently not very degraded as Windows automatically assumes we are done/ready when the JIT debugger called exits. The downside is this could (will?) happen before we are fully attached, and might result in the process actually terminating before we can attach. In my testing though it did seem to work as it was reliably though. I also don't filter the event args out to not pass an invalid handle to try closing, which may be worth doing rather than blind passing them. Two potential options here to improve would be: 1) Add a small sleep to the single instance handler prior to exiting (at least if the JIT event arg is passed). This would greatly decrease any chance of it exiting too fast. The second option would be to wait and actually detect when the debugger attaches to the other process before signaling/exiting and then exit. This would be accompanied by a timeout (much like my pause on attach feature) so it isn't waiting forever.

This does not implement the use of the JIT_DEBUG_INFO pointer to get that information. Clearly we use much the same but I don't think the debugging experience is overly different without it.

I am using this with JitMagic but if you wanted to use it directly as the JIT debugger you would set the AeDebug registry entries to: c:\Program Files\dnSpy\dnSpy.exe --dont-load-files --multiple -p %ld -e %ld --jdinfo %p (using the x32 dnSpy for the WoW entry). We could add a dnSpy option if desired to automatically register/unregister dnSpy as the system debugger.

Unrelated but if you have any feedback on what items are desired(accepted) vs not in https://github.com/dnSpyEx/dnSpy/discussions/115 I will likely work on a few as time permits. Some are a bit more work to implement though so if they are likely not desired to be in dnSpy I may not do them.

mitchcapper commented 8 months ago

Hrm, I just noticed one other adverse behavior in the current dnSpy that is related. If you have allow more than one instance enabled there is no CLI arg to be able to force singular/reuse of dnSpy. Not 100% related to this PR but if you let me know how (if) you want that implemented I can make a PR for that as well.

My initial thought is to make the arg nullable then add a --single flag to counter the --multiple and only follow the users preference when it is null.

mitchcapper commented 8 months ago

While I followed x64dbg's AeDebug logic of signalling right before attaching I think it is probably best to wait until after we are attached to signal, will adjust with w/e other changes.

mitchcapper commented 8 months ago

OK with some more testing I have found the best way to do this is to just set e.Pause the very first time MessageThreadCreated is called. This gives you a callstack (although may be native only transition depending on the thread) but full ability to access loaded modules, set breakpoints, etc.

This is actually a good time to send the signalling event as well that we are ready. We still need to wait about 100ms after the first created thread before sending the signal or it will still be too early.

The event signaled too early causes an issue because the program resumes and steps past the exception/debug break. There is no downside to slightly delaying this signal I have seen so could even delay slightly longer than that. There may be some event that we could hang off that would be perfect to fire it right away but not sure that is needed.

The MessageThreadCreated seems like the best event any of the other message events at least are too early (so stack/modules not loaded or in case of the module load event cannot step even though breakpoints can be set).

Even when debug breaks are set to ignore in options this does allow you to break before the next managed code call is made, so just as good as if you do allow Debug.breaks.

if you pass --break and have Debug.breaks enabled you will get two breaks (as that event fires before the official debug.break does). If we wanted to avoid this we could probably assume if the option is enabled then we can silently ignore --break as it should break on its own. I don't think it is a big deal to have a double break though.

Given the lack of a real negative effect to delaying the event handle signal I would recommend for the multi-instance solution the easiest solution would be just a sleep like: image

ElektroKill commented 8 months ago

Hi,

Regarding the approaches to waiting for things to happen, instead of awaiting all those tasks, couldn't we just subscribe to one of the events in DbgManager like MessageProcessCreated, MessageRuntimeCreated which should be fired relatively early when attaching to a process?

As for the handling of the debug event with a single instance, instead of using Thread.Sleep couldn't we just use a similar way of IPC using SendMessage to notify the process when it is safe to exit?

mitchcapper commented 8 months ago

Ok so the two questions are when to send the event and when to break.

For sending the event

Did a good bit more testing. Sometimes using the MessageThreadCreated event works but there is a good bit of time that 0 events are fired other than IsDebuggingChanged and IsRunningChanged until the event is sent.

Sending the event too early means the process resumes too soon and we can't catch it as fast.

There is in theory no delay too long to send the event as long as we do send the event. Without the event sent we miss about 15-40 ticks or so.

I would say a sleep of 250 ms seems perfectly reliable, 1/2 a second should be sufficient but I assume could vary from machine to machine.

For when to break

MessageThreadCreated seems hands down the best time to do so. All modules are loaded and breakpoints can be set. The only downside is we can't quite access all the threads. Any of the sooner events can result in the modules not all (or sometimes any) loading and an inability to step.

Double Break issue

This is the issue where if you use --break and are not ignoring debugger break calls in the option you will get two breaks. This can be especially confusing as there is a not-uncommon occurrence where the TheadCreated break will show the exact same position as the actual break cause (so hitting resume and the current line doesn't actually advance). Do we just silently ignore --break if the debug event handler is passed and the dnSpy option for ignore debugger breaks is not on?

As for the handling of the debug event with a single instance, instead of using Thread.Sleep couldn't we just use a similar way of IPC using SendMessage to notify the process when it is safe to exit?

Yes but we will have to spin up a window/message pump to receive them (right now if there is another instance we Environment.Exit out of App.xaml before even launching the primary). We could have the caller pass its PID to the singular instance along with the CLI args, once attached the primary instance could simply kill off the dupe's PID (triggering the event signal).

ElektroKill commented 8 months ago

Did a good bit more testing. Sometimes using the MessageThreadCreated event works but there is a good bit of time that 0 events are fired other than IsDebuggingChanged and IsRunningChanged until the event is sent.

Sending the event too early means the process resumes too soon and we can't catch it as fast.

There is in theory no delay too long to send the event as long as we do send the event. Without the event sent we miss about 15-40 ticks or so.

I would say a sleep of 250 ms seems perfectly reliable, 1/2 a second should be sufficient but I assume could vary from machine to machine.

I think I'm not getting something here, when dnSpy is launched with the arguments for debugging, the target process is suspended and its only resumed when the event handle passed to it is set right?

At that is how its documented: Event Handle duplicated into the postmortem debugger process. If the postmortem debugger signals the event, WER will continue the target process without waiting for the postmortem debugger to terminate. The event should only be signaled if the issue has been resolved. If the postmortem debugger terminates without signaling the event, WER continues the collection of information about the target processes. Source: https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/enabling-postmortem-debugging

If the process is suspended when dnSpy is started and attaches, then it should be no problem to set the event to resume the process at MessageProcessCreated since at that point we are sure that CorDebug is communicating with the runtime of the target process. If CorDebug is able to attach to that suspended process then setting that event a bit later shouldn't be a problem.

That is of course if CorDebug is unable to attach to this suspended process (this could be the case or not). In this case, if we need to resume the process by setting this event handle before the attach the only sane approach is to just do it before the call to DbgManager.Start. There is no reliable way to ensure no code is running before the resumed process is attached to by CorDebug. Messing around with the IsDebuggingChanged or IsRunningChanged might get us a couple of milliseconds closer but it will be far from precise and it's a hacky solution at best.

Please let me know which of these cases we are dealing with when it comes to the event handle as it will help me in reviewing and giving suggestions for this PR.

MessageThreadCreated seems hands down the best time to do so. All modules are loaded and breakpoints can be set. The only downside is we can't quite access all the threads. Any of the sooner events can result in the modules not all (or sometimes any) loading and an inability to step.

Break-on-attach functionality could be implemented for this command line handling as well as for the UI since the CorDebug debugger implementation is actually aware when it has finished the attach procedure. This information is not passed down to the user-accessible DbgManager class, however. A proper implementation would mean adding a similar event or debugger message to the user-accessible API and utilizing that. See: https://github.com/dnSpyEx/dnSpy/blob/9b0621d1f68a6d212b30150a603cae2815517869/Extensions/dnSpy.Debugger/dnSpy.Debugger.DotNet.CorDebug/dndbg/Engine/DnDebugger.cs#L740

This is the issue where if you use --break and are not ignoring debugger break calls in the option you will get two breaks. This can be especially confusing as there is a not-uncommon occurrence where the TheadCreated break will show the exact same position as the actual break cause (so hitting resume and the current line doesn't actually advance). Do we just silently ignore --break if the debug event handler is passed and the dnSpy option for ignore debugger breaks is not on?

This should only happen when the code contains an explicit call to a Debugger.Break() or similar and the break on attach breaks at that specific IL offset right? In that case, I think, we should just keep the double break in my opinion since the reasons for the breaks are different. Is there another case that I am missing that has occurred in your testing?

Yes but we will have to spin up a window/message pump to receive them (right now if there is another instance we Environment.Exit out of App.xaml before even launching the primary). We could have the caller pass its PID to the singular instance along with the CLI args, once attached the primary instance could simply kill off the dupe's PID (triggering the event signal).

Another possibility of signaling would be to use event handles created by CreateEvent passed down as arguments and then use SetEvent and WaitForSingleObject to set the event and wait for it to be set.

mitchcapper commented 8 months ago

I think I'm not getting something here, when dnSpy is launched with the arguments for debugging, the target process is suspended and its only resumed when the event handle passed to it is set right? the only sane approach is to just do it before the call to DbgManager.Start the IsDebuggingChanged or IsRunningChanged might get us a couple of milliseconds closer but it will be far from precise and it's a hacky solution at best.

In situations where the debugger can't really attach until the event there are 0 message events that fire (including MessageProcessCreated or MessageRuntimeCreated) prior to handle wait.

I don't think we are in a race condition with the delay before event send (assuming the delay is large enough). I have never seen it miss a single instruction in this setup. I assume the debugger tries to attach as far as it can, and then as long as our delay is longer than that you are not missing anything. Essentially even if you waited 5 minutes in that delay you are going to still be fine.

Still, even if not a precision issue, the delay is a bit hacky. Sending the event at the correct time instead (sounds like before Start) does sound butter. Downside is you are now burying the code to send the event (or communicate back to the original process for multi-instance) deeper. maybe the best would be another event handler? RightBeforeDebuggingStarted hahaha.

A proper implementation would mean adding a similar event or debugger message to the user-accessible API and utilizing that. OnAttachComplete?.Invoke(this, EventArgs.Empty);

Sure although on attach complete may not be clear as to what is complete? The instant you are attached we can break (and I was before) but this was before the modules and threads were loaded which was mostly useless.

OnComplete could be after all modules load, which is what I seem to find MessageThreadCreated is. I think any event before MessageThreadCreated would be useless as you wouldn't have any active threads (so could not step).

Technically I would think OnAttachComplete would be after all threads load. I don't know the internals nearly enough but while dnSpy is loading the other threads are any threads able to run at all? If not then breaking after that would be great. I thought I tested this but its very possible I didn't test correctly.

Double Break issue This should only happen when the code contains an explicit call to a Debugger.Break() or similar and the break on attach breaks at that specific IL offset right?

Correct. Although a Debugger.Launch(); call automatically adds a Debugger.Break(). With visual studio hitting resume continues rather than immediately breaking a second time.

This caused me to notice another issue though, if the JIT is launched due to an exception being thrown rather than Debugger.Launch() even with all exception breaks enabled dnSpy will not stop for the exception.

It is interesting as if you use --break to break on attach there is no exception in the output window, but once you resume you then get:

12:01:40.284 An unhandled exception of type 'System.Exception' occurred in jittest.dll
12:01:40.290 Additional information: I am just fine thanks

I would have expected it to break but it does not. I am guessing this is because the application already threw the exception. This may be custom JIT launch exception handling (maybe why that ptr to the info is there:)) I would assume the best behavior is to check if the thrown/active exception is enabled on the Exception Settings list and follow the normal behavior in terms of breaking. That is likely beyond my current working knowledge of dnSpy internals.

Here is the tester app I am using.

public static int msSoFar = 0;
static void Main(string[] args) {
    var useException = args.Length > 0 && args[0] == "-e";
    Console.WriteLine($"Hello World hit a key, for I am PID: {Process.GetCurrentProcess().Id}!");
    Console.ReadKey();
    Console.WriteLine("Launching debugger");
    if (useException)
        throw new Exception("I am just fine thanks");
    else
        Debugger.Launch();
    var startedWhen = Environment.TickCount;
    for (var x = 0; x < 20; x++) {
        Thread.Sleep(1);
        msSoFar = Environment.TickCount - startedWhen;
    }
    Console.WriteLine("DEBUGGER RESUMING!");
    Console.ReadKey();
    Console.WriteLine("WILL BE DONE!");
}
ElektroKill commented 8 months ago

In situations where the debugger can't really attach until the event there are 0 message events that fire (including MessageProcessCreated or MessageRuntimeCreated) prior to handle wait.

I don't think we are in a race condition with the delay before event send (assuming the delay is large enough). I have never seen it miss a single instruction in this setup. I assume the debugger tries to attach as far as it can, and then as long as our delay is longer than that you are not missing anything. Essentially even if you waited 5 minutes in that delay you are going to still be fine.

Still, even if not a precision issue, the delay is a bit hacky. Sending the event at the correct time instead (sounds like before Start) does sound butter. Downside is you are now burying the code to send the event (or communicate back to the original process for multi-instance) deeper. maybe the best would be another event handler? RightBeforeDebuggingStarted hahaha.

Adding another message to the Message handler should not be a problem. We would just need to come up with a nicer name for it.

Sure although on attach complete may not be clear as to what is complete? The instant you are attached we can break (and I was before) but this was before the modules and threads were loaded which was mostly useless.

OnComplete could be after all modules load, which is what I seem to find MessageThreadCreated is. I think any event before MessageThreadCreated would be useless as you wouldn't have any active threads (so could not step).

Technically I would think OnAttachComplete would be after all threads load. I don't know the internals nearly enough but while dnSpy is loading the other threads are any threads able to run at all? If not then breaking after that would be great. I thought I tested this but its very possible I didn't test correctly.

The internal code of the debugger already has a OnAttachComplete event as mentioned, which gets fired after CorDebug attaches and dnSpy finishes processing the initial callback queue (which contains all process, module, and thread initialization events). It should be possible to expose this event and perhaps extend it with custom event args which would allow the receiver of the event to request a suspension of the process. (Most event args in debugger events have this capability). Such an vent would seem like a perfect candidate for the break-after-attach feature.

Correct. Although a Debugger.Launch(); call automatically adds a Debugger.Break(). With visual studio hitting resume continues rather than immediately breaking a second time.

For now I think it's fine if we just double break. When break-on-attach is all properly implemented we can look into writing a filter for these breaks to ensure only one.

This caused me to notice another issue though, if the JIT is launched due to an exception being thrown rather than Debugger.Launch() even with all exception breaks enabled dnSpy will not stop for the exception.

This is because when dnSpy is attached the the exception has already been thrown. This is why dnSpy does not break for the exception.

I would have expected it to break but it does not. I am guessing this is because the application already threw the exception.

Yeah the reason for the lack of exception break is most likely that the exception was raised before debuggig was initiated as tou mentioned. I'm not sure what we could do about this honestly. Forcing the handling of it somehow seems a bi forced and hacky. It really depends what information CorDebug is able to give us in this state.