dlang / visuald

Visual D - Visual Studio extension for the D programming language
http://rainers.github.io/visuald/visuald/StartPage.html
Boost Software License 1.0

__debug function performance #282

Open TurkeyMan opened 4 days ago

TurkeyMan commented 4 days ago

Are there any opportunities to optimise or improve the performance of these __debug functions? They are extremely slow. I have a custom string type, and if a struct contains a few strings, stepping feels really sluggish. If there's an array anywhere in view, each step takes anywhere from seconds to tens of seconds.

Why are they so slow? It doesn't seem right that setting up the call should be that much trouble? If I can understand the performance characteristics, maybe I can make improvements on the app side...?

I have NOT enabled the "switch GC" option, since I am confident all my __debug functions are @nogc.

rainers commented 4 days ago

I briefly looked at the CPU diagnostics of displaying 100 of your Strings, and the debugger mostly waits for the call to be performed in the other process. AFAICT this needs multiple rounds of inter-process communication (devenv.exe <-> msvcmon.exe <-> debuggee.exe), so I suspect there is some inefficient waiting going on. Maybe that can be optimized for arrays of objects with debug methods, but that will get messy, e.g. if it is only one field of the array element that has a debugOverview method.

TurkeyMan commented 4 days ago

Hmmm, well, in its current state it's almost unusable. I think I'm going to have to turn the __debug feature off, because it's just too slow :/ Sadly, I don't think it's really reasonable to work without it either; not being able to inspect strings and arrays severely undermines the usefulness of a debug session.

If it's about IPC waiting, then finding any opportunities for batching up requests seems like the way to go... is the code structured in such a way that you can gather the requests rather than resolving them immediately, and then send them all to the debuggee in one batch? That might require a lib compiled into the debuggee offering a function which can receive a bundle of requests from the debugger, and then the debuggee might iterate the bundle and resolve them all in one big go? :/ So, rather than calling each debug function independently, call one global debug function from a lib, which would locally iterate the bundle of requests, calling each debug function on each object in the bundle? If the lib is not linked, fall back on existing semantics. Should be simple to check if a global debugResolveBundle symbol is present in the binary. Might help the GC swapping too; just do it once around the bundle...
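
A rough sketch of the batching idea above, written as a plain Python simulation rather than anything from Visual D or the VS debugger API. `FakeDebuggee`, `call_bundle`, and `resolve` are hypothetical names, and a "round trip" is just a counter standing in for the devenv.exe <-> msvcmon.exe <-> debuggee.exe hop:

```python
from typing import Any, Callable, List, Tuple

Request = Tuple[Callable[[Any], Any], Any]  # (debug function, object to call it on)


class FakeDebuggee:
    """Stand-in for the target process; counts round trips instead of doing real IPC."""

    def __init__(self, has_bundle_helper: bool) -> None:
        # Assumption: the debugger can detect whether a bundle helper symbol is linked in.
        self.has_bundle_helper = has_bundle_helper
        self.round_trips = 0

    def call(self, fn: Callable[[Any], Any], obj: Any) -> Any:
        """One debug function call costs one round trip."""
        self.round_trips += 1
        return fn(obj)

    def call_bundle(self, requests: List[Request]) -> List[Any]:
        """The whole bundle costs one round trip; iteration happens inside the debuggee."""
        self.round_trips += 1
        return [fn(obj) for fn, obj in requests]


def resolve(debuggee: FakeDebuggee, requests: List[Request]) -> List[Any]:
    """Use the bundle helper when present, otherwise fall back to per-object calls."""
    if debuggee.has_bundle_helper:
        return debuggee.call_bundle(requests)
    return [debuggee.call(fn, obj) for fn, obj in requests]
```

With N requests, the bundled path pays for one round trip and the fallback path pays for N, which is the saving the proposal is after if IPC latency really dominates.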

TurkeyMan commented 4 days ago

Maybe we could put a function in druntime that's present when building with symbols... at very least, I wouldn't be upset to link a lib.

rainers commented 3 days ago

The swapped GC is inside a DLL loaded into the target process, so that could contain helper functions. The calls by VS to evaluate locals or expressions are rather fine-grained and probably not easily combinable (especially not delayed until more requests come in), but if an enumerator is returned to VS, you can predict that it is pretty likely that it will get enumerated to some extent, so the elements could be evaluated in larger chunks and be cached. Not a small change, though...

TurkeyMan commented 2 days ago

> The swapped GC is inside a DLL loaded into the target process, so that could contain helper functions. The calls by VS to evaluate locals or expressions are rather fine grained and probably not easily combinable (especially not delayed until more requests come in),

Yeah, this was my concern. I can see a Microsoft API being very event/response based... with an expectation of immediate responses :/

> but if an enumerator is returned to VS, you can predict that it is pretty likely that it will get enumerated to some extent, so the elements could be evaluated in larger chunks and be cached. Not a small change, though...

I'm not sure I follow, but I'll take your word for it. Can the debugger accept and retain an enumerator object across requests, and how would that have an IPC advantage?

rainers commented 2 days ago

> Yeah, this was my concern. I can see a Microsoft API being very event/response based... with an expectation of immediate responses :/

On second look, the API does allow returning results asynchronously, so it could allow bundling multiple requests to the target process.

> Can the debugger accept and retain an enumerator object across requests, and how would that have an IPC advantage?

For structs and arrays an enumerator is returned when asked for "children", and VS then calls something similar to GetItems(enumerator, start, count). Even if called individually for each child, these could return cached results filled from reading larger chunks.
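
To illustrate the caching idea, here is a minimal Python simulation, not the actual VS interface. `ChunkedEnumerator`, `get_items`, `fetch_range`, and the chunk size of 64 are all assumptions made for the sketch; `fetch_range(start, count)` stands in for one expensive read from the target process:

```python
from typing import Callable, Dict, List


class ChunkedEnumerator:
    """Serves per-child requests from a cache that is filled in larger chunks."""

    def __init__(self, fetch_range: Callable[[int, int], List[str]],
                 length: int, chunk_size: int = 64) -> None:
        self._fetch_range = fetch_range  # hypothetical: one call == one trip to the debuggee
        self._length = length
        self._chunk_size = chunk_size
        self._cache: Dict[int, str] = {}

    def _fill_chunk(self, index: int) -> None:
        """Fetch the whole aligned chunk containing `index` in a single request."""
        chunk_start = (index // self._chunk_size) * self._chunk_size
        chunk_count = min(self._chunk_size, self._length - chunk_start)
        values = self._fetch_range(chunk_start, chunk_count)
        for offset, value in enumerate(values):
            self._cache[chunk_start + offset] = value

    def get_items(self, start: int, count: int) -> List[str]:
        """Mirror of a GetItems(enumerator, start, count) style request."""
        end = min(start + count, self._length)
        items = []
        for index in range(start, end):
            if index not in self._cache:
                self._fill_chunk(index)
            items.append(self._cache[index])
        return items
```

Even if the caller asks for children one at a time, a 100-element array with a chunk size of 64 needs only two trips to the target process in this model instead of 100.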

On the other hand, calling a function in the process is set up via an abstract stack language that gets compiled into the target process. This currently happens for each evaluation. Maybe it is the compilation that takes most of the time (not the execution) and can be avoided by reusing the same compiled instruction sequence with only the this-pointer exchanged.
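
The reuse idea can be sketched as follows, again as a Python simulation under the unverified assumption that compilation (not execution) dominates the cost. `CallCompiler`, `evaluate`, and the function name passed in are hypothetical; the "compiled sequence" is modelled as a closure that takes the this-pointer as its only argument:

```python
from typing import Callable, Dict


class CallCompiler:
    """Compiles a call sequence once per function and reuses it for every object."""

    def __init__(self) -> None:
        self.compile_count = 0  # how many times the expensive step ran
        self._compiled: Dict[str, Callable[[int], str]] = {}

    def _compile(self, function_name: str) -> Callable[[int], str]:
        """Stand-in for the expensive compilation of the stack-language sequence."""
        self.compile_count += 1

        def run(this_pointer: int) -> str:
            # Only the this-pointer varies between evaluations.
            return f"{function_name}(this=0x{this_pointer:x})"

        return run

    def evaluate(self, function_name: str, this_pointer: int) -> str:
        """Look up the cached sequence, compiling only on first use."""
        compiled = self._compiled.get(function_name)
        if compiled is None:
            compiled = self._compile(function_name)
            self._compiled[function_name] = compiled
        return compiled(this_pointer)
```

Evaluating the same debug function on many array elements then compiles once and executes many times, which would only help if profiling confirms that compilation is the slow part.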

TurkeyMan commented 2 days ago

Okay, it sounds like there are multiple promising paths forward.