dlang / visuald

Visual D - Visual Studio extension for the D programming language
http://rainers.github.io/visuald/visuald/StartPage.html
Boost Software License 1.0

Weird lock-ups all over the place, memory usage stable #280

Open TurkeyMan opened 1 month ago

TurkeyMan commented 1 month ago

Hey again, I'm having constant lockups when I put breakpoints at random places in my program. Memory usage is stable, so this is different from the other issue.

Visual Studio says Evaluating expression 'varName'... in a dialog box, and it sits there for about 3-4 minutes. Sometimes it does eventually complete, the program cursor shows at the breakpoint, and I can inspect values; other times, after that amount of time passes, VS makes a ding noise and the debug session terminates, dropping me back to the editor as if it reached some sort of abort condition.

This happens randomly when I place breakpoints in surprising places, but some change I made recently seems to cause it to happen virtually everywhere. VisualD is virtually unusable recently... I'm terrified that without Rainer at the station, VisualD is basically end-of-life? :/

I wonder what the future looks like?

TurkeyMan commented 1 month ago

I've narrowed down the scope of the issue; it's related to the "Call __debug[Overview|Expanded|Visualizer]" option in the debug settings... if I disable that option, it's fine. How can I diagnose more concretely what's gone wrong here?

rainers commented 1 month ago

Hi. The __debug* methods do some advanced stuff, i.e. swapping the GC to avoid issues with the GC lock if some stopped thread is holding it. What's your "Switch GC" setting regarding the use of this function? The same can happen with C-malloced data due to the lock inside the system heap. Do your __debug-functions use the GC or malloc, for example to return allocated strings?
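As a minimal sketch of what "not using the GC or malloc" can look like, the overview function below only returns a slice of memory the struct already owns, so no heap lock is involved while the debuggee is stopped. The `Name` type and its fields are hypothetical and exist only for illustration:

```d
// Hypothetical type for illustration; not taken from the reporter's program.
struct Name
{
    char[16] buf;   // inline storage owned by the struct
    ubyte len;      // number of valid bytes in buf

    // Returns a view into existing storage: no GC allocation, no malloc.
    const(char)[] __debugOverview() const pure nothrow @nogc
    {
        // Clamp so a corrupt len can never slice past the buffer.
        const size_t n = len <= buf.length ? len : buf.length;
        return buf[0 .. n];
    }
}
```

The `@nogc` attribute makes the compiler reject any accidental GC allocation inside the function, which is a cheap way to rule out one of the two lock sources mentioned above.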

Calling range methods for nicer display can be pretty slow, do you have that enabled?

You can attach another instance of VS to the locked up instance and try to figure out if it is actually waiting on a lock, from inside your code or in MagoNatCC.dll. If you could create a mini-core-dump with the task manager or process explorer, I can have a look, too, using the debug symbols, or I can also provide the PDB files.

TurkeyMan commented 1 month ago

So, I have a slight suspicion that it might be related to the debugger trying to call debug functions on yet-uninitialised objects... I have an Array container and a String container appearing in this scope, and I think it might be trying to populate the expand arrays with uninitialised length number of elements (huge values), so the expand for the arrays are enormous. And when the array elements themselves each involve their own debug function being called on millions of elements mostly pointing to uninitialised memory, I think it just kinda freaks out. It must be catching segfaults up the wazoo...

Sound likely? Or is there code to prevent this sort of behaviour somehow?

rainers commented 1 month ago

Sounds reasonable, though I could not reproduce it easily. The formatting functions (at least for the overview method) take care not to build a very long string, so should not iterate over arrays and generate a string longer than about 100 characters.

If an exception is detected during execution, the generation should stop immediately, not continue with the next array element. It then falls back to displaying the struct/class as if no __debug function is defined.

The expanded view limits the number of elements shown to chunks of the number given by the respective option, 1000 by default. IIRC these are evaluated only as they become visible, but might each cause an exception.
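To illustrate why chunking bounds the work even when a length field holds garbage, here is a minimal sketch of the index arithmetic. This is not Mago's actual code: `ChunkRange` and `chunkBounds` are hypothetical names, and the default of 1000 simply mirrors the option's default mentioned above.

```d
// Hypothetical helper types; an illustration only, not Mago's implementation.
struct ChunkRange
{
    size_t begin; // first element index of the chunk
    size_t end;   // one past the last element index of the chunk
}

// Computes which element indices belong to chunk number chunkIndex.
ChunkRange chunkBounds(size_t length, size_t chunkIndex, size_t chunkSize = 1000)
    @safe pure nothrow @nogc
{
    // Out-of-range chunks (or a zero chunk size) yield an empty range.
    // Comparing against length / chunkSize avoids overflow in the multiply below.
    if (chunkSize == 0 || chunkIndex > length / chunkSize)
        return ChunkRange(length, length);

    const begin = chunkIndex * chunkSize;
    const remaining = length - begin;
    const end = begin + (remaining < chunkSize ? remaining : chunkSize);
    return ChunkRange(begin, end);
}
```

With this scheme, an uninitialised length such as `size_t.max` still yields a first chunk of only 1000 elements, so the cost depends on how many chunks become visible rather than on the bogus length.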

Could it be that one of the __debug-functions themselves iterates over a badly uninitialized struct?
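One way to rule that out on the user side is to cap how much a __debug function will ever touch. The sketch below is hypothetical: the `Array` container, its fields and the `maxPreview` cap of 100 are assumptions made for illustration, and it assumes `__debugExpanded` may return a slice.

```d
// Hypothetical container for illustration; not the reporter's actual Array type.
struct Array(T)
{
    T* ptr;
    size_t length;

    // Assumed cap on how many elements the debugger view will expose.
    enum size_t maxPreview = 100;

    // Returns at most maxPreview elements, so a garbage length cannot
    // make the view span millions of elements.
    const(T)[] __debugExpanded() const pure nothrow @nogc
    {
        if (!ptr)
            return null;
        const n = length < maxPreview ? length : maxPreview;
        return ptr[0 .. n];
    }
}
```

This only limits the damage; if `ptr` itself is uninitialised, reading through the returned slice can still fault, and the debugger would then fall back as described above.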

TurkeyMan commented 1 month ago

I've been trying to find patterns... but it's really tricky. The situation seems to change as I change things. The problem seems to come and go somewhat randomly. Possibly, when I make some changes to do some test, it may change the memory layout a little, and maybe that changes the situation. It seems kind of random whether some __debug function is going to cause a lock up for a long time, or just work fine.

I've isolated a case where a function is showing long lockups, I started commenting out __debug functions throughout my program until there was only one left... and then when I comment that one out, debugging returns to normal. Uncomment; lockup returns... okay, so it seems I've isolated the one that's causing a lockup:

struct String
{
nothrow @nogc:

    char* ptr;

    const(char)[] __debugOverview() const pure
    {
        if (!ptr)
            return ptr[0..0];
        ushort len = ptr[-1];
        if (len < 128)
            return ptr[0 .. len];
        return ptr[0 .. ((len ^ 0x80) << 7) | (ptr[-2] ^ 0x80)];
    }
}

So, this __debug function causes it to lock up populating just one single string in scope as far as I can tell for about 1-2 minutes. After that time, it does complete successfully, and it's also interesting to note that at this breakpoint, the string IS valid... so after 2 minutes, the debugger shows the proper string.

I tried changing that to this:

struct String
{
nothrow @nogc:

    char* ptr;

    const(char)[] __debugOverview() const pure
    {
        return "test";
    }
}

This still locks up for 1-2 minutes, after which time the string does indeed show "test" in the debugger.

If I comment that function out completely, debugging returns to normal.

So... the function is not being called on a bad context as far as I can see, it's just the presence of that function causing the program to hang in this instance.

I have toggled the "Switch GC" Mago option, but that doesn't seem to affect the problem. This __debug function doesn't allocate anyway.

I have no further theories from there... any thoughts? Anything you can think I should test?

TurkeyMan commented 1 month ago

Yeah, I think it's pretty conclusive; this is the only __debug function in my entire program, I change it to return 0;, and it still locks up for a long time. So it seems that simply the existence of the function is causing the lockups, and so I guess there's some sort of fundamental bug lingering in the system.

Also worth noting, I can't repro this in any other context where this __debug would be called; only this very particular context, and if I change my program, the fault may randomly disappear. Instances of debug lockups have been appearing somewhat randomly, and also seem to move around my program, coming and going as I work. They never seem to make sense; there's never any obvious reason why the problem emerges here and not somewhere else.

rainers commented 1 month ago

Interesting, thanks for narrowing it down. Can you create a mini core dump of devenv.exe while the debugger hangs, e.g. using the task manager? This should shed some light on where it's locked up. BTW: does it need 100% of a core, or is it rather idling?