dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio
https://dotnet.microsoft.com/languages/fsharp
MIT License
3.92k stars 786 forks source link

SRTP with large overload sets and errors leads to very high compilation load #9201

Closed abelbraaksma closed 3 years ago

abelbraaksma commented 4 years ago

EDIT (22 May): A repro that exhibits the likely cause of this bug (albeit not exactly the described sluggish behavior), was finally found by me after a week or so and 70+ helpful comments and traces. Instead of reading all, new visitors should probably just go to https://github.com/dotnet/fsharp/issues/9201#issuecomment-632894904. Please try the repro, I'm very curious whether it's reproducible.


A project that takes roughly 1 min to compile, after loading it in VS 2019, I first compiled it and then decided to measure how long it took for coloring, tooltips etc appear:

GIF animation of first 30-something minutes, click to start from the beginning, and grab some crisps and 3 beers before you start ;).

background-processing slow

Repro steps

Reproducible with my project and possibly with other large projects, I'd be interested if others have seen similar behavior.

At 34 minutes, it showed 22 tasks in background queue At 53 minutes, it showed 19 tasks in background queue At 56 minutes, it showed 0 tasks (finished) and mouseovers were responsive again (until I edit anything, one letter took another 5+ minutes with 6 tasks in queue ...).

Expected behavior

While I can understand that building up the IDE can take a bit longer than a simple rebuild-all, I don't expect it to take the same time I'd normally tend to fall asleep in a Vin Diesel movie. Making the screen-capture was just about similarly entertaining ;).

Actual behavior

As described above. Slooooow. For fair comparison, I opened only one VS IDE, and disabled most, but not all, 3rd party extensions.

In the Task Manager I can see devenv.exe at 4% CPU, which is effectively one CPU core on my machine. Memory for devenv.exe in those 40+ minutes went from 1.4GB to 1.6GB. Traditionally, I only notice extreme sluggishness after it reaches 2.8GB or higher.

Known workarounds

Strangely, I don't always see this behavior, though it has been like this for a few weeks now. I'm now going to restart, remove any auxiliary files, clean everything and start afresh, see if it still happens. As it is now, it is absolutely unworkable.

Related information

I'd be very interested in getting to the bottom of this, if you need any traces (I assume you do) or anything else, I'm more than willing to provide those.

abelbraaksma commented 4 years ago

Few hours later, I have submitted a trace (took long, because, well, typing one character takes 20 minutes at least...). I made the trace after: restart computer, removed *.user files, clean solution. I then waited for all background processes to stop (over 30 minutes this time around) and started the trace.

The trace collected data for another 35 minutes or so (the time it took to type one character and wait for the again-started background processes to stop and VS to become responsive again). I added comments in the trace for actions I did.

The trace is attached to https://developercommunity.visualstudio.com/content/problem/1034777/typing-in-an-f-solution-takes-minutes-to-get-type.html and most likely only viewable to MS employees (@KevinRansom, @dsyme, @cartermp, I assume you are able to access this?).

All fun and pun aside, as it is now I cannot develop anything seriously anymore... :S (though, gotta admit, if you type one char per 30 minutes, your code quality greatly improves! As does my coffee addiction).

My next steps: disabling everything in VS that could hog the CPU, installing latest previews to see if that improves things etc etc (I severely hope it is some extension I overlooked). Somehow I need to be able to get back to work ;).

cartermp commented 4 years ago

@abelbraaksma I'm looking at the trace, and it's ... weird. I'm not sure if I can say anything at his point, but in the trace, literally nothing is showing as being allocated until 24 minutes into the trace. After that, there's data about allocations. When looking at CPU stacks, it's not until 29 minutes into the trace that anything is registered.

This is pending some more investigation, but the CPU and memory usage items actually look fairly normal if I slice the sample to a period where there's actually data. Though it would be indicate of the compiler simply having to process a lot of F# code, since top allocations are from strings and tuples used deep in the guts of the compiler. CPU stacks show a similar story.

What's most telling about something being off here is that 1.4GB is nowhere near enough memory usage for devenv.exe to go haywire due to exceeding memory limitations, and 4% CPU is also not a whole lot of work going on. It's almost as if there's some enormous Thread.Sleep() getting called somewhere.

abelbraaksma commented 4 years ago

Ok, bunch of tests further and a bit wiser (and a bit more confused). I did the following things:

I repeated this a few times to be sure. I know that in the past this setting was set to 200 and enabled and that it caused a lot of performance problems. Several issues on github were reported on it and @cartermp (iirc) showed me back then to lower the number to 3 or 4. It has since been 3 and until recently this didn't give me real problems.

I thought that if I disabled this option, that tooltips wouldn't work on types that are not defined in the current project, but in a project-to-project reference (at least that is how I remembered it: disabling completely was a bad idea in the past). But I seem to have been wrong, or behavior has changed. I couldn't (yet) find any behavior difference with caching on or off, except for extreme (200-500x) performance degradation. And let's be fair, "caching" suggests speed-up, not speed-down.

If you read my reports thus far, I'll summarize the remaining questions:

  1. The extreme performance degradation mentioned in the original report (and with the trace), what could cause this and how can we prevent it? One additional observation: the solution had an added project which lies in-between in the references-chain. If anything, that's the only "big" change from recent weeks, perhaps this confused this feature?
  2. How is this feature supposed to work and shouldn't it improve performance (until memory gets in the way, but here we saw memory between 1.2Gb and 1.6Gb, nothing extreme)? And if its only effect is to downgrade performance 400x, what feature does it enable or improve that warrants such perf timings?
  3. Conversely: what will stop working if this feature is disabled? (so far it's the opposite, I have my supercool F# experience back and it's faster than ever!!!!)

Until it happens again, I won't be able to get the 40-minutes start-up performance and 20-minutes per-edit performance back, as my actions seem to have changed it, hopefully forever. Hopefully the report, however, can help analyze what happened here and prevent it from happening to others.

abelbraaksma commented 4 years ago

and 4% CPU is also not a whole lot of work going on

@cartermp, thanks for looking at this! You may read/interpret this wrong (if you got it from the trace), because I could see clearly in Process Explorer that a single thread was running 100% on a single CPU Core. Since I have 24 cores over 2 physical CPUs, that comes down to 4% (since part are virtual cores, sometimes it is reported as 8%).

Bottom line: of the list of threads, there was one that was doing all the work, the rest was sitting idly by (I would assume the busy thread was not a background thread, because these should be independent of the rest, but I couldn't verify, they were all _CorExeMain).

I'm looking at the trace, and it's ... weird. I'm not sure if I can say anything at his point, but in the trace, literally nothing is showing as being allocated until 24 minutes into the trace.

Somehow this doesn't surprise me, as I was also looking live at the CLR traces (again: Process Explorer) and the allocations were very, very low for such a big application and all that CPU.

My gut told me there's a wait-loop somewhere and the thing it is waiting on gets released, but only very slowly. And this happens often. This could mean a (semi-) race-condition, a file on disk that is shared and not released properly, a memory mapped file, mutex etc etc.

Though I don't think it is caused by something in memory (well, not from the start). The behavior survived restarts of the computer, updates of Windows and basically over a week of doing all sorts of stuff. Only until I disabled (not removed) all extensions, the behavior changed (to 2-4 minutes). This behavior is reproducible with my current code and settings. The previous behavior isn't.

I've read online that there were some issues with Live Share. I am not using it, not even logged in, but the extension was enabled. Could the extreme delay been caused by waiting on network services or even a log-in that never happened and has to time out several times?

KevinRansom commented 4 years ago

@abelbraaksma , can you share some details of the project/solution that you have loaded. Does it have a lot of projects or is it a single project? Is it netsdk or dotnet framework project? Is it on github somewhere that we can load?

Thanks

Kevin

abelbraaksma commented 4 years ago

@KevinRansom, I can certainly share the project in confidence, or we can set up a remote call with shared screens if that helps (it is mostly closed source, but they won't be too fuzzy sharing it under conditions). The solution I tested this with and used for the trace has 11 projects. Since I had issues with performance in F# long before as well, I have split the "big solution" into a bunch of smaller ones, depending on what part I am working on. I rarely work on my all.sln, because it's too slow, generally anyway.

The main solution has 34 projects (not all of which are very significant).

I've estimated the LOC at some 150k lines, but depending on how you count you may find a different number.

All projects are (unf.) still legacy projects. There was something in the new project format that blocked me from changing (we tried), atm. I can't remember what that was. However, there aren't any really fancy things going on: each project is .NET Framework 4.6.1 (we build libraries and cannot update to latest all the time), each compiles to a library, three compile to an EXE. There is a handful of C# projects, but these are comparatively small.

There are two "heavy" projects: lots of source lines, quite a bit of SRTP-style inline (but again, nothing out of the ordinary, I think) and compiled binary size of all builds up to ~13MB Release Build (only counting DLLs, not counting test projects). Those 2 "heavy" projects are 5.5MB and 2.7MB resp. after compilation.

A clean build takes around 2-3 min (dep. on whether or not tests are included) on my system.

I think this all counts as "big" but not "huge" or "extreme". Another C# project I manage has 110 projects in the main solution, with 10 min clean build time.

cartermp commented 4 years ago

Well looky here:

image

26% of CPU time in the sample is spent checking if a file exists. Ooof.

I'm still not quite confidence in the trace data I'm looking at since there's such a massive gap of time before there's anything going on here. However, the data I'm looking at doesn't look abormal. So I'm not terribly surprised with what I'm seeing, especially if your codebase makes use of a lot of assembly references @abelbraaksma

KevinRansom commented 4 years ago

Does it use typeproviders? And I would really appreciate if you could remember what blocked the conversion, because we are no longer shipping legacy project format templates. They will still load and edit, but we don't really want new projects created using that format if we can avoid it.

abelbraaksma commented 4 years ago

but we don't really want new projects created using that format if we can avoid it.

I know. But this project has been in development for > 7 years, and has some MSBuild dependencies baked in. I'm sure it can all be converted eventually and we really should. But we spent a few days trying and gave up. It is not a trivial conversion (and so much even MS admits on the pages dedicated to moving over to the new format).

But there's a big upside with conversion that we certainly recognize: multi-targeting. Currently, that's a mess and we really want to target .NET Framework and .NET Core (reason: customers ask for it), which currently isn't easy to do.

Does it use typeproviders?

Nope, none. There are some quotations though (test prj with UnQuote).

KevinRansom commented 4 years ago

So ... perfectly straightforward medium sized solution ide goes super slow .... eek. Did you ever see this type of behaviour with our oss VisualFSharp solution ?

cartermp commented 4 years ago

@abelbraaksma Regarding in-memory cross-project references: when the number of projects held in the cache is only 3 (which is small) and there are a lot of project references, this can actually lead to more memory and CPU consumption over time since the IDE is constantly recomputing things u necessarily. I'd be curious about your experience if you turned it back on and then set the value to something high like 200.

abelbraaksma commented 4 years ago

26% of CPU time in the sample is spent checking if a file exists. Ooof.

@cartermp If so much time is spent there, and there's probably IO involved which doesn't show in the CPU trace, that can surmount together as "was virtually doing nothing" for a long time, right?

This suggests that it was looking for a file that wasn't there, was corrupted or otherwise time-consuming to retrieve. Since the whole performance changed dramatically after I changed the extensions (disable, re-enable), I doubt it is the extensions themselves that cause the strange performance, but what if (thinking out of the box) the MS extensions share a file with (binary) settings that in turn cause this file-checking to go so slow?

Btw, the full ist of assemblies include a bunch of System.XXX, FParsec, Netwonsoft, NodaTime, FSharp.Core. The majority being System, which is a lot of files, but only 3.3MB.

abelbraaksma commented 4 years ago

PS: since we're all online debating this, would it perhaps be more efficient to set up a call?

However, the data I'm looking at doesn't look abormal. So I'm not terribly surprised with what I'm seeing, especially if your codebase makes use of a lot of assembly references

@cartermp I don't understand this, because:

  1. Editor used to show near-direct responsiveness when typing (this was different in 2015, but your efforts have really paid off and I am generally very, very happy with the improved performance). Some exceptions could be observed if editing the most-common reference, and then swapping to the least-common (i.e., the final exe). Other than that, performance has been really very good recently.
  2. The whole project is not absurdly large (I think), and doesn't do any crazy stuff (afaik)
  3. The total number of references is not that much for the project I profiled. Here's the list, the ones starting with Exselt are project-to-project references: image
  4. Nothing is anywhere near "normal", I think, when you needed to wait 10-20 minutes after each keystroke... Right? I'm not sure why the trace doesn't show it, do you have the full 35-or-so minutes?

Which begs the question, how can it spent so much CPU time in that method if there aren't that many files to load? Or: these files it tries to load have nothing to do with F# or my project? Some inner process going berserk? ;)

KevinRansom commented 4 years ago

@abelbraaksma phillip was referring to the trace .. your experience is actually scary.

abelbraaksma commented 4 years ago

ah, got it! :+1:

Yep, it was scary indeed ;). So scary it got funny, and then scary again :P

abelbraaksma commented 4 years ago

@cartermp, how can I access my own trace? I'm quite good at performance analytics, maybe I spot something.

If you guys wanna call or chat, my Skype handle is the same as my name here, but a dot between first and last name. Or any other app, or actual phone, is also fine.

abelbraaksma commented 4 years ago

Regarding in-memory cross-project references: when the number of projects held in the cache is only 3 (which is small) and there are a lot of project references, this can actually lead to more memory and CPU consumption over time since the IDE is constantly recomputing things u necessarily

Hmm, but that suggests that I wouldn't be able to see mouse-overs of types, modules, functions from cross-project refs? I verified this, and I can see everything, even if I haven't done a build (if the option is switched off). When on, well, tooltips appear after minutes at start-up or after an edit.

I'd be curious about your experience if you turned it back on and then set the value to something high like 200.

I'll try this

abelbraaksma commented 4 years ago

So ... perfectly straightforward medium sized solution ide goes super slow .... eek.

@KevinRansom Rectification on LOC. The F# part of the traced solution had ~126,500 non-empty lines in *.fs files. The full solution (not traced) has ~245,000 LOC. There's about 35k LOC in the C# parts.

But still within the bounds of "medium sized" or what I like to call "rather big" solution ;).

Did you ever see this type of behaviour with our oss VisualFSharp solution

Not so extreme, no. Though that solution generally behaves a bit slower than my own in regular code-editing stuff. But since 90% of my time is spent in my own project and I've only occasionally done editing in the OSS VisualFSharp solution, it is not a fair comparison.

I've seen extreme degradation before a few times (though not as extreme as these days). It usually happened when there were a lot of errors, i.e., there's a type I basically use everywhere. Change that type, and the there'll be thousands of errors. Try then to edit something at the far end in the solution, this is dead slow. But that is not typical, and since I split the solution in smaller sub-solutions, such large refactorings are now done in phases.

In this case, there are 10 errors spread over two or three files. But even without the errors, it was dead slow.

KevinRansom commented 4 years ago

@abelbraaksma I cleaned, built and loaded the OSS solution in vs, rebuilt. And still I am not seeing classification and tooltips. classification is the phase that shows the cyan TypeNames.

CPU is doing little to no work, Working set is 766,000 K so ... really not stressed.

cartermp commented 4 years ago

@KevinRansom this call is the one where all of the CPU time is being spent checking if a file exists: https://github.com/dotnet/fsharp/blob/master/src/fsharp/CompileOps.fs#L3013

The IsSystemAssembly assembly property is what I suspect the most, but there's no data in the trace indicating that it's this one specifically. However, this is where the actual call to File.Exists ultimately happens.

Is it necessary to have that exists call in there, given that we're already wrapping this in a try..with block?

Let's not discuss the F# OSS solution in this thread. The issues seen here don't surface in our codebase after the numerous times we've profiled this one.

KevinRansom commented 4 years ago

@cartermp throwing an exception is hugely more expensive than checking a file exists. So this call would be way more expensive if we relied on the exception. The exception exists for cases where the reason the file is inaccessible has nothing to do with its presence. For example permissions to access the file is not granted.

cartermp commented 4 years ago

Right, but was that ever measured? That is, have we measured how expensive it is to do these things when a file doesn't exist is relative to the time spent checking if it exists (an expensive operation on Windows)?

abelbraaksma commented 4 years ago

I'd be curious about your experience if you turned it back on and then set the value to something high like 200.

@cartermp: you may be on to something (CPU time, not wall-clock time):

Note: with the setting disabled, initial startup is 7 sec (after first screen appears), and 2 sec before mouseover work, almost no delay whatsoever between edits. So I'm still not getting the "caching" bit and why the mouseover should wait for the background task to finish if the info is already there (the "stale" setting is enabled).

I think something operates in the wrong order here.

abelbraaksma commented 4 years ago

@cartermp Can I access my own trace somehow? I'd like to see what you see

The IsSystemAssembly assembly property is what I suspect the most, but there's no data in the trace indicating that it's this one specifically. However, this is where the actual call to File.Exists ultimately happens.

Perhaps it makes sense to create a "clean" trace, because the extreme situation has not re-appeared and I suspect the cause is really in some kind of deadlock that may not readily show in the CPU-bound or memory-bound traces.

A new trace would reveal perhaps why, with "caching" enabled, it is still so slow, or, even if we accept that it can be slow, what is causing it and warn users against using such settings-combinations.

abelbraaksma commented 4 years ago

I'd also be interested in why these background processes appear to mostly act synchronously, as I keep seeing one thread doing all the work.

Also: even when doing nothing (like when typing this), not having focus on VS IDE and background tasks finished, devenv.exe still spins at 100% for a single thread (1 core at 100%, 8% overall).

abelbraaksma commented 4 years ago

Ok, my last comment may point to what @cartermp was already hinting at: (almost) nothing is happening. Meaning, if, when doing nothing, the CPU slice it takes is 9%, and when running background tasks it is also 9%, the difference is so small that my money is on some rogue thread that blocks the rest from running efficiently.

Is there a way I can detect this thread is the UI thread or something else? Notice the cycles delta for 16108, this one is consistently on top (when I do nothing in the IDE). Again, 8% here means 100% on a single CPU core.

image

abelbraaksma commented 4 years ago

Strike that, sorry for the noise. It makes no sense. I just tried the theory, but the hogging thread above, 16108, disappears completely (ends) when I edit something, ergo, it cannot be the UI thread. While editing, it shows a more logical slicing. Though it is beyond me why the numbers don't spread over the different cores and together add up to something closing to 100% when it's busy. There seems to be too much waiting-between-threads going on:

image

cartermp commented 4 years ago

The massive wait times with nothing going on smells like race condition or deadlock or something going on in the VS process. F# does use multiple threads, but only for operations that rely strictly on parse data. It could be that we're messing with something else here, but I've never observed this with daily usage of VS in lots of F# codebases...VS threading, on the other hand, is immensely complicated and there's a lot of stuff happening outside of F# all the time in the VS process. So my intuition is that there's a bug somewhere else, but I couldn't say way.

Regarding this:

Note: with the setting disabled, initial startup is 7 sec (after first screen appears), and 2 sec before mouseover work, almost no delay whatsoever between edits. So I'm still not getting the "caching" bit and why the mouseover should wait for the background task to finish if the info is already there (the "stale" setting is enabled).

I think something operates in the wrong order here.

Tooltips rely on typechecked information, and due to the design of the F# compiler service and the nature of type inference, requests for that data are serialized and placed in a queue. This could be made a lot better in theory, but it'd be a huge amount of work.

abelbraaksma commented 4 years ago

requests for that data are serialized and placed in a queue

@cartermp That sounds like how I remember it, but there are two things that don't add up, or I don't understand them:

  1. If cross-project setting is on, regardless of the caching setting, there's a noticeable and measurable delay in the tooltips, and thus the type inference to complete.

  2. If cross-project setting is off, it is near-instantaneous.

Here's how I think it works:

And that's the thing: the behavior described here happens when I edit something in one file, in the top-project. It still appears to rebuild everything (the 3 min delay), as opposed to using stale data (I haven't seen this setting working, ever), or simply using the cached file (after all, there was no change in referenced projects).

If you need to rebuild all each time anyway, the caching-option ~makes little sense to me~ edit: wait, it means it keeps the fs files and other stuff in memory, may be I got it now ;). That's why it is faster when I set it to 200.

Also somewhat surprising: clean + rebuild all takes less time than waiting for tooltips to appear. I'd expect the other way around, since optimizations don't have to kick in.

So my intuition is that there's a bug somewhere else, but I couldn't say way.

yeah, my intuition was leaning that way too, alas... ;)

KevinRansom commented 4 years ago

@cartermp

Here is some code, it grabs a bunch of files, and then for each of them checks if it exists. Unfortunately that is so fast I loop that a bunch of times so that if I blink I can tell it's still working.

I have two different commented out lines of code.

  1. Does a file exists and updates a counter.
  2. raises an exception, the counter is updated in the try block
    
    open System
    open System.IO
    open System.Timers

[] let main argv = let directory = new System.IO.DirectoryInfo(@"C:\kevinransom\fsharp\") let files = directory.GetFiles("*.fs", SearchOption.AllDirectories)

let mutable count = 0

let st = DateTime.UtcNow
for loop in 1 .. 100 do
    for file in files do
        try
            if file.Exists then count <- count + 1
            //raise (new Exception("Some text"))      //if file.Exists then count <- count + 1
        with | _ -> count <- count + 1

let en = DateTime.UtcNow
let duration = en.Subtract(st)
printfn "%A - %A = %A Checked: %A" st en duration count

0 // return an integer exit code

So, this first result is for an outer loop of 100 loops of exceptions:
5/16/2020 12:50:01 AM - 5/16/2020 12:50:06 AM = 00:00:05.5101554 Checked: 454100
So thats pretty great 5.5 seconds for half a million exceptions ... groovy I'm excited.

This second result is for FindFile:
5/16/2020 12:51:27 AM - 5/16/2020 12:51:27 AM = 00:00:00.0044857 Checked: 454100
Oh bugger that's way to small a number to be snarky about.
Thats about a 1/2 a hundredth of a second half a million checks

Lets increase the outer loop to 175000
5/16/2020 12:54:07 AM - 5/16/2020 12:54:13 AM = 00:00:05.3581701 Checked: 794675000
So we get 794675000 File.Exists in about the same time as 454100 exceptions
cartermp commented 4 years ago

@KevinRansom unfortunately this isn't quite what I'm asking. I'm not curious if we tested that File.Exists is faster than an exception. It always is (unless it also throws an exception). In the code I linked (and lots of other places File.Exists is called), it would seem really, really odd to me if the file being checked didn't exist. In other words, consider the following:

Which is higher in cases like what I linked to? Has this been tested?

abelbraaksma commented 4 years ago

So we get 794675000 File.Exists in about the same time as 454100 exceptions

This also suggests that this is not likely to be the cause, because I doubt my project, or any project, has 794,675,000 assemblies, and 5sec is not much on 40 min. Unless of course an outer loop is accidentally nested that leads to exponential performance of the mentioned code segment in IsSystemAssembly.

I also noted that FileSystem.SafeExists actually seems to use a cache underneath, which suggests that multiple calls shouldn't be too CPU or I/O intensive (though the backing dictionary is not a concurrent dictionary...). I didn't check carefully, but it seemed to refer to this definition:

        member __.SafeExists(fileName) = 
            files.ContainsKey(fileName) || defaultFileSystem.SafeExists(fileName)
abelbraaksma commented 4 years ago

I'm about to hit the hay (3.30AM here). Many thanks for your support and looking into this so far, it's really appreciated! I understand it is possible we won't be able to get to the bottom of this. Though it might help if I made a new trace of the current situation, which might shed some light on where time is being spent in the now faster-but-still-slower-than-expected situation.

It is not unlikely the situation will reoccur at some point, in which case I could try to make a different kind of trace that better shows any stalls (if that is possible).

KevinRansom commented 4 years ago

My theory about that big number in the trace, is that because IO was going on that is where the GC may have got control and memory collected. Except I half recall that gc is supposed to not suspend threads any more. So I suppose my theory is wrong too.

In a decently working dotnet sdk project I would expect IsSytemAssembly to only be called a few hundred times, and in a net framework project less than half that. But again I could be wrong.

What is bothering me, is that I am not seeing classification for VisualFSharp.sln on my machine and I feel like that must be due to my environment because someone should have complained about by now.

cartermp commented 4 years ago

@abelbraaksma A new trace with:

When memory/CPU usage is fairly high and things seem slow (or even slow-ish) could be interesting, it might yield some data pointing at other things that are not working as well as they could be.

cartermp commented 4 years ago

If the "30 minutes until I get IntelliSense" problem shows up again we'll want to loop in some folks from the VS side to investigate further. That's an utterly absurd amount of time to get any feedback from the system and we'll want to get to the bottom of that.

abelbraaksma commented 4 years ago

is that I am not seeing classification for VisualFSharp.sln on my machine

Oh, you mean you don't see it all anymore? Odd... (or you hit the same issue I had: classification came in after a loooong time, just wait 1hr or so for it and meanwhile work in a different IDE instance).

abelbraaksma commented 4 years ago

If the "30 minutes until I get IntelliSense" problem shows up again we'll want to loop in some folks from the VS side to investigate further. That's an utterly absurd amount of time to get any feedback from the system and we'll want to get to the bottom of that.

Meaning: I shouldn't "fix" it right away such that the repro disappears, but wait until you guys can have a peek. Np.

abelbraaksma commented 4 years ago

A new trace with:

I'll prepare that tomorrow, if that's ok with you (I might otherwise fall asleep while waiting and staring at the mouseover not responding :P)

cartermp commented 4 years ago

@abelbraaksma If the issue shows up again when you're comfortable checking it out, we'll definitely want to investigate. But if you can live without your extensions for a while (since that seems to be the primary issue), then just working without them is ideal. Definitely don't want you to do work having to wait 30 minutes every time you type a character :)

Do you have a list of all your extensions you could share?

abelbraaksma commented 4 years ago

But if you can live without your extensions for a while (since that seems to be the primary issue),

@cartermp: you may have misunderstood, it was NOT the extensions, at least not by themselves. What happened was: I disabled all extensions, the behavior disappeared partially, then enabled them again and expected to find the rogue extension. But that never happened. The behavior remained the same.

abelbraaksma commented 4 years ago

Except I half recall that gc is supposed to not suspend threads any more. So I suppose my theory is wrong too.

I think you mean this behavior? If I read that correctly, it is only true for Gen2 collections.

Do you have a list of all your extensions you could share?

Coming up

KevinRansom commented 4 years ago

@cartermp in the case of probing the file mostly doesn't exist, it is being searched for. However, 794675000 file exists costing 5 seconds I doubt that many programs search that many programs would search for a file 794 million times. although I could be wrong

abelbraaksma commented 4 years ago

Do you have a list of all your extensions you could share?

I installed an extension to export the extensions, I hope a JSON file is OK (it can be imported into a VS IDE using this extension, if you need to test a scenario or something). There's no private data in the file.

extensions.zip

One extension isn't shown here, and that is NCrunch. That's a big one in terms of resource hogging, but it was disabled for all my tests and traces anyway. I only enable it when I work with the test projects.

Oh, and I still had a screenshot from extensions I installed the last time, just after I moved to a new instance of VS Community Edition:

image

abelbraaksma commented 4 years ago

@cartermp, I've added a new trace through this VS Community issue: https://developercommunity.visualstudio.com/content/problem/1035124/trace-of-editing-experience-with-in-memory-cross-p-1.html.

Noticeably:

Since this trace was done on the same solution with the same source and the same windows open, and I did by en large the same actions (typing a little, waiting for tooltips), it can be used to compare with the original trace.

My guess: this new trace shows what the Compiler Services are actually doing in terms of CPU-bound actions and the 20-40 min delay seen earlier, with a spinning devenv.exe, was held up by something, plus it had the mem-cache setting at 3 which caused many recalculations. You mentioned that you saw almost nothing in the trace. That makes sense, seeing that the new trace is much, much shorter (1 min compared to 35 min and more edit actions done in that 1 min than in the 35 min in the other trace).

Big question remaining: what could it have possibly been that held the process so long?

When memory/CPU usage is fairly high and things seem slow (or even slow-ish) could be interesting, it might yield some data pointing at other things that are not working as well as they could be.

It appears that devenv.exe with the present settings doesn't spin with a single thread at 100% CPU for that thread (as shown in earlier screenshots). The difference between that moment and this moment are the disabled extensions. Arguably, there's at least one extension that runs rampant (but not so much that it caused the 20-40 min delay, we still don't know that). I'll investigate this further and try to single out an extension.

abelbraaksma commented 4 years ago

@cartermp, @KevinRansom, the issue is back, it was not a fluke, and it's in a different file, different project, same solution. I'm recording again. While I type this from mobile, no coloring whatsoever has appeared, nor any tooltips, and I'm waiting for 15 minutes.

You wanted me to alert when it happened again, so that the VS team could have a look. Anything besides recording you'd like me to do?

I haven't yet tried the "magic fix" that worked before: disabling all extensions and reenabling them again. Cross prj caching is set to 200. Memory is at 1Gb. "Big" extensions are disabled.

abelbraaksma commented 4 years ago

still recording, ~30 min now~ ~60 min now~ ~1h20m now~ 2hrs now, 15 background tasks still running since starting recording. Still no coloring of any kind, seems hanging

I'll let it run for another 15 min, if nothing changes, I'll stop recording and send the results, then I'll attach a debugger, maybe that'll shine some light to this.

EDIT: @cartermp & @KevinRansom, trace is here: https://developercommunity.visualstudio.com/content/problem/1036876/hitting-perf-issue-again-new-trace-for-issue-9201.html

And this is for fun, SleepEx, of course!!!:

image

0x53A commented 4 years ago

After a short search it looks like taskbardockappintegration32.dll belongs to a product named bins: http://www.1upindustries.com/bins/

It seems to load itself into any running process, probably using hooks.

Could you try to disable / uninstall that?

abelbraaksma commented 4 years ago

@0x53A, that's a good one to try, never even occurred to me (used it for years without giving it another thought)! It doesn't, however, seem to relate to 100% CPU on one thread (Sleep doesn't cost CPU afaik, and it gives resources to other threads, if anything). I'll def. try switching it off though.

Meanwhile, most breaks while debugging (and just now when I tried to quit the process) seem related to FSharp. Any idea how to get symbols loaded of ni (native image) dll files?

image

abelbraaksma commented 4 years ago

Hmm, this time, restarting VS IDE helped, and after opening the same solution again, same files etc etc, just appeared responsive after some 30 sec. (sorry @cartermp, I couldn't keep it open much longer, until VS IDE team could have a look, I need to go for some groceries soon, we'll try that session next time it happens again ;).