dotnet / vscode-csharp

Official C# support for Visual Studio Code
MIT License

Slow step through and watch variable expansion when debugging in a Docker container #2432

Open janaka opened 6 years ago

janaka commented 6 years ago

There's a performance difference between native debugging and Docker container debugging. The problem can be observed when expanding variables in VS Code. In this test project https://github.com/jamiegs/dotnet-debugging-performance the difference between native and docker is small but still visible. On a real code base, it is much more significant. See this video https://youtu.be/V_jWtOHjvOg for a demonstration.

The details here are from macOS + VS Code, but we have observed the same on Windows + VS Code and Windows + VS. Tests were run on high-spec MacBook Pro and Dell XPS hardware. Changing the resources allocated to the container does not seem to make a difference.

Environment data

.NET Core SDK (reflecting any global.json):
 Version:   2.1.300
 Commit:    adab45bf0c

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.13
 OS Platform: Darwin
 RID:         osx.10.13-x64
 Base Path:   /usr/local/share/dotnet/sdk/2.1.300/

Host (useful for support):
  Version: 2.1.2
  Commit:  811c3ce6c0

.NET Core SDKs installed:
  1.0.1 [/usr/local/share/dotnet/sdk]
  1.0.4 [/usr/local/share/dotnet/sdk]
  1.1.4 [/usr/local/share/dotnet/sdk]
  2.0.0 [/usr/local/share/dotnet/sdk]
  2.0.3 [/usr/local/share/dotnet/sdk]
  2.1.4 [/usr/local/share/dotnet/sdk]
  2.1.300-preview2-008533 [/usr/local/share/dotnet/sdk]
  2.1.300 [/usr/local/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.0-preview2-final [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.1.0 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.0-preview2-final [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.1.0 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 1.0.4 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.0.5 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.1 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.2 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.4 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.9 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.0 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.3 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.5 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.0-preview2-26406-04 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.0 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.2 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

VS Code version:

1.25.1

C# Extension version:

1.15.2

Steps to reproduce

Based on code in https://github.com/jamiegs/dotnet-debugging-performance

The repro includes two launch configurations, Docker and Native.

Expected behavior

Actual behavior

Demo video comparison using https://github.com/jamiegs/dotnet-debugging-performance https://youtu.be/lkDnFPGOumM

Demo video comparison using a larger, 20,000 line application. https://youtu.be/V_jWtOHjvOg

rchande commented 6 years ago

@gregg-miskelly This sounds like it's probably a debugger issue?

janaka commented 6 years ago

@jamiegs this happens with VS2017 also right?

jamiegs commented 6 years ago

@janaka it does, anecdotally; I haven't taken any timings of the performance though.

chuckries commented 6 years ago

@janaka Are you using linux containers with docker on both windows and mac?

janaka commented 6 years ago

@chuckries yes

chuckries commented 6 years ago

I want to let you guys know that I'm looking into this, but I haven't had a tremendous amount of time yet. I am trying to instrument our code to get some timing data on either side of the pipe (docker.exe in this case). I also have installed Ubuntu side by side on my macbook so that I can try to get some mac/linux numbers on the same hardware.

My guess upfront is that vsdbg is just slower running on the virtualized linux kernel in docker on mac. I will respond to this when I have some actual numbers. Thanks for reporting this!

chuckries commented 6 years ago

Here are the results of my ad-hoc analysis. My conclusion: docker is not slow but virtualization is.

The Instrumentation

I instrumented the debugger to measure the elapsed time of response in two places:

  1. On the Adapter side: this is the portion of the debugger that always runs locally with VS Code. The measurement is taken from the point when a request is sent to the "pipe" to the point a response is received from the "pipe". For local debugging, the pipe exists completely within the debugger process. For docker debugging, the pipe is docker.exe. This measurement is always taken on the host machine.
  2. On the Engine side: this is the portion of the debugger that always runs where the application is. The measurement is taken from the point a request is received from the pipe to the point a response has been created and will be placed back on the pipe. For local debugging, this measurement is taken on the host machine. For docker debugging, this measurement is taken inside the container.

The Test

With this instrumentation in place, I gathered elapsed time measurements by expanding "this" after stopping in HomeController.cs in the repro provided. I took these measurements on both macOS High Sierra 10.13.6 and Ubuntu 16.04. Both of these OS's were run natively on the same MacBook Pro hardware (dual boot). All tests were done using .NET Core SDK 2.1.302 (coreclr 2.1.2). I used a private release build of the C# Extension Debugger that was instrumented to collect elapsed time measurements as described.

The Raw Data

The measurements below are the elapsed times, in microseconds, to respond to the request to expand "this" when stopped in HomeController.cs.

| Iteration | Linux Local: Engine | Linux Local: Adapter | Linux Docker: Engine | Linux Docker: Adapter | Mac Local: Engine | Mac Local: Adapter | Mac Docker: Engine | Mac Docker: Adapter |
|---|---|---|---|---|---|---|---|---|
| 1 | 2938358 | 2939101 | 2349848 | 2351742 | 3863562 | 3863906 | 5347215 | 5352754 |
| 2 | 2867511 | 2867852 | 2273766 | 2274889 | 3590077 | 3590630 | 5344905 | 5344100 |
| 3 | 2872347 | 2872741 | 2262749 | 2263813 | 3552246 | 3552546 | 6141752 | 6140130 |
| 4 | 2896274 | 2897103 | 2249261 | 2250578 | 3594334 | 3594625 | 5470241 | 5468553 |
| 5 | 2871539 | 2871859 | 2302355 | 2303903 | 3691909 | 3692372 | 6568443 | 6565845 |
| 6 | 2906964 | 2907353 | 2334716 | 2335732 | 3556107 | 3556393 | 5920160 | 5917727 |
| 7 | 2903448 | 2904032 | 2326317 | 2327477 | 3670246 | 3670553 | 5969643 | 5971757 |
| 8 | 2925795 | 2926287 | 2324877 | 2325811 | 3569883 | 3570178 | 6143094 | 6141412 |
| 9 | 2893591 | 2893953 | 2482863 | 2484262 | 3542470 | 3542790 | 5732719 | 5750978 |
| 10 | 2895258 | 2895718 | 2253518 | 2255403 | 3548777 | 3549094 | 5965903 | 5963876 |

A chart attached to the original comment shows the average of the 10 iterations.

Conclusions

Docker is not inherently slower. My measurements show that docker on linux outperforms local linux debugging in the given setup. The Adapter measurements for both scenarios were taken on the host, so I do not believe there are any inherent timing issues inside the docker container.

The measurements for macOS show that there is no significant time lost to the docker process acting as a pipe; the debugger executable in docker is simply slower. When docker is run on linux, there is no virtualization. For linux containers on mac, the container runs on a virtualized linux kernel. I believe this virtualization is to blame for the performance impact.

The docker measurements on both Linux and Mac show that there is virtually no time lost to docker.exe acting as a pipe.

It is worth noting that I would expect all "Engine" measurements to be strictly less than the corresponding "Adapter" measurement. This is true in all cases except macOS docker. The negative deltas that exist are small, and I chalk this up to timer resolution in the virtualized environment.

Closing

Please let me know if you have any additional questions or ideas for analysis. Personally I do not have experience with performance profiling on linux or mac. I did not test something like running linux inside parallels or virtualbox on mac. I assumed this would obviously be slower and was not sure comparing it to the docker case would be interesting.

janaka commented 6 years ago

Thanks @chuckries, this is very useful. We are having a discussion internally about your results and conclusions. We might have more questions/comments in a bit.

chuckries commented 6 years ago

After letting this sit for a day, I realize that this sort of answers the question of "Why is vsdbg slow on docker for macOS?" with "because vsdbg is slow on docker for macOS." I haven't yet been able to actually point out what specifically is slower in docker. I'm going to continue to look into this and explore ways to measure performance. Unfortunately this won't be my primary task and I may not have any additional answers for a while.

If you have any ideas or experience with linux/docker perf investigation, I would appreciate the input!

rchande commented 6 years ago

@chuckries Nice analysis! One suggestion: I think the original post notes that this issue also reproduces if you debug a Docker container from Windows. If you're more familiar with perf tools on Windows, that might be a good way to get an actionable trace.

chuckries commented 6 years ago

@rchande Thanks for the feedback. I considered this, but this would still require linux perf investigation, as docker on windows would still be running a virtualized linux environment. One thing that I think I have shown is that the perf investigation needs to be done entirely within the docker container. There is not much to be gained from investigating on the host side.

rversaw commented 6 years ago

@chuckries Hey Chuck - thanks for all the work so far! From my perspective, some of what led to this issue for us is making sure we can have a "first class" development environment using Docker as opposed to the traditional setup (Windows + full VS). While the Windows side won't be as beneficial for tracing down the underlying reason for slowness in Docker, it could be a nice baseline to compare everything to. That would primarily be for our benefit in comparing though, and less about the overall issue itself.

chuckries commented 6 years ago

Oh hey Ryan! Yeah, I'll try to get some Windows numbers so we can at least see local Windows vs. Docker on Windows.

chuckries commented 6 years ago

Windows Data

| Iteration | Windows Local: Engine | Windows Local: Adapter | Windows Docker: Engine | Windows Docker: Adapter |
|---|---|---|---|---|
| 1 | 479288 | 479536 | 4851175 | 4852634 |
| 2 | 512262 | 512498 | 4858059 | 4859198 |
| 3 | 488561 | 488953 | 4797241 | 4798359 |
| 4 | 484271 | 484535 | 4899063 | 4900303 |
| 5 | 509923 | 510162 | 4785430 | 4786509 |
| 6 | 503807 | 504068 | 4688882 | 4690064 |
| 7 | 482689 | 482995 | 4859941 | 4861161 |
| 8 | 498618 | 498891 | 4861770 | 4863226 |
| 9 | 498642 | 499017 | 4702413 | 4703720 |
| 10 | 517726 | 518042 | 4706336 | 4707903 |


On my machine, Docker is about 10x slower than local Windows. It may be worth noting that there is a larger difference between the Windows and Linux builds of the debugger than there is between the Linux and Mac builds. Both Linux and Mac are built with clang while Windows is built with MSVC; Linux and Mac run on top of coreclr's Platform Adaptation Layer while Windows has no such abstraction; etc.

Machine Specs

Windows:

OS Name Microsoft Windows 10 Enterprise
Processor Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, 3401 Mhz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM) 32.0 GB

Docker:

CPUs 2
Memory 2048 MB
Swap 1024 MB

I tried increasing all of the Docker resource settings and saw only a negligible performance improvement.

chuckries commented 6 years ago

I have been pointed to this resource for perf investigation on linux: https://blogs.msdn.microsoft.com/vancem/2016/02/20/analyzing-cpu-traces-from-linux-with-perfview/

When I have time I will try to get some traces from inside docker on mac/windows to see if I can determine why the expansion is so much slower in docker.
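As a rough sketch (assuming `perf` is installed inside the container and the container has the privileges required for sampling), collecting such a trace might look like:

```shell
# Run inside the container. Sample the debugger process for 30 seconds
# while reproducing the slow variable expansion in VS Code.
VSDBG_PID=$(pidof vsdbg)        # assumes a single vsdbg instance
perf record -g -p "$VSDBG_PID" -- sleep 30

# Dump the samples as text; this is the format PerfView can import.
perf script > vsdbg-trace.perf.txt
```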

chuckries commented 6 years ago

@janaka @jamiegs One thing that I meant to address is how this relates to Visual Studio vs VS Code. When debugging on docker, both VS and VS Code run the cross platform C# debugger "vsdbg" inside the container. The performance of both frontends should be similar for docker scenarios; they are both dependent on vsdbg inside the container.

janaka commented 6 years ago

@chuckries thanks for getting the Windows stats. "may be worth noting that there is a larger difference between the windows and linux builds of the debugger than there is between linux and mac." Does this explain the difference between native/local performance on Windows and macOS/Linux? It's one of the reasons we were interested in seeing the Windows stats.

How I/O-bound is vsdbg? There seem to be some known limitations in the default Docker for Mac/Windows configuration, but there might be some options we can try. I'll share links to docs when I'm at a keyboard.

chuckries commented 6 years ago

I will do my best to answer your questions, but until I can do the actual perfview analysis, these are best guesses.

  1. It all started on Windows. Both coreclr and vsdbg are ports of the Windows .NET Runtime and VS Debugger, respectively. These had been worked on and tuned for years on Windows, and the Windows versions of the ports benefit from this. I do not know exactly how much pure perf work has been done for coreclr on *nix systems, but it is certainly more than has been done for vsdbg. That is why we're here now.

  2. The primary difference between the Windows and *nix builds of vsdbg is that the *nix versions use the coreclr Platform Adaptation Layer, or PAL. The Windows build does not use this layer because it builds directly against Windows APIs. I have no proof that the PAL has such a high perf impact, but it likely has some.

  3. The .NET Debugging Interfaces likely have a higher cost on *nix than on Windows. Vsdbg is built on top of the .NET Debugger Interface. These interfaces are implemented by the runtime of the target application, not by vsdbg (specifically, they ship in mscordbi, part of the platform-specific Microsoft.NETCore.App packages, like this). These APIs are what vsdbg uses to read memory out of the target process, among other things. It is likely that these code paths are not as performant on *nix as they are on Windows.

  4. Expression evaluation requires a lot of ReadProcessMemory. It is my understanding that reading from another process's virtual memory space on *nix boils down to ptrace. I have heard anecdotally that this is far slower on *nix than it is on Windows. I hope to confirm/deny this once I do my analysis.

  5. Expression evaluation should not be I/O bound. We load the symbol files (.pdbs) when the module instance loads. By the time we are expanding variables, we shouldn't be hitting the disk for anything. We are likely bound by reading memory from the target process, as described above.

chuckries commented 6 years ago

Just an FYI: I won't have a chance to do a more in depth perf analysis on this in the next week or two. I will add to this issue when I have more info/results.

codeimpossible commented 6 years ago

@chuckries In doing some more testing and troubleshooting with other devs it seems that the debug slowness can be improved by collapsing the "Variables" pane in vscode. Task<T> objects specifically seem to exacerbate the performance issue, even crashing vsdbg entirely in some scenarios.

chuckries commented 6 years ago

Update: I have not had time to test this, but there was a recent fix in the CoreCLR debugging interfaces that improves performance on XPLAT. I would expect this to improve Expression Evaluation performance both on Linux/Mac as well as Docker.

The specific commit is here. This is currently in coreclr master, meaning that it is only available in the latest .NET Core 3.0 Preview. I have confirmed that the fix is in the latest 3.0 SDK Shared Runtime available here.

janaka commented 5 years ago

@chuckries just catching up on this thread. Thanks for the update. Hopefully one of our crew will have a little time to test this out our end.

gregg-miskelly commented 5 years ago

https://github.com/dotnet/coreclr/issues/22315 is at least part of the problem as well

zarenner commented 4 years ago

I found some interesting differences related to this. I don't have measurements, but qualitatively (all on a beefy new MacBook Pro):

I first thought that the issue was within hyperkit, but based on the last point it seems more likely that it's something to do with the (linuxkit) kernel/VM image Docker Desktop uses, compared to what minikube uses (buildroot).

zarenner commented 4 years ago

A couple additional tests, on Windows this time:

This further points to the issue being related to LinuxKit / the LinuxKit images used by both versions of Docker Desktop.

ssimek commented 4 years ago

Hi Everybody,

I may have found some explanations for the performance issues. Recently I decided to do development in an environment that is as production-like as possible, using the same containers used in production and just docker exec-ing vsdbg inside them for debugging. I immediately hit the issue described here - watches, stepping and the like suddenly took seconds (running on Docker Desktop for Mac).

The first thing I noticed was that during the expression evaluation "hang", the com.docker.hyperkit process pegs one CPU core, while running top inside the docker VM shows just a few percent for the vsdbg process.

The other one was the following in Console:

`ptrace attach of "dotnet <project>.dll"[2451] was attempted by "/vsdbg/vsdbg --interpreter=vscode"[5554]`

So it seems like vsdbg is not able to use ptrace at all! Playing around with it some more, I've found that the docker VM has kernel.yama.ptrace_scope = 1 set in sysctl, which effectively means that only descendants of a process may be ptrace-d (https://www.kernel.org/doc/Documentation/security/Yama.txt).

Changing this to 0 is probably not practical (it does not survive a reboot of the docker host VM, which is locked down pretty tight with a read-only boot volume), but using docker exec --privileged for the vsdbg process is actually quite enough. After doing this, the above warning disappeared and debugging is (at least subjectively) even faster than regular local debugging on a Mac!
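For anyone who wants to try the same workaround, here is a minimal sketch (the container name `myapp` is a placeholder; the vsdbg path matches the log line above):

```shell
# Check the Yama restriction from inside the container (1 = restricted):
docker exec myapp cat /proc/sys/kernel/yama/ptrace_scope

# Start vsdbg with elevated privileges so its ptrace attach succeeds;
# this is the kind of command a VS Code pipeTransport would invoke:
docker exec --privileged -i myapp /vsdbg/vsdbg --interpreter=vscode
```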

Not sure if my fix is applicable to other scenarios here, but I definitely hope it will help to move this issue forward.

vladkol commented 3 years ago

@ssimek This answer may help narrow the Docker security changes down: `docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --security-opt apparmor=unconfined`
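Spelled out as a full command (the image name `myimage` is a placeholder), that suggestion would look like:

```shell
docker run \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --security-opt apparmor=unconfined \
  myimage
```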