janaka opened this issue 6 years ago
@gregg-miskelly This sounds like it's probably a debugger issue?
@jamiegs this happens with VS2017 also right?
@janaka it does anecdotally, I haven't taken any timings of the performance though.
@janaka Are you using linux containers with docker on both windows and mac?
@chuckries yes
I want to let you guys know that I'm looking into this, but I haven't had a tremendous amount of time yet. I am trying to instrument our code to get some timing data on either side of the pipe (docker.exe in this case). I also have installed Ubuntu side by side on my macbook so that I can try to get some mac/linux numbers on the same hardware.
My guess upfront is that vsdbg is just slower running on the virtualized linux kernel in docker on mac. I will respond to this when I have some actual numbers. Thanks for reporting this!
Here are the results of my ad-hoc analysis. My conclusion: docker is not slow but virtualization is.
I instrumented the debugger to measure the elapsed time of a response in two places: inside the debug engine itself ("Engine" in the tables below) and in the debug adapter on the host side of the pipe ("Adapter" below).
With this instrumentation in place, I gathered elapsed time measurements by expanding "this" after stopping in HomeController.cs in the repro provided. I took these measurements on both macOS High Sierra 10.13.6 and Ubuntu 16.04. Both of these OS's were run natively on the same MacBook Pro hardware (dual boot). All tests were done using .NET Core SDK 2.1.302 (coreclr 2.1.2). I used a private release build of the C# Extension Debugger that was instrumented to collect elapsed time measurements as described.
The measurements below are the elapsed times, in microseconds, to respond to the request to expand "this" when stopped in HomeController.cs.
| Iteration | Linux Local Engine | Linux Local Adapter | Linux Docker Engine | Linux Docker Adapter | Mac Local Engine | Mac Local Adapter | Mac Docker Engine | Mac Docker Adapter |
|---|---|---|---|---|---|---|---|---|
| 1 | 2938358 | 2939101 | 2349848 | 2351742 | 3863562 | 3863906 | 5347215 | 5352754 |
| 2 | 2867511 | 2867852 | 2273766 | 2274889 | 3590077 | 3590630 | 5344905 | 5344100 |
| 3 | 2872347 | 2872741 | 2262749 | 2263813 | 3552246 | 3552546 | 6141752 | 6140130 |
| 4 | 2896274 | 2897103 | 2249261 | 2250578 | 3594334 | 3594625 | 5470241 | 5468553 |
| 5 | 2871539 | 2871859 | 2302355 | 2303903 | 3691909 | 3692372 | 6568443 | 6565845 |
| 6 | 2906964 | 2907353 | 2334716 | 2335732 | 3556107 | 3556393 | 5920160 | 5917727 |
| 7 | 2903448 | 2904032 | 2326317 | 2327477 | 3670246 | 3670553 | 5969643 | 5971757 |
| 8 | 2925795 | 2926287 | 2324877 | 2325811 | 3569883 | 3570178 | 6143094 | 6141412 |
| 9 | 2893591 | 2893953 | 2482863 | 2484262 | 3542470 | 3542790 | 5732719 | 5750978 |
| 10 | 2895258 | 2895718 | 2253518 | 2255403 | 3548777 | 3549094 | 5965903 | 5963876 |
This chart shows the average of the 10 iterations.
Docker is not inherently slower. My measurements show that docker on linux outperforms local linux debugging in the given setup. The Adapter measurements for both scenarios were taken on the host, so I do not believe there are any inherent timing issues inside the docker container.
The measurements for macOS show that there is not significant time lost to the docker process acting as a pipe, the debugger executable in docker is simply slower. When docker is run on linux, there is no virtualization. For linux containers on mac, the container runs on a virtualized linux kernel. I believe this virtualization is to blame for the performance impact.
The docker measurements on both linux and mac show that there is virtually no time lost to docker.exe acting as a pipe.
It is worth noting that I would expect all "Engine" measurements to be strictly less than the corresponding "Adapter" measurement. This is true in all cases except macOS docker. The negative deltas that exist are small, and I chalk this up to timer resolution in the virtualized environment.
Please let me know if you have any additional questions or ideas for analysis. Personally I do not have experience with performance profiling on linux or mac. I did not test something like running linux inside parallels or virtualbox on mac. I assumed this would obviously be slower and was not sure comparing it to the docker case would be interesting.
Thanks @chuckries this is very useful. We are having a discussion internally about your results and conclusions. Might have more questions/comments in a bit.
After letting this sit for a day, I realize that this sort of answers the question of "Why is vsdbg slow on docker for macOS?" with "because vsdbg is slow on docker for macOS." I haven't yet been able to actually point out what specifically is slower in docker. I'm going to continue to look into this and explore ways to measure performance. Unfortunately this won't be my primary task and I may not have any additional answers for a while.
If you have any ideas or experience with linux/docker perf investigation, I would appreciate the input!
@chuckries Nice analysis! One suggestion: I think the original post notes that this issue also reproduces if you debug a docker container from Windows. If you're more familiar with perf tools on Windows, that might be a good way to get an actionable trace.
@rchande Thanks for the feedback. I considered this, but this would still require linux perf investigation, as docker on windows would still be running a virtualized linux environment. One thing that I think I have shown is that the perf investigation needs to be done entirely within the docker container. There is not much to be gained from investigating on the host side.
@chuckries Hey Chuck - thanks for all the work so far! From my perspective, some of what led to this issue for us is making sure we can have a "first class" development environment using Docker as opposed to the traditional setup (Windows + full VS). While the Windows side won't be as beneficial for tracing down the underlying reason for slowness in Docker, it could be a nice baseline to compare everything to. That would primarily be for our benefit in comparing though, and less about the overall issue itself.
Oh hey Ryan! Yeah I'll try to get some Windows numbers so we can at least see local windows vs docker on windows.
| Iteration | Local Engine | Local Adapter | Docker Engine | Docker Adapter |
|---|---|---|---|---|
| 1 | 479288 | 479536 | 4851175 | 4852634 |
| 2 | 512262 | 512498 | 4858059 | 4859198 |
| 3 | 488561 | 488953 | 4797241 | 4798359 |
| 4 | 484271 | 484535 | 4899063 | 4900303 |
| 5 | 509923 | 510162 | 4785430 | 4786509 |
| 6 | 503807 | 504068 | 4688882 | 4690064 |
| 7 | 482689 | 482995 | 4859941 | 4861161 |
| 8 | 498618 | 498891 | 4861770 | 4863226 |
| 9 | 498642 | 499017 | 4702413 | 4703720 |
| 10 | 517726 | 518042 | 4706336 | 4707903 |
On my machine, docker is about 10x slower than Windows local. It may be worth noting that there is a larger difference between the Windows and Linux builds of the debugger than there is between the Linux and Mac builds: Linux and Mac are built with clang while Windows is built with MSVC, the Linux and Mac builds run on top of coreclr's Platform Adaptation Layer while Windows has no such abstraction, etc.
Windows:

| Setting | Value |
|---|---|
| OS Name | Microsoft Windows 10 Enterprise |
| Processor | Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, 3401 Mhz, 4 Core(s), 8 Logical Processor(s) |
| Installed Physical Memory (RAM) | 32.0 GB |

Docker:

| Setting | Value |
|---|---|
| CPUs | 2 |
| Memory | 2048 MB |
| Swap | 1024 MB |
I tried increasing all Docker specs and saw only a negligible increase in performance.
I have been pointed to this resource for perf investigation on linux: https://blogs.msdn.microsoft.com/vancem/2016/02/20/analyzing-cpu-traces-from-linux-with-perfview/
When I have time I will try to get some traces from inside docker on mac/windows to see if I can determine why the expansion is so much slower in docker.
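For readers following along, here is a rough sketch of what collecting such a trace from inside the container might look like. This is not the exact workflow from the linked article; it assumes `perf` is installed in the container image, that the exec runs with enough privileges, and uses an illustrative container name:

```sh
# Profile vsdbg for 30 seconds while reproducing the slow variable expansion.
# Assumes perf is available in the container and the exec is privileged.
docker exec --privileged mycontainer sh -c \
  'perf record -g -p "$(pgrep vsdbg)" -o /tmp/vsdbg.perf.data -- sleep 30'

# Summarize where CPU time was spent.
docker exec mycontainer sh -c 'perf report -i /tmp/vsdbg.perf.data --stdio | head -n 50'
```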
@janaka @jamiegs One thing that I meant to address is how this relates to Visual Studio vs VS Code. When debugging on docker, both VS and VS Code run the cross platform C# debugger "vsdbg" inside the container. The performance of both frontends should be similar for docker scenarios; they are both dependent on vsdbg inside the container.
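To illustrate that point: roughly speaking, the frontend launches vsdbg inside the container over a `docker exec` pipe and speaks the debug protocol over stdin/stdout. A hedged sketch of the VS Code shape of this, with an illustrative container name and the vsdbg path/arguments that appear in a log line later in this thread:

```sh
# The frontend pipes the debug protocol to vsdbg running inside the container.
# "mycontainer" is illustrative.
docker exec -i mycontainer /vsdbg/vsdbg --interpreter=vscode
```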
@chuckries thanks for getting the Windows stats. "may be worth noting that there is a larger difference between the windows and linux builds of the debugger than there is between linux and mac." Does this explain the difference between native/local performance on Windows and macOS/Linux? It's one of the reasons we were interested in seeing the Windows stats.
How I/O bound is vsdbg? There seem to be some known limitations in the default Docker for mac/win config, but there might be some options we can try. I'll share links to docs when I'm at a keyboard.
I will do my best to answer your questions, but until I can do the actual perfview analysis, these are best guesses.
It all started on Windows. Both coreclr and vsdbg are ports of the Windows .NET Runtime and the VS Debugger, respectively. These were worked on and tuned for years and years on Windows, and the Windows versions of the ports benefit from that. I do not know exactly how much pure perf work has been done for coreclr on *nix systems, but it is certainly more than has been done for vsdbg. That is why we're here now.
The primary difference between the Windows and *nix builds of vsdbg is that the *nix versions use coreclr's Platform Adaptation Layer, or PAL. The Windows build does not use this layer because it builds directly against the Windows APIs. I have no proof that the PAL has a high perf impact, but it likely has some.
The .NET Debugging Interfaces likely have a higher cost on *nix than on Windows. Vsdbg is built on top of the .NET Debugging Interfaces. These are implemented by the runtime of the target application, not by vsdbg (specifically, they ship in mscordbi, part of the platform-specific Microsoft.NETCore.App packages, like this). These APIs are what vsdbg uses to read memory out of the target process, among other things. It is likely that these code paths are not as performant on *nix as they are on Windows.
Expression evaluation requires a lot of ReadProcessMemory. It is my understanding that reading from another process's virtual memory space on *nix boils down to ptrace. I have heard anecdotally that this is far slower on *nix than it is on Windows. I hope to confirm or deny this once I do my analysis.
Expression Evaluation should not be I/O bound. We load the symbol files (.pdb's) when the module instance loads. By the time we are expanding variables, we shouldn't be hitting the disk for anything. We are likely bound by reading memory from the target process, as described above.
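As a purely hypothetical way to sanity-check the ReadProcessMemory/ptrace theory above (not something attempted in this thread), one could summarize the syscalls vsdbg makes while a variable is expanded. This assumes `strace` is available in the container, a privileged exec, and an illustrative container name:

```sh
# Attach strace to vsdbg and count syscalls; expand "this" in the IDE, then
# press Ctrl-C to print the summary. A large share of time spent in ptrace and
# process-memory reads would support the theory above.
docker exec --privileged -it mycontainer sh -c 'strace -c -f -p "$(pgrep vsdbg)"'
```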
Just an FYI: I won't have a chance to do a more in depth perf analysis on this in the next week or two. I will add to this issue when I have more info/results.
@chuckries In doing some more testing and troubleshooting with other devs, it seems that the debug slowness can be improved by collapsing the "Variables" pane in VS Code. `Task<T>` objects specifically seem to exacerbate the performance issue, even crashing vsdbg entirely in some scenarios.
Update: I have not had time to test this, but there was a recent fix in the CoreCLR debugging interfaces that improves performance on XPLAT. I would expect this to improve Expression Evaluation performance both on Linux/Mac as well as Docker.
The specific commit is here. This is currently in coreclr master, meaning that it is only available in the latest .NET Core 3.0 Preview. I have confirmed that the fix is in the latest 3.0 SDK Shared Runtime available here.
@chuckries just catching up on this thread. Thanks for the update. Hopefully one of our crew will have a little time to test this out our end.
https://github.com/dotnet/coreclr/issues/22315 is at least part of the problem as well
I found some interesting differences related to this. I don't have measurements, but qualitatively (all on a beefy new MacBook Pro):
I first thought that the issue was within hyperkit, but based on the last point it seems more likely that it's something to do with the (linuxkit) kernel/VM image Docker Desktop uses, compared to what minikube uses (buildroot).
A couple of additional tests, on Windows this time:
This further points to the issue being related to LinuxKit / the LinuxKit images used by both versions of Docker Desktop.
Hi Everybody,
I may have found some explanations for the performance issues. Recently I decided to do development in an environment that is as production-like as possible, using the same containers used in production and just docker exec-ing vsdbg inside them for debugging. I immediately hit the issue described here - watches, stepping and the like suddenly took seconds (running on Docker Desktop for Mac).
The first thing I noticed was that during the expression evaluation "hang", the com.docker.hyperkit process pegs one CPU core, while running top inside the docker VM shows just a few percent for the vsdbg process.
The other one was the following in Console:
```
ptrace attach of "dotnet <project>.dll"[2451] was attempted by "/vsdbg/vsdbg --interpreter=vscode"[5554]
```
So it seems like vsdbg is not able to use ptrace at all! Playing around with it some more, I've found that the docker VM has `kernel.yama.ptrace_scope = 1` set in sysctl, which effectively means that only descendants of a process may be ptrace-d (https://www.kernel.org/doc/Documentation/security/Yama.txt). Changing this to 0 is probably not practical (it does not survive a reboot of the docker host VM, which is locked down pretty tight with a read-only boot volume), but using `docker exec --privileged` for the vsdbg process is actually quite enough. After doing this, the above warning disappeared and debugging is (at least subjectively) even faster than regular local debugging on a Mac!
Not sure if my fix is applicable to other scenarios here, but I definitely hope it will help to move this issue forward.
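A minimal sketch of the workaround described above, assuming Docker Desktop for Mac and an illustrative container name:

```sh
# 1. Confirm the Yama ptrace setting in the Docker VM's kernel (containers share
#    the VM kernel, so it can be read from inside the target container).
docker exec mycontainer cat /proc/sys/kernel/yama/ptrace_scope   # prints 1 when restricted

# 2. Start vsdbg through a privileged exec so it is allowed to ptrace a dotnet
#    process it did not spawn.
docker exec --privileged -i mycontainer /vsdbg/vsdbg --interpreter=vscode
```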
@ssimek This answer may help narrow down the docker security changes needed:
```sh
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --security-opt apparmor=unconfined
```
There's a performance difference between native debugging and Docker container debugging. The problem can be observed when expanding variables in VS Code. In this test project https://github.com/jamiegs/dotnet-debugging-performance the difference between native and docker is small but still visible. On a real code base, it is much more significant. See this video https://youtu.be/V_jWtOHjvOg for a demonstration.
The details here are for macOS + VS Code, but we have observed the same with Windows + VS Code and Windows + VS. Tests were run on high-spec MacBook Pro and Dell XPS hardware. Changes to the resources allocated to the container do not seem to make a difference.
Environment data

VS Code version: 1.25.1
C# Extension version: 1.15.2
Steps to reproduce
Based on code in https://github.com/jamiegs/dotnet-debugging-performance
There are two Launch Methods, Docker and Native.
Expected behavior
Actual behavior
Demo video comparison using https://github.com/jamiegs/dotnet-debugging-performance: https://youtu.be/lkDnFPGOumM
Demo video comparison using a larger, 20,000-line application: https://youtu.be/V_jWtOHjvOg