dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.47k stars 4.76k forks source link

New/update dump diagnostic IPC command that streams the dump on diagnostic IPC channel #86350

Open jander-msft opened 1 year ago

jander-msft commented 1 year ago

If a diagnostic collection tool is running as a non-root user whereas the target application is running as root, the diagnostic tool will not be able to read the dump file because it is marked as read-write only by the root user. See https://github.com/dotnet/dotnet-monitor/issues/4427 as an example.

In the example, the deployment has to be updated such as .NET Monitor runs as root in order to be able to read the dump file that was written to the /diag/dumps/ directory. This is an okay but not as secure workaround, but not discoverable until someone actually tries to capture a dump and the tool gets an access denied error; other features (such as capturing traces) of .NET Monitor do not require this when it's running its own diagnostic server and has the target application connect to it via DOTNET_DiagnosticPorts because there is no file system interaction between the two processes.

I would like to request that a new dump command or new option be added to the diagnostic communication channel that allows for streaming the dump over the channel rather than requiring the diagnostic tool to read it from disk.

ghost commented 1 year ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

Issue Details
If a diagnostic collection tool is running as a non-root user whereas the target application is running as root, the diagnostic tool will not be able to read the dump file because it is marked as read-write only by the root user. See https://github.com/dotnet/dotnet-monitor/issues/4427 as an example. In the example, the deployment has to be updated such as .NET Monitor runs as root in order to be able to read the dump file that was written to the `/diag/dumps/` directory. This is an _okay_ but not as secure workaround, but not discoverable until someone actually tries to capture a dump and the tool gets an access denied error; other features (such as capturing traces) of .NET Monitor do not require this when it's running its own diagnostic server and has the target application connect to it via `DOTNET_DiagnosticPorts` because there is no file system interaction between the two processes. I would like to request that a new dump command or new option be added to the diagnostic communication channel that allows for streaming the dump over the channel rather than requiring the diagnostic tool to read it from disk.
Author: jander-msft
Assignees: -
Labels: `area-Diagnostics-coreclr`, `untriaged`
Milestone: -
tommcdon commented 1 year ago

cc @mikem8361

hoyosjs commented 1 year ago

The default is intentional and I am not convinced yet we want to change this - essentially the runtime writes the dump with the same ACLing as the process. When the runtime get a request over the pipe it assumes the ACLing of the pipe. Doing this would require either:

hoyosjs commented 1 year ago

This also doesn't seem to fit in the .NET 8 timeline...

jander-msft commented 1 year ago

The default is intentional and I am not convinced yet we want to change this

I'm not asking to change the existing behavior, I'm asking for an optional behavior for when invoking the creation of the dump via the ipc channel.

  • essentially the runtime writes the dump with the same ACLing as the process. When the runtime get a request over the pipe it assumes the ACLing of the pipe.

The ACLing of the pipe does not necessarily match the target process, especially in "reverse server" diagnostics.

We (.NET Monitor) are only shipping non-root distroless images for .NET 8 for a better security posture. One core deployment type for .NET Monitor is to be used in multi-container environments; this means we have to use "reverse server" diagnostics (.NET Monitor has a diagnostic server to which the target application must connect) to be able to observe more than one container. The vast majority of application containers are still running as root; they are able to connect to the server due to their elevated access but they definitely do not match strictly match the permissions of the socket.

  • Having the runtime stream after collection, which is problematic if you are within a signal handler.

Not sure what a signal handle has to do with this code path; the app is not crashing or being forced to shutdown. Unless the handler of the diagnostic command is issuing some special signal to cause the dump invocation routine to execute... even it was...

  • Teach createdump how to open an EventPipe session to talk back to the collecting tool, and this quickly gets messy...

That's not how I would expect this to work; I would expect the command handler to invoke the dump routine, get a handle to the file, read out the file and send the content back through the response, something akin to how tracing is sent back as an unbounded stream in its response.

hoyosjs commented 1 year ago

I understand this changes a bit in the reverse scenario - and the ACL is not as much as a concern since it's an explicit config. However there's two major scenarios for dump collection: Crash and ad-hoc. The former gets triggered via a signal handler construct and spawns a second process. We don't get to stream that one since the runtime has limited actions it can perform in such cases - so we write it to a file from a different process (and there's also the usual thread suspension et al that limits how things are done). In the monitor scenario what's the egress story for such things? My guess is network mounted volume? If so a temporary workaround for people to get their dumps is to request ad-hoc dumps to said directory.

As for the ad-hoc dumps - yes, the runtime shares the codepath of dump collection for EP requests and crashing scenarios. It's reasonable to say that the command handler perhaps could stream the file and that would work, I do have to give it some thought around clients potentially listening to a lower ACL pipe where you are sending all the data of a privileged process, even if the intention is coming from a desire to run diagnostic tooling with the lowest levels of privilege possible and we have explicit opt-in. Also, likely we wouldn't support generating the JSON report, nor diagnostic logging around log collection (at least not to a file).

cc: @samsp-msft @richlander

jander-msft commented 1 year ago

I'm not concerned about crash dump, that can continue to write out the file system as configured by the env vars. The diagnostic IPC has nothing to do with that.

.NET Monitor currently doesn't collect crash dumps; we'd recommend using the env vars if customers need that today. Even if it did, we'd have the same issue as described here (unable to read the dump file).

We also have a separate story that we'll work on eventually where we'll detect the unhandled exception before it crashes the process, hold the process, capture a dump, and egress it out of the container/pod, and let the process crash; this also has no bearing on the ask in this issue.

Regarding the json report and logs, .NET Monitor currently doesn't ask for those to be generated. Even if it did, again, we'd still be stuck because we can't read it in this scenario. We haven't had any demand for these artifacts in .NET Monitor usage, so I'm not too concerned about those. The effort I would put for those would be more like a thought experiment, unless we get a customer ask for it.

jander-msft commented 1 year ago

An alternative is that createdump is changed to allow group read permissions on all generated artifacts. We can then use Kubernetes' fsgroup security context concept in order to allow containers in the pod to have permission to read the files via the secondary group specified by fsgroup. We'll log a separate issue after investigating this more.