dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

Container Config - document seccomp profile needed to ensure .NET apps run #3445

Open akovac35 opened 1 year ago

akovac35 commented 1 year ago

Description

I am seeing the following problem when trying to dump the memory using the dotnet-monitor tool:

{"status":400,"detail":"Write dump failed - HRESULT: 0x00000000."}

After investigating a bit, and running ./usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.9/createdump -d -v -n 1, I get:

[createdump] ffffffffff600000 - ffffffffff601000 (000001) 0000000000000000 --x-p- 21 [vsyscall]
[createdump] EnumerateElfInfo: phdr 0x562aef77e040 phnum 12
[createdump] ReadProcessMemory FAILED, addr: 0000562aef77e040, size: 56, ERRNO 38: Function not implemented
[createdump] ERROR: ReadMemory(0x562aef77e040, 38) phdr FAILED
[createdump] Target process is alive

Could you please tell me what the known requirements are for the dump to work?

Configuration

Kubernetes

Linux 5.15.48-flatcar #1 SMP Tue Jun 21 05:55:04 -00 2022 x86_64 GNU/Linux

:/$ dotnet --info
global.json file:
  Not found
Host:
  Version:      6.0.9
  Architecture: x64
  Commit:       163a63591c
.NET SDKs installed:
  No SDKs were found.
.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.9 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 6.0.9 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

mikem8361 commented 1 year ago

/cc @hoyosjs

hoyosjs commented 1 year ago

@akovac35, what container image are you using and what SKU are you using for the node pool? This implies process_vm_readv exists in the headers but returns "not implemented" (ENOSYS).

akovac35 commented 1 year ago

Container images:

mcr.microsoft.com/dotnet/sdk:6.0-bullseye-slim AS builder

mcr.microsoft.com/dotnet/aspnet:6.0-bullseye-slim AS runner

Kubernetes info (a custom system):

kubectl describe nodes | select-string "OS Image:"

OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)

The OS seems to be the following one:

https://flatcar-linux.org/releases#release-3139.2.3

I have tested the OS, including the latest image, in Hyper-V, with mcr.microsoft.com/dotnet/samples:aspnetapp-buster-slim-amd64, and everything worked.

@hoyosjs do you have any ideas how to continue troubleshooting? Anything specific from https://github.com/genuinetools/amicontained tool?

Why is ReadProcessMemory being called on a Linux system? Isn't this a Windows-only thing?

akovac35 commented 1 year ago

@hoyosjs @mikem8361 Why is ReadProcessMemory being called on a Linux system? Isn't this a Windows-only thing?

hoyosjs commented 1 year ago

OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)

The OS seems to be the following one:

https://flatcar-linux.org/releases#release-3139.2.3

If I use Flatcar Container Linux by Kinvolk 3227.2.2 (Oklo), I can reproduce the issue. That being said, it's not one of the supported OSes, so I don't think I can easily bring this to shiproom for a backport. This was already solved for .NET 7/8. The issue is that the syscall exists, but we get back ENOSYS.

@hoyosjs do you have any ideas how to continue troubleshooting? Anything specific from https://github.com/genuinetools/amicontained tool?

I am not familiar with that tool. But essentially that's the issue: the function isn't implemented at all. There are very few ways of getting this working without the backport (LD_PRELOAD hooks are one way, but I am not sure that is a good way to do it).

Why is ReadProcessMemory being called on a Linux system? Isn't this a Windows-only thing?

Our headers shim ReadProcessMemory. Essentially, they translate the Windows semantics of the call to OS APIs depending on the target.
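To make the shim idea concrete, here is a minimal Python sketch of a "read another process's memory" helper that tries process_vm_readv first and falls back to /proc/<pid>/mem when the syscall reports ENOSYS or EPERM. This is only an illustration of the pattern discussed in this thread, not the runtime's actual code: the function names (`read_memory`, `read_with_process_vm_readv`, `read_with_proc_mem`) and the choice of /proc/<pid>/mem as the fallback are assumptions made for the example, and it is Linux-only.

```python
import ctypes
import errno
import os


class IoVec(ctypes.Structure):
    """Mirror of `struct iovec` from <sys/uio.h>."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


def read_with_process_vm_readv(pid, addr, size):
    """Read `size` bytes at `addr` in process `pid` via process_vm_readv(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    fn = libc.process_vm_readv
    fn.restype = ctypes.c_ssize_t
    fn.argtypes = [ctypes.c_int, ctypes.POINTER(IoVec), ctypes.c_ulong,
                   ctypes.POINTER(IoVec), ctypes.c_ulong, ctypes.c_ulong]
    buf = ctypes.create_string_buffer(size)
    local = IoVec(ctypes.addressof(buf), size)
    remote = IoVec(addr, size)
    n = fn(pid, ctypes.byref(local), 1, ctypes.byref(remote), 1, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:n]


def read_with_proc_mem(pid, addr, size):
    """Hypothetical fallback: read the same bytes through /proc/<pid>/mem."""
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(addr)
        return mem.read(size)


def read_memory(pid, addr, size,
                primary=read_with_process_vm_readv,
                fallback=read_with_proc_mem):
    """Try the fast syscall; fall back only when it is missing or denied."""
    try:
        return primary(pid, addr, size)
    except OSError as exc:
        # ENOSYS: syscall filtered or not implemented; EPERM: not permitted.
        if exc.errno not in (errno.ENOSYS, errno.EPERM):
            raise
    return fallback(pid, addr, size)
```

The `primary` and `fallback` parameters exist only so the fallback path can be exercised without a kernel that actually blocks the syscall. Any other error (for example EFAULT for an unmapped address) is re-raised unchanged.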

akovac35 commented 1 year ago

@hoyosjs Thanks for the explanation.

After reading your reply, I decided to check on my laptop with Hyper-V. I downloaded the latest Flatcar version (3227.2.2) and a random Docker image (https://hub.docker.com/r/allymartest/aspnetapp), and createdump works just fine with .NET Core 3.1:

image

I also tried the full core dump, which also works.

Then I tested with .NET 6.0.1 using another random container (https://hub.docker.com/layers/ignaciocolmenares/aspnetapp/alpine/images/sha256-bca0b347d866fac960587845f35e5b48821e2e3023cb4abdad4aade8ff81e4aa?context=explore), and it also works:

image

Is the error condition dependent on the application somehow?

hoyosjs commented 1 year ago

Where were you hosting the nodes of the original issue? I reproduced this with a C app directly on the Flatcar image, so it's not really a .NET issue. It seems to be a VM configuration issue (either in the hosting or in the bits).

akovac35 commented 1 year ago

The issue occurs on our Kubernetes test system, which is hosted by a cloud provider. Everything works when I run it on my laptop.

If I understand your latest reply correctly, you have reproduced the problem on a cloud system as well? Is this something you can use to figure out the requirements for createdump to work?

hoyosjs commented 1 year ago

For 6.0, you need to have process_vm_readv implemented (even if it returns EPERM) and ptrace access, on top of all the usual .NET requirements. ENOSYS, which is what you see, is also OK on 7.0+ runtimes.
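As a rough way to check this requirement from inside a container, the sketch below calls process_vm_readv on the current process and reports the resulting errno, then applies the rule stated above. Both function names are hypothetical, the probe is Linux-only, and treating every errno other than EPERM and ENOSYS as "not tolerated" is an assumption made for the example. It also does not check ptrace access, which is a separate requirement.

```python
import ctypes
import errno
import os


class IoVec(ctypes.Structure):
    """Mirror of `struct iovec` from <sys/uio.h>."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


def probe_process_vm_readv():
    """Return 0 if process_vm_readv can read our own memory, else the errno."""
    try:
        fn = ctypes.CDLL(None, use_errno=True).process_vm_readv
    except (AttributeError, OSError, TypeError):
        return errno.ENOSYS  # the C library does not expose the wrapper
    fn.restype = ctypes.c_ssize_t
    fn.argtypes = [ctypes.c_int, ctypes.POINTER(IoVec), ctypes.c_ulong,
                   ctypes.POINTER(IoVec), ctypes.c_ulong, ctypes.c_ulong]
    src = ctypes.create_string_buffer(b"probe")
    dst = ctypes.create_string_buffer(len(src))
    local = IoVec(ctypes.addressof(dst), len(dst))
    remote = IoVec(ctypes.addressof(src), len(src))
    n = fn(os.getpid(), ctypes.byref(local), 1, ctypes.byref(remote), 1, 0)
    return 0 if n >= 0 else ctypes.get_errno()


def tolerated_by_createdump(runtime_major, err):
    """Encode the rule from this thread: EPERM is fine, ENOSYS only on 7.0+."""
    if err in (0, errno.EPERM):
        return True
    return err == errno.ENOSYS and runtime_major >= 7


if __name__ == "__main__":
    result = probe_process_vm_readv()
    name = errno.errorcode.get(result, "OK") if result else "OK"
    print(f"process_vm_readv: {name}")
    print(f".NET 6 createdump expected to cope: {tolerated_by_createdump(6, result)}")
```

Running the script inside the affected pod and on a working machine gives a quick side-by-side comparison without needing the .NET runtime at all.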

And yes, my repro is on the cloud. I don't know how they host the VM but given that my image version is the same, the problem is most likely in the hosting (unless they build a different image for the cloud vs locally).

akovac35 commented 1 year ago

I will ask our admins if we can trace createdump with the strace tool. This may provide more information about what is being blocked, assuming the problem still reproduces when running as root:

# if the container has bash
docker exec -it -u root <container id> /bin/bash
# otherwise use sh
docker exec -it -u root <container id> sh
# then, inside the container, install strace
apt-get update
apt-get install -y strace
# -c prints a per-syscall summary; -o writes it to createdump.log
strace -c -o createdump.log /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.9/createdump -d -v -n 1

Just to list the system calls, on mcr.microsoft.com/dotnet/samples:aspnetapp (.NET 7 RC2 at the moment):

image

akovac35 commented 1 year ago

So, testing the problem on our test k8s cluster, by simply adding the strace tool to the pod (.NET 6):

[createdump] ffffffffff600000 - ffffffffff601000 (000001) 0000000000000000 --x-p- 21 [vsyscall]
[createdump] EnumerateElfInfo: phdr 0x562ff9e49040 phnum 12
[createdump] ReadProcessMemory FAILED, addr: 0000562ff9e49040, size: 56, ERRNO 38: Function not implemented
[createdump] ERROR: ReadMemory(0x562ff9e49040, 38) phdr FAILED
[createdump] Target process is alive
nonroot-user@pod:/tmp$ cat createdump.log
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- -----------------
 79.26    0.041542          27      1494           write
  7.77    0.004071          16       249           read
  3.32    0.001742          19        88           ptrace
  2.67    0.001397          24        57        16 openat
  1.57    0.000823          17        48         9 newfstatat
  1.47    0.000772          19        40           mmap
  1.47    0.000769          17        44        22 wait4
  1.24    0.000652          15        41           close
  0.17    0.000091           9        10           mprotect
  0.16    0.000086          43         2           getdents64
  0.13    0.000070          17         4           pread64
  0.12    0.000061          15         4         4 process_vm_readv
  0.08    0.000044          44         1           readlink
  0.08    0.000041          41         1           statfs
  0.06    0.000031           6         5           brk
  0.06    0.000030          30         1           kill
  0.05    0.000025           8         3           prlimit64
  0.05    0.000024          12         2           rt_sigaction
  0.04    0.000021          21         1         1 access
  0.03    0.000017          17         1           pipe2
  0.03    0.000016          16         1           getsid
  0.03    0.000016           8         2         1 arch_prctl
  0.03    0.000016          16         1           set_tid_address
  0.03    0.000016          16         1           set_robust_list
  0.03    0.000016          16         1           rseq
  0.02    0.000013          13         1           gettid
  0.02    0.000012          12         1           getpid
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           getrandom
------ ----------- ----------- --------- --------- -----------------
100.00    0.052414          24      2107        53 total

The number of errors for process_vm_readv is higher than when testing locally.
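A summary like the one above can be scanned mechanically for syscalls where every call failed, which is a useful hint when hunting for a blocked syscall. The following is a small sketch that parses `strace -c` output; the function names are hypothetical, and the sample is a subset of the rows from the log above. Note that `strace -c` only reports error counts, not the errno values.

```python
def parse_strace_summary(text):
    """Return (syscall, calls, errors) tuples from `strace -c` output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a percentage and have 5 or 6 columns.
        if len(parts) not in (5, 6) or not parts[0].replace(".", "", 1).isdigit():
            continue
        if parts[-1] == "total":
            continue
        calls = int(parts[3])
        errors = int(parts[4]) if len(parts) == 6 else 0
        rows.append((parts[-1], calls, errors))
    return rows


def always_failing(rows):
    """Syscalls for which every single call returned an error."""
    return [name for name, calls, errors in rows if calls > 0 and errors == calls]


SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- -----------------
 79.26    0.041542          27      1494           write
  3.32    0.001742          19        88           ptrace
  2.67    0.001397          24        57        16 openat
  0.12    0.000061          15         4         4 process_vm_readv
  0.04    0.000021          21         1         1 access
------ ----------- ----------- --------- --------- -----------------
100.00    0.052414          24      2107        53 total
"""

if __name__ == "__main__":
    print(always_failing(parse_strace_summary(SAMPLE)))
```

On the sample rows this flags process_vm_readv and access. A failing access call is common and usually harmless (a probe for an optional file), so the result still needs human judgment.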

akovac35 commented 1 year ago

So the problem on our test Kubernetes system was the seccomp profile. All that was needed to get started was to update the policy file:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  # The three lines below were added to the policy
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
  name: test-pod-policy
  ...

The policy may be tuned further. Note that the PodSecurityPolicy concept is deprecated in the latest k8s.
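On clusters where PodSecurityPolicy is no longer available, the same runtime default profile can be requested per pod through the `securityContext.seccompProfile` field. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The pod name `dotnet-app` and the helper name are hypothetical; the image is the runner image mentioned earlier in this thread.

```python
import json


def pod_with_runtime_default_seccomp(name, image):
    """Build a minimal Pod manifest that asks for the runtime's default seccomp profile."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level setting; applies to every container in the pod.
            "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_runtime_default_seccomp(
        "dotnet-app",  # hypothetical pod name
        "mcr.microsoft.com/dotnet/aspnet:6.0-bullseye-slim",
    )
    print(json.dumps(manifest, indent=2))
```

The output could be saved to a file and applied with `kubectl apply -f`. Whether the runtime default profile allows process_vm_readv depends on the container runtime and its version, so this is a starting point rather than a guarantee.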

So the root cause of the problem is associated with Kubernetes config.

@hoyosjs Can you suggest a more granular seccomp profile, or something similar? The .NET team should have plenty of examples for this.
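For reference, a more granular profile would take the shape of a JSON allow-list in the Docker/OCI seccomp format. The sketch below generates one from a list of syscall names; the helper name is hypothetical and the syscall list is just a handful taken from the createdump strace summary in this thread. It is deliberately incomplete: a real .NET application uses many more syscalls than a single createdump run shows, so a profile built this way would need to be extended and tested before use.

```python
import json


def build_seccomp_profile(allowed_syscalls, arch="SCMP_ARCH_X86_64"):
    """Build a deny-by-default seccomp profile that allows only the given syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": [arch],
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


# Illustrative subset of the syscalls seen in the createdump trace above.
OBSERVED = ["write", "read", "ptrace", "openat", "mmap", "wait4", "close",
            "pread64", "process_vm_readv"]

if __name__ == "__main__":
    print(json.dumps(build_seccomp_profile(OBSERVED), indent=2))
```

The resulting file can be passed to Docker with `--security-opt seccomp=<file>`, or placed in the kubelet's seccomp directory and referenced from a pod with a `Localhost` seccomp profile type.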

hoyosjs commented 1 year ago

@richlander @MichaelSimons, do we ever try to document seccomp/syscall filtering needs for our container stories? Linux in general could fall prey to this, but I don't think we've ever tried to document the surface area we use. seccomp and cap systems are usually something that's the onus of the system user, but I don't think we document the areas we use.

MichaelSimons commented 1 year ago

do we ever try to document seccomp/syscall filtering needs for our container stories?

I am not aware of any to date. That being said, this would be beneficial.

MichaelSimons commented 1 year ago

cc @mthalman

mthalman commented 1 year ago

I'm also not aware of any work that's gone into documenting a seccomp profile. I briefly experimented with Docker Slim a while back; it can generate a seccomp profile for you based on the calls it detects during instrumentation. I actually found a bug in it, which I logged: https://github.com/docker-slim/docker-slim/issues/182.