akovac35 opened this issue 1 year ago
/cc @hoyosjs
@akovac35, what container image are you using, and what SKU are you using for the node pool? The error implies that process_vm_readv exists in the headers but returns "not implemented" (ENOSYS) at run time.
Container images:
mcr.microsoft.com/dotnet/sdk:6.0-bullseye-slim AS builder
mcr.microsoft.com/dotnet/aspnet:6.0-bullseye-slim AS runner
Kubernetes info (a custom system):
kubectl describe nodes | select-string "OS Image:"
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
OS Image: Flatcar Container Linux by Kinvolk 3139.2.3 (Oklo)
The OS seems to be the following one:
https://flatcar-linux.org/releases#release-3139.2.3
I have tested the OS, including the latest image, in Hyper-V, with mcr.microsoft.com/dotnet/samples:aspnetapp-buster-slim-amd64, and everything worked.
@hoyosjs do you have any ideas how to continue troubleshooting? Anything specific from https://github.com/genuinetools/amicontained tool?
@hoyosjs @mikem8361 Why is ReadProcessMemory being called on a Linux system? Isn't this a Windows-only thing?
If I use Flatcar Container Linux by Kinvolk 3227.2.2 (Oklo), I can reproduce the issue. That being said, it's not one of the supported OSes, so I don't think I can easily bring this to shiproom for a backport. This was already solved for .NET 7/8. The issue is that the syscall exists, but we get back ENOSYS.
I am not familiar with the amicontained tool. But essentially that's the issue: the function is not implemented at all. There are very few ways to get this working without the backport (LD_PRELOAD hooks are one, but I am not sure that is a good way to do it).
As for why ReadProcessMemory is called on Linux: our headers shim ReadProcessMemory. Essentially, they translate the Windows semantics of the call to the OS APIs of the target platform.
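To make the shim idea concrete, here is a minimal sketch of a ReadProcessMemory-style helper on Linux. It assumes the shim's core job is to copy bytes out of another process: it tries process_vm_readv first and, only if that reports ENOSYS, falls back to pread on /proc/<pid>/mem. The name read_process_memory is hypothetical, and this is an illustration of the technique, not the runtime's actual PAL or createdump code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy `size` bytes at `addr` in process `pid` into
 * `buf`. Returns the number of bytes read, or -1 with errno set. */
ssize_t read_process_memory(pid_t pid, const void *addr, void *buf, size_t size)
{
    struct iovec local = { .iov_base = buf, .iov_len = size };
    struct iovec remote = { .iov_base = (void *)addr, .iov_len = size };

    ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
    if (n >= 0 || errno != ENOSYS)
        return n; /* success, or a "real" failure such as EPERM or EFAULT */

    /* Syscall reported as unavailable: fall back to /proc/<pid>/mem,
     * where the file offset is the target's virtual address. */
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/mem", (int)pid);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = pread(fd, buf, size, (off_t)(uintptr_t)addr);
    int saved = errno; /* keep pread's errno across close() */
    close(fd);
    errno = saved;
    return n;
}
```

Calling it with getpid() and the address of a local buffer simply copies that buffer. On a node where the syscall is filtered, the same call takes the /proc/<pid>/mem path instead; note that opening another process's mem file is subject to the same ptrace access checks.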
@hoyosjs Thanks for the explanation.
So, after reading your reply, I decided to confirm on my laptop with Hyper-V. I downloaded the latest Flatcar version (3227.2.2) and a random Docker image (https://hub.docker.com/r/allymartest/aspnetapp), and createdump works just fine with .NET Core 3.1.
I also tried the full core dump, which also works.
Then I tested with .NET 6.0.1 using another random container (https://hub.docker.com/layers/ignaciocolmenares/aspnetapp/alpine/images/sha256-bca0b347d866fac960587845f35e5b48821e2e3023cb4abdad4aade8ff81e4aa?context=explore), and it also works.
Is the error condition dependent on the application somehow?
Where were you hosting the nodes of the original issue? I reproduced this with a C app directly on the Flatcar image, so it's not really a .NET issue. It seems to be a VM configuration issue (either in the hosting or in the image bits).
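A repro of that kind does not need .NET at all. The following is a minimal sketch, assuming the only goal is to see which errno process_vm_readv returns on a given node: it reads a buffer from the calling process itself, so cross-process ptrace permissions do not come into play. probe_process_vm_readv is a hypothetical name, not the app used above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical probe: read a buffer from our own address space with
 * process_vm_readv. Returns 0 on success, otherwise the errno value
 * (ENOSYS means the call is blocked or not implemented). */
int probe_process_vm_readv(void)
{
    char src[16] = "probe-payload";
    char dst[16] = {0};
    struct iovec local = { .iov_base = dst, .iov_len = sizeof dst };
    struct iovec remote = { .iov_base = src, .iov_len = sizeof src };

    ssize_t n = process_vm_readv(getpid(), &local, 1, &remote, 1, 0);
    if (n < 0)
        return errno;
    /* A short read or mismatched bytes would indicate something odd. */
    if (n != (ssize_t)sizeof src || memcmp(src, dst, sizeof src) != 0)
        return EIO;
    return 0;
}
```

On x86_64 Linux, ENOSYS is 38, which is the ERRNO 38 ("Function not implemented") that createdump prints.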
So, the issue is occurring on our Kubernetes test system, which is hosted by a cloud provider. Everything works on my laptop.
If I understand your latest reply correctly, you have reproduced the problem on a cloud system as well? Is this something you can use to figure out the requirements for createdump to work?
For 6.0, you need process_vm_readv to be implemented (even if it returns EPERM) and ptrace access, on top of all the usual .NET requirements. ENOSYS, which is what you see, is also OK on 7.0+ runtimes.
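The rule above can be written down as a small predicate. This is a sketch under stated assumptions: createdump_prereqs_met is a hypothetical name, only the ENOSYS-versus-EPERM distinction and the ptrace requirement come from this reply, and treating every errno other than ENOSYS as "implemented" is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical encoding of the createdump prerequisites described above.
 *   runtime_major : .NET major version (6, 7, 8, ...)
 *   vm_readv_errno: 0 if process_vm_readv succeeded, else its errno
 *   ptrace_ok     : whether the dumping process may ptrace the target
 * Assumption: any errno other than ENOSYS counts as "implemented". */
bool createdump_prereqs_met(int runtime_major, int vm_readv_errno, bool ptrace_ok)
{
    if (!ptrace_ok)
        return false;            /* ptrace access is required on every version */
    if (vm_readv_errno != ENOSYS)
        return true;             /* implemented; EPERM is acceptable */
    return runtime_major >= 7;   /* ENOSYS is tolerated only on 7.0+ */
}
```

Under this reading, the failing cluster (ENOSYS on .NET 6) is exactly the one combination that is rejected while ptrace access is available.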
And yes, my repro is in the cloud. I don't know how they host the VM, but given that my image version is the same, the problem is most likely in the hosting (unless they build a different image for the cloud vs. locally).
I will ask our admins if we can try tracing createdump with the strace tool. This may provide more information about what is being blocked, assuming the problem still reproduces when running as root:
# if the container has bash
docker exec -it -u root <container id> /bin/bash
# otherwise use sh
docker exec -it -u root <container id> sh
# install strace (Debian-based images)
apt-get update
apt-get install -y strace
# -c prints a per-syscall summary (calls, errors) to createdump.log
strace -c -o createdump.log /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.9/createdump -d -v -n 1
Just to list the system calls, I also ran this on mcr.microsoft.com/dotnet/samples:aspnetapp (.NET 7 RC2 at the moment).
So, testing the problem on our test k8s cluster by simply adding the strace tool to the pod (.NET 6):
[createdump] ffffffffff600000 - ffffffffff601000 (000001) 0000000000000000 --x-p- 21 [vsyscall]
[createdump] EnumerateElfInfo: phdr 0x562ff9e49040 phnum 12
[createdump] ReadProcessMemory FAILED, addr: 0000562ff9e49040, size: 56, ERRNO 38: Function not implemented
[createdump] ERROR: ReadMemory(0x562ff9e49040, 38) phdr FAILED
[createdump] Target process is alive
nonroot-user@pod:/tmp$ cat createdump.log
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- -----------------
 79.26    0.041542          27      1494           write
  7.77    0.004071          16       249           read
  3.32    0.001742          19        88           ptrace
  2.67    0.001397          24        57        16 openat
  1.57    0.000823          17        48         9 newfstatat
  1.47    0.000772          19        40           mmap
  1.47    0.000769          17        44        22 wait4
  1.24    0.000652          15        41           close
  0.17    0.000091           9        10           mprotect
  0.16    0.000086          43         2           getdents64
  0.13    0.000070          17         4           pread64
  0.12    0.000061          15         4         4 process_vm_readv
  0.08    0.000044          44         1           readlink
  0.08    0.000041          41         1           statfs
  0.06    0.000031           6         5           brk
  0.06    0.000030          30         1           kill
  0.05    0.000025           8         3           prlimit64
  0.05    0.000024          12         2           rt_sigaction
  0.04    0.000021          21         1         1 access
  0.03    0.000017          17         1           pipe2
  0.03    0.000016          16         1           getsid
  0.03    0.000016           8         2         1 arch_prctl
  0.03    0.000016          16         1           set_tid_address
  0.03    0.000016          16         1           set_robust_list
  0.03    0.000016          16         1           rseq
  0.02    0.000013          13         1           gettid
  0.02    0.000012          12         1           getpid
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           getrandom
------ ----------- ----------- --------- --------- -----------------
100.00    0.052414          24      2107        53 total
The number of errors for process_vm_readv is higher than when testing locally.
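Comparing the process_vm_readv row between the local and the cluster run can be automated. A minimal sketch, assuming the default strace -c layout in which the errors column is left blank for syscalls that never failed (so a data row has either five or six fields); parse_strace_row is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one data row of `strace -c` output, e.g.
 *   "  0.12    0.000061          15         4         4 process_vm_readv"
 * Fills `name`, `calls` and `errors` (0 when the column is blank).
 * Returns 1 on success, 0 for headers, separators or malformed rows. */
int parse_strace_row(const char *line, char name[64], long *calls, long *errors)
{
    double pct, secs;
    long usecs, ncalls, nerrors;
    char tail[64];

    /* Six fields: % time, seconds, usecs/call, calls, errors, syscall */
    if (sscanf(line, "%lf %lf %ld %ld %ld %63s",
               &pct, &secs, &usecs, &ncalls, &nerrors, tail) == 6) {
        *calls = ncalls;
        *errors = nerrors;
    /* Five fields: the errors column is empty */
    } else if (sscanf(line, "%lf %lf %ld %ld %63s",
                      &pct, &secs, &usecs, &ncalls, tail) == 5) {
        *calls = ncalls;
        *errors = 0;
    } else {
        return 0;
    }
    strcpy(name, tail);
    return 1;
}
```

Feeding both createdump.log files through it line by line and keeping the process_vm_readv row gives the two error counts to compare.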
So the problem on our test Kubernetes system was the seccomp profile; all that was needed to get started was to update the policy file:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  # The three lines below were added to the policy
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
  name: test-pod-policy
...
The policy can be tuned further, and note that the PodSecurityPolicy concept is deprecated in the latest k8s releases.
So the root cause of the problem lies in the Kubernetes configuration.
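The thread does not show which profile was in effect before the change, so the following is only an illustration of the mechanism: a seccomp filter can make an existing syscall look unimplemented by returning ENOSYS, which matches the "ERRNO 38: Function not implemented" line in the createdump log. It installs a raw classic-BPF filter via prctl rather than a Kubernetes profile; block_process_vm_readv_with_enosys is a hypothetical name, and the filter deliberately omits the architecture check that a real profile needs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/uio.h>   /* process_vm_readv, for checking the effect */
#include <unistd.h>

/* Hypothetical demo: install a seccomp filter on the calling thread that
 * makes process_vm_readv fail with ENOSYS and allows every other syscall.
 * Returns 0 on success, -1 with errno set. The filter cannot be removed.
 * Sketch only: it does not check seccomp_data.arch, so syscall numbers
 * from another ABI (e.g. x32 or i386 on x86_64) are not handled. */
int block_process_vm_readv_with_enosys(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* process_vm_readv: fall through to ERRNO; otherwise skip to ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_process_vm_readv, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After this call, every process_vm_readv in the process returns -1 with errno set to ENOSYS, even though the kernel implements it, which is the same symptom a restrictive container profile produces.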
@hoyosjs Can you suggest a more granular seccomp profile, or something similar? The .NET team should have plenty of examples for this.
@richlander @MichaelSimons, do we ever try to document the seccomp/syscall filtering needs of our container stories? Linux in general could fall prey to this. seccomp and capability configuration is usually the onus of the system user, but I don't think we've ever tried to document the surface area we use.
I am not aware of any documentation of seccomp/syscall filtering needs for our container stories to date. That being said, it would be beneficial.
cc @mthalman
I'm also not aware of any work that's gone into documenting a seccomp profile. A while back I briefly experimented with Docker Slim, which can generate a seccomp profile for you based on the calls it detects during instrumentation. I found a bug in it, which I logged: https://github.com/docker-slim/docker-slim/issues/182.
Description
I am seeing the following problem when trying to dump the memory using the dotnet-monitor tool:
{"status":400,"detail":"Write dump failed - HRESULT: 0x00000000."}
After investigating a bit and running
/usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.9/createdump -d -v -n 1
I get:
Could you please suggest what the known requirements are for the dump to work?
Configuration
Kubernetes
Linux 5.15.48-flatcar #1 SMP Tue Jun 21 05:55:04 -00 2022 x86_64 GNU/Linux