draios / sysdig

Linux system exploration and troubleshooting tool with first class support for containers
http://www.sysdig.com/

Omitted SIGKILL events on CentOS 7 with kernel 3.10.0 #2000

Open liuyaqiu opened 1 year ago

liuyaqiu commented 1 year ago

I think sysdig may be missing some events on CentOS 7 with kernel 3.10.0.

Runtime Information

System: CentOS Linux release 7.9.2009 (Core)

Kernel: 3.10.0-1160.el7.x86_64

Sysdig: 0.35.1

Docker:

Client: Docker Engine - Community
 Version:           20.10.23
 API version:       1.40
 Go version:        go1.18.10
 Git commit:        7155243
 Built:             Thu Jan 19 17:36:21 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.15
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       99e3ed8919
  Built:            Sat Jan 30 03:16:33 2021
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.15
  GitCommit:        5b842e528e99d4d4c1686467debf2bd4b88ecd86
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Problem

I want to detect SIGKILL signal events inside a container, so I ran the following command on the host machine:

sysdig evt.type=kill and evt.arg.sig=SIGKILL

Below is the output.

kill inside container

[Screenshot: sysdig output]

Then I ran a kill command inside a Kubernetes Docker container on this host machine. [Screenshot: kill command inside the container]

The kill command's system call event does not appear in the sysdig output in the first screenshot.

kill inside host machine

[Screenshot]

[Screenshot]

Again, the kill command's event does not appear.
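To make the host-side test repeatable, here is a minimal Python sketch, assuming a Linux host with Python 3 (the function name `kill_child_with_sigkill` is hypothetical). The parent forks a child and sends it SIGKILL from a separate process, so while it runs the sysdig command above would be expected to report one `kill` event with `sig=SIGKILL`, provided no events are dropped.

```python
import os
import signal
import time


def kill_child_with_sigkill(delay: float = 1.0) -> int:
    """Fork a sleeping child, send it SIGKILL from the parent,
    and return the signal number that terminated the child."""
    pid = os.fork()
    if pid == 0:
        # Child: sleep long enough for the parent to kill it.
        time.sleep(60)
        os._exit(0)  # only reached if SIGKILL never arrives

    # Parent: let the child start, then issue the kill() syscall
    # that the sysdig filter is meant to catch.
    time.sleep(delay)
    print(f"parent {os.getpid()} sending SIGKILL to child {pid}")
    os.kill(pid, signal.SIGKILL)

    # Reap the child and check how it terminated.
    _, status = os.waitpid(pid, 0)
    if not os.WIFSIGNALED(status):
        raise RuntimeError(f"child exited normally with status {status}")
    return os.WTERMSIG(status)


if __name__ == "__main__":
    sig = kill_child_with_sigkill()
    print("child terminated by", signal.Signals(sig).name)
```

Unlike the self-kill script, sender and receiver here are different processes, which makes it easier to match the event's sender and target PIDs in the sysdig output.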

Conclusion

I suspect sysdig does not work reliably on CentOS 7 with kernel 3.10.0. Or is there something I can do to make it work?

therealbobo commented 1 year ago

Hi @liuyaqiu! I just tried, but I cannot reproduce the issue with Docker. Could you try with Docker only? Does the host have any particular configuration? Is the host under high load?

liuyaqiu commented 1 year ago

The host machine has 256 CPU cores and the load average is about 60. I don't know whether that is too high. [Screenshot: host load]

[Screenshot]

therealbobo commented 1 year ago

Does this issue happen all the time?

liuyaqiu commented 1 year ago


Differences from the previous situation:

  1. The host machine's load average is lower than before.
  2. The sysdig package has been removed from the host machine.
  3. sysdig now runs inside a Docker container (using the official image sysdig/sysdig:0.31.5).

Now the host machine's load average is low: [Screenshot: host load]

I can now see the SIGKILL events promptly, and sysdig works well. I wrote a Python script that sleeps and then kills itself:

import os
import signal
import time

print("My PID is:", os.getpid())

# Sleep for 10 seconds
time.sleep(10)

# Kill self with SIGKILL
os.kill(os.getpid(), signal.SIGKILL)

I ran this script inside another container. [Screenshot: script output]

sysdig itself also runs inside a container, with the sysdig package removed from the host machine.

sysdig evt.type=kill and evt.arg.sig=SIGKILL

[Screenshot: sysdig output showing the SIGKILL event]

The sender PID is in the host machine's root PID namespace; the receiver PID is in the container's namespace.
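To check which PID namespace a given process belongs to, as in the sender/receiver observation above, one option on Linux is to compare the `/proc/<pid>/ns/pid` symlinks. This is a minimal sketch with hypothetical helper names; it assumes the PIDs are as seen from where the script runs (a container process has a different PID on the host than inside the container), and reading another user's process usually requires root.

```python
import os


def pid_namespace(pid: int) -> str:
    """Return the PID-namespace identifier of a process, e.g. 'pid:[4026531836]'.

    Reads the /proc/<pid>/ns/pid symlink; inspecting another user's
    process usually requires root.
    """
    return os.readlink(f"/proc/{pid}/ns/pid")


def same_pid_namespace(pid_a: int, pid_b: int) -> bool:
    """True if both PIDs (as visible to this script) share a PID namespace."""
    return pid_namespace(pid_a) == pid_namespace(pid_b)


if __name__ == "__main__":
    me = os.getpid()
    print(f"PID {me} is in {pid_namespace(me)}")
    # To compare with a container process, pass its host-visible PID, e.g.:
    # print(same_pid_namespace(me, 12345))  # 12345 is a placeholder PID
```

Run on the host as root, a PID belonging to a containerized process would be expected to show a different `pid:[...]` identifier from the host's own processes.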

@therealbobo Everything now appears to work well. Thanks for your help. I will keep an eye on this problem to see whether it recurs.

therealbobo commented 1 year ago

The only explanation I can think of (other than a bug in the drivers) is that sysdig is dropping events, probably because the syscall buffer is too small. Currently sysdig doesn't support a variable-size ring buffer; I'll work on it. Please ping me if the problem shows up under low load. Thank you for bringing this to our attention, @liuyaqiu! 😄
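The drop explanation above can be illustrated with a toy model. This is a hypothetical Python sketch, not sysdig's driver code: it assumes a fixed-capacity buffer that discards (and counts) new events when full, filled by a producer (the kernel emitting syscall events) and emptied by a slower consumer (userspace sysdig).

```python
from collections import deque


class DropCountingBuffer:
    """Toy fixed-capacity buffer: when full, new events are dropped and counted.
    Illustrative only; names and behavior are assumptions, not sysdig's code."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.events: deque = deque()
        self.dropped = 0

    def push(self, event: int) -> None:
        if len(self.events) >= self.capacity:
            self.dropped += 1  # buffer full: this event is lost
        else:
            self.events.append(event)

    def drain(self, max_events: int) -> list:
        out = []
        while self.events and len(out) < max_events:
            out.append(self.events.popleft())
        return out


def simulate(capacity: int, produced_per_tick: int,
             consumed_per_tick: int, ticks: int) -> int:
    """Return how many events are dropped when production outpaces consumption."""
    buf = DropCountingBuffer(capacity)
    seq = 0
    for _ in range(ticks):
        for _ in range(produced_per_tick):
            buf.push(seq)
            seq += 1
        buf.drain(consumed_per_tick)
    return buf.dropped


if __name__ == "__main__":
    print(simulate(100, 50, 40, 20))   # 140 events dropped
    print(simulate(1000, 50, 40, 20))  # 0 dropped in this 20-tick window
```

In this model a larger buffer absorbs bursts and delays drops, but if the consumer is permanently slower than the producer, any finite buffer eventually fills; that is one reason a heavily loaded host could lose events such as the SIGKILL while a lightly loaded one does not.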