Closed kholia closed 5 years ago
I have not witnessed any ptrace
calls made by Metricbeat. I tried a few things to recreate it.
Used your configuration inside of a container. docker run --user root --rm -it --security-opt apparmor=docker-default docker.elastic.co/beats/metricbeat:5.6.8
. I didn't see apparmor messages in dmesg.
Run Metricbeat 5.6.8-exec (downloaded from here) on the host machine using your config. I setup auditing to see if ptrace
was used.
-a always,exit -F arch=b32 -S ptrace
-a always,exit -F arch=b64 -S ptrace
I also tried running the latest Metricbeat from master with a seccomp profile that disallows ptrace
to see if any violations were reported in audit logs.
I analyzed the source code to see if ptrace
was used anywhere in the system module. I didn't see it being used anywhere. The os/exec
package is used within the diskio
metricset and that package makes a reference to ptrace
, but it's never used.
My environment: Versions: docker.elastic.co/beats/metricbeat:5.6.8, 5.6.8-exec, master OS: Ubuntu 16.04.3 LTS Kernel: 4.13.0-1006-gcp Docker: 18.03.0-ce
I looked further into this and I can now reproduce it on a local vanilla Ubuntu VM when running metricbeat
inside a Docker container with AppArmor's docker-default
profile applied.
Here is the exact command to reproduce the mentioned behaviour.
$ sudo docker run -v ~/metricbeat.yml:/usr/share/metricbeat/metricbeat.yml --volume=/proc:/hostfs/proc:ro --volume=/sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro --volume=/:/hostfs:ro --net=host docker.elastic.co/beats/metricbeat:5.6.8 metricbeat -e -system.hostfs=/hostfs
Here is the complete configuration file,
metricbeat.modules:
- module: system # most metricsets, 30s
metricsets:
- cpu
- diskio
- load
- memory
- network
- process
# process.cgroups.enabled: false
period: 30s
- module: system # fsstat metricset, 5m
metricsets:
- fsstat
period: 5m
processors:
- add_cloud_metadata:
logging.level: info
logging.to_files: true
logging.to_syslog: false
logging.files:
path: /tmp
name: metricbeat.log
output.file:
path: "/tmp/metricbeat"
filename: metricbeat
I can reproduce this when using Docker version 17.12.0-ce, build c97c6d6
on Ubuntu 16.04.3 LTS.
Notes,
Upgrading to Docker version 18.03.0-ce, build 0520e24
doesn't make a difference.
Upgrading to Docker version 17.12.1-ce, build 7390fc
doesn't make a difference.
Upgrading to Metricbeat 6.2.4 doesn't make these AppArmor complaints go away.
I even tested on upcoming Ubuntu 18.04 LTS and I can still see these weird AppArmor complaints.
Disabling process
metricset makes these complaints go away. This roughly indicates that the process metricset
code is somehow related to these AppArmor complaints.
I didn't have much luck in using gdb
to break on ptrace
for metricbeat
process.
I have used a wide variety of kernels in my testing (ranging from 4.4 to 4.15). So this problem shouldn't be related to any kernel stuff (hopefully).
I downloaded the various Docker versions for testing from here.
Make sure to disable auditd
and auditbeat
on the test system. This should ensure that the messages are visible through the dmesg
command.
I see the following messages in metricbeat logs,
ERROR instance/metrics.go:61 Error while getting memory usage: error retrieving process stats: cannot find matching process for pid=1
ERROR instance/metrics.go:92 Error retrieving CPU percentages: error retrieving process stats: cannot find matching process for pid=1
I am not sure if these are relevant for the issue at hand.
Not sure if it's related but it reminded me of https://github.com/elastic/beats/pull/6641 Would be interesting to see if you still see the issue with beats master?
Metricbeat reads from /proc
quite extensively to collect information about processes. And apparently procfs does some permission checks that involve ptrace
so while Metricbeat never uses ptrace
some violations are reported.
The AppArmor documentation has a page dedicated to /proc and ptrace. It mentions that reading from /proc/<pid>/cwd
, /proc/<pid>/environ
, and /proc/<pid>/exec
cause ptrace read requests. So I think this must be cause of the "denied" messages.
Side note: I was able to reproduce the behavior by running Metricbeat with
docker run --name metricbeat --rm --security-opt apparmor=docker-default -v ~/metricbeat.yml:/usr/share/metricbeat/metricbeat.yml --volume=/proc:/hostfs/proc:ro --volume=/sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro --volume=/:/hostfs:ro --net=host docker.elastic.co/beats/metricbeat:5.6.8 metricbeat -e -system.hostfs=/hostfs -strict.perms=false
And then I only started seeing the apparmor denied messages after running docker exec -it metricbeat bash
.
The ptrace read
requests are fine and not a problem.
My messages looks like,
apparmor="DENIED" operation="ptrace" comm="metricbeat" requested_mask="trace" denied_mask="trace"
A requested mask of read
is expected (and is fine) but I am seeing a trace
requested mask.
Does this imply that metricbeat is actually trying to ptrace a PID? I don't know if I am interpreting this correctly. Probably not!
And then I only started seeing the apparmor denied messages after running docker exec -it metricbeat bash.
Hmm. I start seeing the denied messages without doing this. I just have to wait < 1 minute before metricbeat starts doing its stuff.
A requested mask of read is expected (and is fine) but I am seeing a trace requested mask.
Good point. The /proc and ptrace page also mentions that there are files in /proc
that will trigger a ptrace
trace
request. But AFAIK Metricbeat doesn't touch any of those listed. Perhaps the lists are not 100% accurate or complete.
Does this imply that metricbeat is actually trying to ptrace a PID?
No. I don't believe this is the case. I have enabled auditing for the ptrace
syscall and there were no hits. And the audit messages associated with the apparmor indicate that it's a read
syscall (syscall=0
) triggering the AVC.
type=AVC msg=audit(1524675112.626:70897): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=3298 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
type=SYSCALL msg=audit(1524675112.626:70897): arch=c000003e syscall=0 success=yes exit=197 a0=5 a1=c4200c4000 a2=200 a3=0 items=0 ppid=3280 pid=3298 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=4294967295 comm="metricbeat" exe="/usr/share/metricbeat/metricbeat" key=(null)
type=PROCTITLE msg=audit(1524675112.626:70897): proctitle=6D657472696362656174002D65002D73797374656D2E686F737466733D2F686F73746673002D7374726963742E7065726D733D66616C7365002D640073797374656D2D70726F63657373
Nice findings 👍
Would it be possible to drop these "spurious" events with Auditbeat? I am wondering what the drop_event
rule would look like.
Thanks.
As of now Auditbeat is not fully parsing the AppArmor message. I have pull request open to fix the issue. Once that is merged you could filter the message like this:
auditbeat.modules:
- module: auditd
processors:
- drop_event.when:
and:
- equals.event.action: violated-apparmor-policy
- equals.process.name: metricbeat
- equals.auditd.data.operation: ptrace
But until then you could use:
auditbeat.modules:
- module: auditd
processors:
- drop_event.when:
and:
- equals.event.type: avc
- equals.process.name: metricbeat
- equals.auditd.data.syscall: read
We are experiencing same issue on our K8s cluster, we are running Metricbeat 6.2.4 as daemonset and captured the same error log.
@andrewkroh Hi again! I am unable to drop these AppArmor messages.
My Auditbeat 6.2.4 configuration contains the following lines,
...
processors:
- add_docker_metadata: ~ # match_fields stuff doesn't seem to be required to get "container.name" stuff working
- drop_event:
when:
# Drop AVC noise generated by the combination of AppArmor and Metricbeat
# https://github.com/elastic/beats/issues/6932 (Metricbeat invokes ptrace leading to AppArmor complaints)
and:
- equals.event.type: avc
- equals.process.name: metricbeat
- equals.auditd.data.syscall: read
or:
- equals:
process.exe: "/bin/mktemp"
<many more rules>
However, these AppArmor complaints still make it to an Elasticsearch index. They don't get dropped by Auditbeat.
Note: The or
rules are applied just fine and they work.
The full document in Elasticsearch look like,
{
"_index": "auditbeat",
"_type": "account",
"_id": "FCqEd2MB2Psw3ckirLWg",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2018-05-19T01:30:04.503Z",
"@metadata": {
"beat": "auditbeat",
"type": "doc",
"version": "6.2.4"
},
"process": {
"ppid": "22771",
"title": "./metricbeat -path.data /app/data/metricbeat -strict.perms=false -system.hostfs /hostfs -c /app/metricbeat.yml",
"name": "metricbeat",
"exe": "/xyz/metricbeat/metricbeat",
"pid": "5539"
},
"auditd": {
"sequence": 8812712,
"result": "success",
"session": "unset",
"data": {
"a1": "c4204c5600",
"tty": "(none)",
"a2": "200",
"a3": "0",
"syscall": "read",
"a0": "9",
"exit": "180",
"arch": "x86_64"
},
"summary": {
"actor": {
"primary": "unset",
"secondary": "root"
},
"how": "/xyz/metricbeat/metricbeat"
}
},
"event": {
"category": "mac-decision",
"type": "avc",
"action": "violated-selinux-policy",
"module": "auditd"
},
"beat": {
"name": "192.168.1.10",
"hostname": "192.168.1.10",
"version": "6.2.4"
},
"docker": {
"container": {
"image": "found/beats-runner:6.3.0-BC3",
"name": "frc-beats-runner-beats-runner",
"id": "dbdb2b0744e6be099d535eb91c06db75ba19a54fb5b6ce793c0c5ae75933f104",
"labels": {
"co": {
"elastic": {
"cloud": {
"runner": {
"container_name": "beats-runner",
"container_set": "beats-runner",
"id": "i-032b0fcb42bdaf960",
"zone": "us-east-1a"
}
}
}
}
}
}
},
"user": {
"name_map": {
"uid": "root",
"egid": "root",
"euid": "root",
"fsgid": "root",
"fsuid": "root",
"gid": "root",
"sgid": "root",
"suid": "root"
},
"gid": "0",
"fsgid": "0",
"suid": "0",
"egid": "0",
"sgid": "0",
"euid": "0",
"uid": "0",
"auid": "unset",
"fsuid": "0"
}
},
"fields": {
"@timestamp": [
"2018-05-19T01:30:04.503Z"
]
},
"highlight": {
"event.type": [
"@kibana-highlighted-field@avc@/kibana-highlighted-field@"
]
},
"sort": [
1526693404503
]
}
I tried simplifying the and
conditional like below but still had no luck.
...
processors:
- add_docker_metadata: ~ # match_fields stuff doesn't seem to be required to get "container.name" stuff working
- drop_event:
when:
# Drop AVC noise generated by the combination of AppArmor and Metricbeat
# https://github.com/elastic/beats/issues/6932 (Metricbeat invokes ptrace leading to AppArmor complaints)
and:
- equals.event.type: avc
or:
<old stuff>
What could I be missing here?
I was misunderstanding the semantics for drop_event
it seems. The following configuration works,
- drop_event.when:
# Drop all AVC events
# equals.event.type: avc
# Drop AVC noise generated by the combination of AppArmor and Metricbeat
# https://github.com/elastic/beats/issues/6932 (Metricbeat invokes ptrace leading to AppArmor complaints)
and:
- equals.event.type: avc
- equals.process.name: metricbeat
- or:
- equals.auditd.data.syscall: read
- equals.auditd.data.syscall: readlinkat
- drop_event.when:
# This gets rid of AVC messages which are super empty and spammy. I don't know why we get them.
and:
- equals.event.type: avc
- equals.auditd.session: ""
- equals.auditd.result: ""
- equals.event.action: violated-selinux-policy
- drop_event:
when:
or: <same this as orginally posted>
I think I need multiple drop_event
blocks.
I think I need multiple drop_event blocks.
Yeah, I think that's correct. Combining a top-level and
and or
in the same condition seems to have undefined behavior. Looks like a bug to me.
If you are encountering this issue in running Metricbeat in Kubernetes, you can circumvent it by adding an AppArmor annotation to your workload and adding the SYS_PTRACE capability to the security context of the workload's pod. For example:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: metricbeat
labels:
app: metricbeat
# SNIP
spec:
template:
metadata:
name: metricbeat
labels:
app: metricbeat
# SNIP
annotations:
# This is needed for the proc module to perform ptrace
container.apparmor.security.beta.kubernetes.io/metricbeat: unconfined
spec:
# SNIP
containers:
- name: metricbeat
# SNIP
securityContext:
runAsUser: 0
capabilities:
add:
# This is needed for the proc module
- SYS_PTRACE
Presumably this isn't something that can be 'fixed' on the metricbeat side, as we're running into kernel behavior. Should we add a note to the metricset docs?
Documentation regarding this has been added by#11822 Is there anything else we should do, or can I close this?
Thank you :+1:
I was misunderstanding the semantics for
drop_event
it seems. The following configuration works,- drop_event.when: # Drop all AVC events # equals.event.type: avc # Drop AVC noise generated by the combination of AppArmor and Metricbeat # https://github.com/elastic/beats/issues/6932 (Metricbeat invokes ptrace leading to AppArmor complaints) and: - equals.event.type: avc - equals.process.name: metricbeat - or: - equals.auditd.data.syscall: read - equals.auditd.data.syscall: readlinkat - drop_event.when: # This gets rid of AVC messages which are super empty and spammy. I don't know why we get them. and: - equals.event.type: avc - equals.auditd.session: "" - equals.auditd.result: "" - equals.event.action: violated-selinux-policy - drop_event: when: or: <same this as orginally posted>
I think I need multiple
drop_event
blocks.
@kholia I am running Metricbeat and Filebeat (7.4.0) on Docker (Ubuntu 18.04 LTS). Since these ptrace
messages are being logged to my Syslog
file, Filebeat is sending all of them to Elasticsearch index. In order to 'fix' this, removing ptrace
messages from Syslog
, should I install Auditbeat and configure it to drop the corresponding events? Or maybe should be solved with the proper drop_event
processor of Filebeat, based on the message content?
I drop these messages by installing and configuring Auditbeat.
I am running Metricbeat 5.6.8 insider a Docker container on Ubuntu 16.04.3 LTS. AppArmor's
docker-default
profile is in effect for this container.While running
dmesg
on the host system, I see,My configuration
metricbeat.yml
file,Disabling the
process
metricset gets rid of these AppArmor complaints.How to reproduce:
4.4.0-108-generic
kernel here.metricbeat version 5.6.8-xexec (amd64), libbeat 5.6.8-xexe
)Docker version 17.12.0-ce, build c97c6d6
docker-default
profile applied.Question: Why is
metricbeat
trying to ptrace stuff? It is required for some functionality?Thanks!