elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Metricbeat] Metricbeat invokes ptrace leading to AppArmor complaints #6932

Closed kholia closed 5 years ago

kholia commented 6 years ago

I am running Metricbeat 5.6.8 inside a Docker container on Ubuntu 16.04.3 LTS. AppArmor's docker-default profile is in effect for this container.

While running dmesg on the host system, I see,

[63860.321629] audit: type=1400 audit(1524292320.220:29): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=14754 comm="cat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63860.738883] audit: type=1400 audit(1524292320.636:30): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=14787 comm="cat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63910.279007] audit: type=1400 audit(1524292370.176:31): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17951 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63910.279366] audit: type=1400 audit(1524292370.176:32): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17951 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63910.279420] audit: type=1400 audit(1524292370.176:33): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17951 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63910.279788] audit: type=1400 audit(1524292370.176:34): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17951 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63910.293403] audit: type=1400 audit(1524292370.192:35): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17974 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63910.293959] audit: type=1400 audit(1524292370.192:36): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17974 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63910.294385] audit: type=1400 audit(1524292370.192:37): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17974 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63910.294587] audit: type=1400 audit(1524292370.192:38): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17974 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63914.661900] audit: type=1400 audit(1524292374.560:39): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=18802 comm="cat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63914.756281] audit: type=1400 audit(1524292374.656:40): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=18832 comm="cat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63953.499491] audit: type=1400 audit(1524292413.396:41): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63953.499821] audit: type=1400 audit(1524292413.396:42): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63953.499891] audit: type=1400 audit(1524292413.396:43): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63953.505757] audit: type=1400 audit(1524292413.404:44): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63953.506204] audit: type=1400 audit(1524292413.404:45): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63953.506318] audit: type=1400 audit(1524292413.404:46): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63983.458929] audit: type=1400 audit(1524292443.356:47): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63983.459186] audit: type=1400 audit(1524292443.356:48): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63983.459250] audit: type=1400 audit(1524292443.356:49): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63983.461638] audit: type=1400 audit(1524292443.360:50): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63983.461914] audit: type=1400 audit(1524292443.360:51): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
[63983.461963] audit: type=1400 audit(1524292443.360:52): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=17975 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
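For reference, each of these kernel log lines is a flat series of key=value fields; a quick Python sketch (not part of any Beats tooling) that splits one of the lines above into a dict:

```python
import re

# Parse one of the AppArmor "DENIED" lines above into key=value fields.
# Quoted and unquoted values are both handled by the regex.
line = ('audit: type=1400 audit(1524292370.176:31): apparmor="DENIED" '
        'operation="ptrace" profile="docker-default" pid=17951 '
        'comm="metricbeat" requested_mask="trace" denied_mask="trace" '
        'peer="unconfined"')

fields = dict(re.findall(r'(\w+)="?([^"\s]+)"?', line))
print(fields["operation"], fields["comm"], fields["denied_mask"])
# → ptrace metricbeat trace
```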

My metricbeat.yml configuration file,

metricbeat.modules:
- module: system # most metricsets, 30s
  metricsets:
    - cpu
    - diskio
    - load
    - memory
    - network
    - process
  process.cgroups.enabled: false
  period: 30s
- module: system # fsstat metricset, 5m
  metricsets:
    - fsstat
  period: 5m

processors:
- add_cloud_metadata:

logging.level: info
logging.to_files: true
logging.to_syslog: false
logging.files:
  path: /app/logs
  name: metricbeat.log

Disabling the process metricset gets rid of these AppArmor complaints.

Question: Why is metricbeat trying to ptrace stuff? Is it required for some functionality?

Thanks!

andrewkroh commented 6 years ago

I have not witnessed any ptrace calls made by Metricbeat. I tried a few things to recreate it.

  1. I used your configuration inside a container: docker run --user root --rm -it --security-opt apparmor=docker-default docker.elastic.co/beats/metricbeat:5.6.8. I didn't see AppArmor messages in dmesg.

  2. I ran Metricbeat 5.6.8-exec (downloaded from here) on the host machine using your config. I set up auditing to see if ptrace was used:

    -a always,exit -F arch=b32 -S ptrace
    -a always,exit -F arch=b64 -S ptrace
  3. I also tried running the latest Metricbeat from master with a seccomp profile that disallows ptrace to see if any violations were reported in audit logs.

  4. I analyzed the source code to see if ptrace was used anywhere in the system module. I didn't see it being used anywhere. The os/exec package is used within the diskio metricset and that package makes a reference to ptrace, but it's never used.

My environment:

Versions: docker.elastic.co/beats/metricbeat:5.6.8, 5.6.8-exec, master
OS: Ubuntu 16.04.3 LTS
Kernel: 4.13.0-1006-gcp
Docker: 18.03.0-ce

kholia commented 6 years ago

I looked further into this and I can now reproduce it on a local vanilla Ubuntu VM when running metricbeat inside a Docker container with AppArmor's docker-default profile applied.

Here is the exact command to reproduce the mentioned behaviour.

$ sudo docker run -v ~/metricbeat.yml:/usr/share/metricbeat/metricbeat.yml --volume=/proc:/hostfs/proc:ro --volume=/sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro --volume=/:/hostfs:ro --net=host docker.elastic.co/beats/metricbeat:5.6.8 metricbeat -e -system.hostfs=/hostfs

Here is the complete configuration file,

metricbeat.modules:
- module: system # most metricsets, 30s
  metricsets:
    - cpu
    - diskio
    - load
    - memory
    - network
    - process
  # process.cgroups.enabled: false
  period: 30s
- module: system # fsstat metricset, 5m
  metricsets:
    - fsstat
  period: 5m

processors:
- add_cloud_metadata:

logging.level: info
logging.to_files: true
logging.to_syslog: false
logging.files:
  path: /tmp
  name: metricbeat.log

output.file:
  path: "/tmp/metricbeat"
  filename: metricbeat

I can reproduce this when using Docker version 17.12.0-ce, build c97c6d6 on Ubuntu 16.04.3 LTS.

Notes,

Make sure to disable auditd and auditbeat on the test system. This should ensure that the messages are visible through the dmesg command.

kholia commented 6 years ago

I see the following messages in metricbeat logs,

ERROR   instance/metrics.go:61  Error while getting memory usage: error retrieving process stats: cannot find matching process for pid=1
ERROR   instance/metrics.go:92  Error retrieving CPU percentages: error retrieving process stats: cannot find matching process for pid=1

I am not sure if these are relevant for the issue at hand.

ruflin commented 6 years ago

Not sure if it's related, but it reminded me of https://github.com/elastic/beats/pull/6641. Would be interesting to see if you still see the issue with beats master?

andrewkroh commented 6 years ago

Metricbeat reads from /proc quite extensively to collect information about processes. And apparently procfs does some permission checks that involve ptrace, so even though Metricbeat never calls ptrace itself, some violations are reported.

The AppArmor documentation has a page dedicated to /proc and ptrace. It mentions that reading from /proc/<pid>/cwd, /proc/<pid>/environ, and /proc/<pid>/exe causes ptrace read requests. So I think this must be the cause of the "denied" messages.
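To make this concrete, here is a small Python sketch (illustration only, Linux-specific) touching the per-process /proc entries that the AppArmor documentation lists as gated by a ptrace-mode access check. Reading another process's entries is what surfaces as a "ptrace" event; reading our own always passes the check:

```python
import os
import sys

# These /proc/<pid> entries are gated by a ptrace-mode permission check
# in the kernel, per the AppArmor "/proc and ptrace" documentation.
PTRACE_GATED = ("cwd", "environ", "exe")

if sys.platform == "linux":
    pid = os.getpid()
    # Reading our own process's entries always passes the check.
    with open(f"/proc/{pid}/environ", "rb") as f:
        entries = [e for e in f.read().split(b"\0") if e]
    print(f"/proc/{pid}/environ has {len(entries)} variables")
```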


Side note: I was able to reproduce the behavior by running Metricbeat with

docker run --name metricbeat --rm --security-opt apparmor=docker-default -v ~/metricbeat.yml:/usr/share/metricbeat/metricbeat.yml --volume=/proc:/hostfs/proc:ro --volume=/sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro --volume=/:/hostfs:ro --net=host docker.elastic.co/beats/metricbeat:5.6.8 metricbeat -e -system.hostfs=/hostfs -strict.perms=false

And then I only started seeing the apparmor denied messages after running docker exec -it metricbeat bash.

kholia commented 6 years ago

The ptrace read requests are fine and not a problem.

My messages look like,

apparmor="DENIED" operation="ptrace" comm="metricbeat" requested_mask="trace" denied_mask="trace"

A requested mask of read would be expected (and fine), but I am seeing a trace requested mask.

Does this imply that metricbeat is actually trying to ptrace a PID? I don't know if I am interpreting this correctly. Probably not!

kholia commented 6 years ago

And then I only started seeing the apparmor denied messages after running docker exec -it metricbeat bash.

Hmm. I start seeing the denied messages without doing this. I just have to wait < 1 minute before metricbeat starts doing its stuff.

andrewkroh commented 6 years ago

A requested mask of read is expected (and is fine) but I am seeing a trace requested mask.

Good point. The /proc and ptrace page also mentions that there are files in /proc that will trigger a ptrace trace request. But AFAIK Metricbeat doesn't touch any of those listed. Perhaps the lists are not 100% accurate or complete.

Does this imply that metricbeat is actually trying to ptrace a PID?

No. I don't believe this is the case. I have enabled auditing for the ptrace syscall and there were no hits. And the audit messages associated with the AppArmor denial indicate that it's a read syscall (syscall=0 on x86_64) triggering the AVC.

type=AVC msg=audit(1524675112.626:70897): apparmor="DENIED" operation="ptrace" profile="docker-default" pid=3298 comm="metricbeat" requested_mask="trace" denied_mask="trace" peer="unconfined"
type=SYSCALL msg=audit(1524675112.626:70897): arch=c000003e syscall=0 success=yes exit=197 a0=5 a1=c4200c4000 a2=200 a3=0 items=0 ppid=3280 pid=3298 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=4294967295 comm="metricbeat" exe="/usr/share/metricbeat/metricbeat" key=(null)
type=PROCTITLE msg=audit(1524675112.626:70897): proctitle=6D657472696362656174002D65002D73797374656D2E686F737466733D2F686F73746673002D7374726963742E7065726D733D66616C7365002D640073797374656D2D70726F63657373
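For anyone decoding such records by hand: the PROCTITLE value is hex-encoded, with the process arguments separated by NUL bytes. A small Python sketch decoding the record above:

```python
# Decode the hex-encoded PROCTITLE field from the audit record above.
# Arguments are NUL-separated in the raw value.
proctitle_hex = (
    "6D657472696362656174002D65002D73797374656D2E686F737466733D2F686F73746673"
    "002D7374726963742E7065726D733D66616C7365002D640073797374656D2D70726F63657373"
)
args = [a.decode() for a in bytes.fromhex(proctitle_hex).split(b"\x00")]
print(args)
# → ['metricbeat', '-e', '-system.hostfs=/hostfs', '-strict.perms=false', '-d', 'system-process']
```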

kholia commented 6 years ago

Nice findings 👍

Would it be possible to drop these "spurious" events with Auditbeat? I am wondering what the drop_event rule would look like.

Thanks.

andrewkroh commented 6 years ago

As of now Auditbeat is not fully parsing the AppArmor message. I have a pull request open to fix the issue. Once that is merged, you could filter the messages like this:

auditbeat.modules:
- module: auditd
  processors:
  - drop_event.when:
      and:
      - equals.event.action: violated-apparmor-policy
      - equals.process.name: metricbeat
      - equals.auditd.data.operation: ptrace

But until then you could use:

auditbeat.modules:
- module: auditd
  processors:
  - drop_event.when:
      and:
      - equals.event.type: avc
      - equals.process.name: metricbeat
      - equals.auditd.data.syscall: read

mikero8 commented 6 years ago

We are experiencing the same issue on our K8s cluster; we are running Metricbeat 6.2.4 as a DaemonSet and captured the same error log.

kholia commented 6 years ago

@andrewkroh Hi again! I am unable to drop these AppArmor messages.

My Auditbeat 6.2.4 configuration contains the following lines,

...

processors:

- add_docker_metadata: ~  # match_fields stuff doesn't seem to be required to get "container.name" stuff working

- drop_event:
    when:
      # Drop AVC noise generated by the combination of AppArmor and Metricbeat
      # https://github.com/elastic/beats/issues/6932 (Metricbeat invokes ptrace leading to AppArmor complaints)
      and:
        - equals.event.type: avc
        - equals.process.name: metricbeat
        - equals.auditd.data.syscall: read
      or:
        - equals:
           process.exe: "/bin/mktemp"
        <many more rules>

However, these AppArmor complaints still make it to an Elasticsearch index. They don't get dropped by Auditbeat.

Note: The or rules are applied just fine and they work.

The full document in Elasticsearch looks like,

{
  "_index": "auditbeat",
  "_type": "account",
  "_id": "FCqEd2MB2Psw3ckirLWg",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2018-05-19T01:30:04.503Z",
    "@metadata": {
      "beat": "auditbeat",
      "type": "doc",
      "version": "6.2.4"
    },
    "process": {
      "ppid": "22771",
      "title": "./metricbeat -path.data /app/data/metricbeat -strict.perms=false -system.hostfs /hostfs -c /app/metricbeat.yml",
      "name": "metricbeat",
      "exe": "/xyz/metricbeat/metricbeat",
      "pid": "5539"
    },
    "auditd": {
      "sequence": 8812712,
      "result": "success",
      "session": "unset",
      "data": {
        "a1": "c4204c5600",
        "tty": "(none)",
        "a2": "200",
        "a3": "0",
        "syscall": "read",
        "a0": "9",
        "exit": "180",
        "arch": "x86_64"
      },
      "summary": {
        "actor": {
          "primary": "unset",
          "secondary": "root"
        },
        "how": "/xyz/metricbeat/metricbeat"
      }
    },
    "event": {
      "category": "mac-decision",
      "type": "avc",
      "action": "violated-selinux-policy",
      "module": "auditd"
    },
    "beat": {
      "name": "192.168.1.10",
      "hostname": "192.168.1.10",
      "version": "6.2.4"
    },
    "docker": {
      "container": {
        "image": "found/beats-runner:6.3.0-BC3",
        "name": "frc-beats-runner-beats-runner",
        "id": "dbdb2b0744e6be099d535eb91c06db75ba19a54fb5b6ce793c0c5ae75933f104",
        "labels": {
          "co": {
            "elastic": {
              "cloud": {
                "runner": {
                  "container_name": "beats-runner",
                  "container_set": "beats-runner",
                  "id": "i-032b0fcb42bdaf960",
                  "zone": "us-east-1a"
                }
              }
            }
          }
        }
      }
    },
    "user": {
      "name_map": {
        "uid": "root",
        "egid": "root",
        "euid": "root",
        "fsgid": "root",
        "fsuid": "root",
        "gid": "root",
        "sgid": "root",
        "suid": "root"
      },
      "gid": "0",
      "fsgid": "0",
      "suid": "0",
      "egid": "0",
      "sgid": "0",
      "euid": "0",
      "uid": "0",
      "auid": "unset",
      "fsuid": "0"
    }
  },
  "fields": {
    "@timestamp": [
      "2018-05-19T01:30:04.503Z"
    ]
  },
  "highlight": {
    "event.type": [
      "@kibana-highlighted-field@avc@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1526693404503
  ]
}

I tried simplifying the and conditional like below but still had no luck.

...
processors:

- add_docker_metadata: ~  # match_fields stuff doesn't seem to be required to get "container.name" stuff working

- drop_event:
    when:
      # Drop AVC noise generated by the combination of AppArmor and Metricbeat
      # https://github.com/elastic/beats/issues/6932 (Metricbeat invokes ptrace leading to AppArmor complaints)
      and:
        - equals.event.type: avc
      or:
        <old stuff>

What could I be missing here?

kholia commented 6 years ago

It seems I was misunderstanding the semantics of drop_event. The following configuration works,

- drop_event.when:
    # Drop all AVC events
    # equals.event.type: avc

    # Drop AVC noise generated by the combination of AppArmor and Metricbeat
    # https://github.com/elastic/beats/issues/6932 (Metricbeat invokes ptrace leading to AppArmor complaints)
    and:
      - equals.event.type: avc
      - equals.process.name: metricbeat
      - or:
          - equals.auditd.data.syscall: read
          - equals.auditd.data.syscall: readlinkat

- drop_event.when:
    # This gets rid of AVC messages which are super empty and spammy. I don't know why we get them.
    and:
      - equals.event.type: avc
      - equals.auditd.session: ""
      - equals.auditd.result: ""
      - equals.event.action: violated-selinux-policy

- drop_event:
    when:
      or: <same thing as originally posted>

I think I need multiple drop_event blocks.
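For illustration, here is the boolean structure the first working drop_event condition expresses, as a Python sketch (the flat event keys are assumed for brevity, not Auditbeat's actual internal representation):

```python
# Sketch of the working condition: a top-level `and` whose children
# include a nested `or` for the syscall, mirroring the YAML above.
def should_drop(event):
    return (
        event.get("event.type") == "avc"
        and event.get("process.name") == "metricbeat"
        and event.get("auditd.data.syscall") in ("read", "readlinkat")
    )

noisy = {"event.type": "avc", "process.name": "metricbeat",
         "auditd.data.syscall": "read"}
real_violation = {"event.type": "avc", "process.name": "metricbeat",
                  "auditd.data.syscall": "ptrace"}
print(should_drop(noisy), should_drop(real_violation))
# → True False
```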

andrewkroh commented 6 years ago

I think I need multiple drop_event blocks.

Yeah, I think that's correct. Combining a top-level and and or in the same condition seems to have undefined behavior. Looks like a bug to me.

luther7 commented 5 years ago

If you are encountering this issue while running Metricbeat in Kubernetes, you can circumvent it by adding an AppArmor annotation to your workload and adding the SYS_PTRACE capability to the security context of the workload's pod. For example:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: metricbeat
  labels:
    app: metricbeat
    # SNIP
spec:
  template:
    metadata:
      name: metricbeat
      labels:
        app: metricbeat
        # SNIP
      annotations:
        # This is needed for the proc module to perform ptrace
        container.apparmor.security.beta.kubernetes.io/metricbeat: unconfined
    spec:
      # SNIP
      containers:
      - name: metricbeat
        # SNIP
        securityContext:
          runAsUser: 0
          capabilities:
            add:
            # This is needed for the proc module
            - SYS_PTRACE

fearful-symmetry commented 5 years ago

Presumably this isn't something that can be 'fixed' on the metricbeat side, as we're running into kernel behavior. Should we add a note to the metricset docs?

fearful-symmetry commented 5 years ago

Documentation regarding this has been added by #11822. Is there anything else we should do, or can I close this?

kholia commented 5 years ago

Thank you :+1:

glauberferreira commented 4 years ago

It seems I was misunderstanding the semantics of drop_event. The following configuration works,

- drop_event.when:
    # Drop all AVC events
    # equals.event.type: avc

    # Drop AVC noise generated by the combination of AppArmor and Metricbeat
    # https://github.com/elastic/beats/issues/6932 (Metricbeat invokes ptrace leading to AppArmor complaints)
    and:
      - equals.event.type: avc
      - equals.process.name: metricbeat
      - or:
          - equals.auditd.data.syscall: read
          - equals.auditd.data.syscall: readlinkat

- drop_event.when:
    # This gets rid of AVC messages which are super empty and spammy. I don't know why we get them.
    and:
      - equals.event.type: avc
      - equals.auditd.session: ""
      - equals.auditd.result: ""
      - equals.event.action: violated-selinux-policy

- drop_event:
    when:
      or: <same thing as originally posted>

I think I need multiple drop_event blocks.

@kholia I am running Metricbeat and Filebeat (7.4.0) on Docker (Ubuntu 18.04 LTS). Since these ptrace messages are being logged to my syslog file, Filebeat is sending all of them to an Elasticsearch index. In order to 'fix' this by removing the ptrace messages, should I install Auditbeat and configure it to drop the corresponding events? Or could it be solved with a proper drop_event processor in Filebeat, based on the message content?

kholia commented 4 years ago

I drop these messages by installing and configuring Auditbeat.