giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

CAPZ Flatcar images are reporting an error when loading audit rules #2122

Open primeroz opened 1 year ago

primeroz commented 1 year ago

Motivation

Follow up to https://github.com/giantswarm/roadmap/issues/1659

With our custom CAPZ Flatcar image, audit-rules.service reports an error:

➜ k node-shell fctest1-control-plane-c6bd2cc2-c5x9v
spawning "nsenter-c9nmgj" on "fctest1-control-plane-c6bd2cc2-c5x9v"
If you don't see a command prompt, try pressing enter.

fctest1-control-plane-c6bd2cc2-c5x9v / # systemctl list-units --failed
  UNIT                LOAD   ACTIVE SUB    DESCRIPTION                 
● audit-rules.service loaded failed failed Load Security Auditing Rules

It reports an error on line 5 (I can't show the logs since they have expired by now).

All rules seem to have loaded, though:

fctest1-control-plane-c6bd2cc2-c5x9v / # cat /etc/audit/audit.rules
## This file is automatically generated from /etc/audit/rules.d
-D

-a exclude,never -F msgtype>=1400 -F msgtype<=1499
-a exit,always -F arch=b64 -S execve -k auditing
-a exit,always -F arch=b32 -S execve -k auditing
-w /var/lib/containerd/ -p rwxa -k containerd
-w /etc/containerd/ -p rwxa -k containerd
-w /etc/systemd/system/containerd.service -p rwxa -k containerd
-w /etc/systemd/system/containerd.service.d/ -p rwxa -k containerd
-w /run/containerd/ -p rwxa -k containerd
-w /opt/bin/containerd-shim -p rwxa -k containerd
-w /opt/bin/containerd-shim-runc-v1 -p rwxa -k containerd
-w /opt/bin/containerd-shim-runc-v2 -p rwxa -k containerd
-w /opt/bin/runc -p rwxa -k containerd
-w /opt/bin/containerd -p rwxa -k containerd

fctest1-control-plane-c6bd2cc2-c5x9v / # auditctl -l 
-a always,exit -F arch=b64 -S execve -F key=auditing
-a always,exit -F arch=b32 -S execve -F key=auditing
-w /var/lib/containerd -p rwxa -k containerd
-w /etc/containerd -p rwxa -k containerd
-w /etc/systemd/system/containerd.service -p rwxa -k containerd
-w /etc/systemd/system/containerd.service.d -p rwxa -k containerd
-w /run/containerd -p rwxa -k containerd
-w /opt/bin/containerd-shim -p rwxa -k containerd
-w /opt/bin/containerd-shim-runc-v1 -p rwxa -k containerd
-w /opt/bin/containerd-shim-runc-v2 -p rwxa -k containerd
-w /opt/bin/runc -p rwxa -k containerd
-w /opt/bin/containerd -p rwxa -k containerd
-a never,exclude -F msgtype>=AVC -F msgtype<=1499
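
The two listings above differ only in notation, so comparing them by eye is error-prone. A minimal sketch of a normalized comparison follows. It only handles the notational differences visible in this issue (list/action order, -k versus -F key=, trailing slashes on watched directories, and the AVC message-type name); the helper names `normalize` and `missing_rules` are hypothetical, and the AVC = 1400 mapping is taken from linux/audit.h.

```python
import shlex

# auditctl -l prints some message types by name. Only the name seen in this
# issue is mapped here (AUDIT_AVC is 1400 in linux/audit.h); extend as needed.
MSGTYPE_NAMES = {"AVC": "1400"}


def normalize(line):
    """Return an order-independent form of one rule line, or None to skip it."""
    line = line.strip()
    if not line or line.startswith("#") or line == "-D":
        return None  # comments, blank lines and the "delete all" directive
    tokens = shlex.split(line)
    pairs = set()
    # Every rule in this issue is a sequence of "-flag value" pairs.
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if flag == "-a":
            # "exit,always" and "always,exit" name the same list and action.
            value = ",".join(sorted(value.split(",")))
        elif flag == "-k":
            # In the output above, "-k name" is listed back as "-F key=name".
            flag, value = "-F", "key=" + value
        elif flag == "-w" and len(value) > 1:
            # Watched directories are listed back without the trailing slash.
            value = value.rstrip("/")
        elif flag == "-F" and value.startswith("msgtype"):
            for name, number in MSGTYPE_NAMES.items():
                if value.endswith("=" + name):
                    value = value[: -len(name)] + number
        pairs.add((flag, value))
    return frozenset(pairs)


def missing_rules(rules_file_text, auditctl_output):
    """Return normalized rules present in the file but absent from the loaded list."""
    wanted = {normalize(l) for l in rules_file_text.splitlines()} - {None}
    loaded = {normalize(l) for l in auditctl_output.splitlines()} - {None}
    return wanted - loaded


# Two rules from the listings above, in both notations.
rules_file = """\
-D
-a exit,always -F arch=b64 -S execve -k auditing
-w /etc/containerd/ -p rwxa -k containerd
"""
loaded = """\
-a always,exit -F arch=b64 -S execve -F key=auditing
-w /etc/containerd -p rwxa -k containerd
"""
print(missing_rules(rules_file, loaded))  # an empty set means nothing is missing
```

Feeding it the full contents of /etc/audit/audit.rules and the full `auditctl -l` output would confirm whether every rule in the file is actually loaded.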

Also, it does not happen 100% of the time on all nodes.

TODO

For Turtles

tuladhar commented 1 year ago

Encountered on CAPA Flatcar image as well. It doesn't happen 100% of the time.

giantswarm@ip-10-0-214-124 ~ $ sudo systemctl status audit-rules
× audit-rules.service - Load Security Auditing Rules
     Loaded: loaded (/usr/lib/systemd/system/audit-rules.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/audit-rules.service.d
             └─10-wait-for-containerd.conf
     Active: failed (Result: exit-code) since Wed 2023-08-30 17:00:01 UTC; 1h 0min ago
   Main PID: 1493 (code=exited, status=1/FAILURE)
        CPU: 23ms

Aug 30 17:00:00 localhost systemd[1]: Starting audit-rules.service...
Aug 30 17:00:01 localhost augenrules[1531]: Error sending add rule data request (Rule exists)
Aug 30 17:00:01 localhost augenrules[1531]: There was an error in line 5 of /etc/audit/audit.rules
Aug 30 17:00:01 localhost augenrules[1531]: No rules
Aug 30 17:00:01 localhost systemd[1]: audit-rules.service: Main process exited, code=exited, status=1/FAILURE
Aug 30 17:00:01 localhost systemd[1]: audit-rules.service: Failed with result 'exit-code'.
Aug 30 17:00:01 localhost systemd[1]: Failed to start audit-rules.service.

Files in the /etc/audit/rules.d directory:

ip-10-0-214-124 /etc/audit # ls -lh rules.d/
total 4.0K
lrwxrwxrwx. 1 root root  39 Aug 30 12:33 00-clear.rules -> /usr/share/audit/rules.d/00-clear.rules
lrwxrwxrwx. 1 root root  41 Aug 30 12:33 80-selinux.rules -> /usr/share/audit/rules.d/80-selinux.rules
lrwxrwxrwx. 1 root root  41 Aug 30 12:33 99-default.rules -> /usr/share/audit/rules.d/99-default.rules
-rw-r--r--. 1 root root 511 Aug 30 12:35 containerd.rules

My hunch is that the containerd.rules file doesn't have a numeric prefix, so it might be read before 00-clear.rules or after 99-default.rules. If it's read after 99-default.rules, audit-rules.service doesn't fail; however, if it's read before 00-clear.rules, then audit-rules.service fails 🤔 ❓
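
One way to reason about the ordering question is to sort the four file names the way a plain lexical sort would. This is only a sketch: it assumes the rule files are concatenated in simple code-point order, which has not been checked here against what augenrules actually does, and `lexical_order` is a hypothetical helper.

```python
# File names copied from the `ls -lh rules.d/` listing above.
RULE_FILES = [
    "containerd.rules",
    "00-clear.rules",
    "80-selinux.rules",
    "99-default.rules",
]


def lexical_order(names):
    """Sort by code point; for ASCII names this matches a C-locale byte sort."""
    return sorted(names)


ordered = lexical_order(RULE_FILES)
for position, name in enumerate(ordered, start=1):
    print(position, name)
# 1 00-clear.rules
# 2 80-selinux.rules
# 3 99-default.rules
# 4 containerd.rules
```

Under that assumption, digits sort before letters, so containerd.rules would always land after 99-default.rules and never before 00-clear.rules. That is consistent with the generated /etc/audit/audit.rules shown earlier in this issue, where -D comes first and the containerd watches come last.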

Note that audit-rules.service doesn't restart on failure, as it's Type=oneshot

tuladhar commented 1 year ago

In CAPA, this is fixed by adding Restart=on-failure to audit-rules.service:

- name: audit-rules.service
  enabled: true
  dropins:
  - name: 10-wait-for-containerd.conf
    contents: |
      [Service]
      ExecStartPre=/bin/bash -c "while [ ! -f /etc/audit/rules.d/containerd.rules ]; do echo 'Waiting for /etc/audit/rules.d/containerd.rules to be written' && sleep 1; done"
      # Added this
      Restart=on-failure

See: https://github.com/giantswarm/cluster-aws/pull/334