ComplianceAsCode / content

Security automation content in SCAP, Bash, Ansible, and other formats
https://complianceascode.readthedocs.io/en/latest/

Remediating CIS via Ansible on RHEL-10 leads to broken D-BUS #12191

Open comps opened 1 month ago

comps commented 1 month ago

Description of problem:

Unfortunately, I don't have a solution, so the following is just a series of notes from my incomplete investigation.

Running a CIS Ansible playbook pre-built by the CaC/content build system (any cis* profile) on RHEL-10 fails while configuring firewalld:

...
TASK [Configure Firewalld to Restrict Loopback Traffic - Ensure firewalld Package is Installed] ***
ok: [localhost] => (item=firewalld) => {"ansible_loop_var": "item", "changed": false, "item": "firewalld", "msg": "Nothing to do", "rc": 0, "results": []}

TASK [Configure Firewalld to Restrict Loopback Traffic - Collect Facts About System Services] ***
ok: [localhost] => (Redacted by Contest)

TASK [Configure Firewalld to Restrict Loopback Traffic - Ensure firewalld trusted Zone Restricts IPv4 Loopback Traffic] ***
skipping: [localhost] => {"changed": false, "false_condition": "ansible_facts.services['firewalld.service'].state == 'running'", "skip_reason": "Conditional result was False"}

TASK [Configure Firewalld to Restrict Loopback Traffic - Ensure firewalld trusted Zone Restricts IPv6 Loopback Traffic] ***
skipping: [localhost] => {"changed": false, "false_condition": "ansible_facts.services['firewalld.service'].state == 'running'", "skip_reason": "Conditional result was False"}

TASK [Configure Firewalld to Restrict Loopback Traffic - Ensure firewalld Changes are Applied] ***
skipping: [localhost] => {"changed": false, "false_condition": "ansible_facts.services['firewalld.service'].state == 'running'", "skip_reason": "Conditional result was False"}

TASK [Configure Firewalld to Restrict Loopback Traffic - Informative Message Based on Service State] ***
fatal: [localhost]: FAILED! => {
2024-07-19 13:55:31 test.py:30: lib.waive.collect_waivers:141: using /var/tmp/runcontest-results/task1/plans/default/discover/default-0/tests/conf/waivers for waiving
2024-07-19 13:55:31 test.py:30: lib.results.report_plain:182: ERROR playbook: Configure Firewalld to Restrict Loopback Traffic - Informative Message Based on Service State ({)
    "assertion": "ansible_facts.services['firewalld.service'].state == 'running'",
    "changed": false,
    "evaluated_to": false,
    "msg": [
        "firewalld service is not active. Remediation aborted!",
        "This remediation could not be applied because it depends on firewalld service running.",
        "The service is not started by this remediation in order to prevent connection issues."
    ]

This is possibly because firewalld really is not running: the remediation ran on a Beaker (internal) system where firewalld is disabled by default, but earlier playbook tasks should have enabled it (as they do on RHEL-8/9).

I tried unselecting the failing firewalld rule(s) to progress further, and the playbook then stopped on

TASK [NetworkManager Deactivate Wireless Network Interfaces] *******************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["nmcli", "radio", "wifi", "off"], "delta": "0:00:00.006275", "end": "2024-07-18 20:33:46.473674", "msg": "non-zero return code", "rc": 8, "start": "2024-07-18 20:33:46.467399", "stderr": "Error: NetworkManager is not running.", "stderr_lines": ["Error: NetworkManager is not running."], "stdout": "", "stdout_lines": []}

That gave me a clue. Investigating further on the OS revealed that networking is indeed down after a remediation + reboot: NetworkManager fails to start because of an unfulfilled systemd dependency, dbus.service. It turns out dbus-broker.service failed to start with

dbus-broker-launch[625]: ERROR launcher_run_child @ ../src/launch/launcher.c +326: Permission denied
dbus-broker-launch[624]: ERROR service_add @ ../src/launch/service.c +1011: Transport endpoint is not connected
dbus-broker-launch[624]:       launcher_add_services @ ../src/launch/launcher.c +805
dbus-broker-launch[624]:       launcher_run @ ../src/launch/launcher.c +1416
dbus-broker-launch[624]:       run @ ../src/launch/main.c +152
dbus-broker-launch[624]:       main @ ../src/launch/main.c +178
dbus-broker-launch[624]: Exiting due to fatal error: -107
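
To make it easier to map such journal lines back to source locations, here is a minimal Python sketch that splits each line into PID, function, file, line number and message. The line shape is inferred only from the excerpt above, and the names `ERROR_RE` and `parse_frame` are made up for illustration; they are not part of dbus-broker or systemd.

```python
import re

# Assumed line shape, inferred from the journal excerpt above:
#   dbus-broker-launch[PID]: [ERROR ]func @ path +LINE[: message]
ERROR_RE = re.compile(
    r"^(?P<unit>[\w.-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?:ERROR\s+)?(?P<func>\w+) @ (?P<path>\S+) \+(?P<line>\d+)"
    r"(?::\s*(?P<msg>.*))?$"
)


def parse_frame(line):
    """Return a dict describing one error/backtrace line, or None if it does not match."""
    m = ERROR_RE.match(line.strip())
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "func": m["func"],
        "path": m["path"],
        "line": int(m["line"]),
        "msg": m["msg"],  # None for continuation frames without a message
    }
```

Lines that are not frames, such as the final "Exiting due to fatal error" line, simply yield `None`, so the helper can be run over a whole journal dump.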

I asked the systemd people, but have had no reply so far, so I started digging into the source code - I downloaded and rpmbuild-patched the same version of dbus-broker, and discovered that src/launch/launcher.c line 326 is the error_origin() in

        r = sd_id128_get_machine(&machine_id);
        if (r < 0) {
                r = error_origin(r);
                goto exit;
        }

which means that sd_id128_get_machine() failed with EACCES (Permission denied). I then looked into the systemd source itself to see what the function does, and it basically just reads /etc/machine-id:

_public_ int sd_id128_get_machine(sd_id128_t *ret) {
        static thread_local sd_id128_t saved_machine_id = {};
        int r;

        if (sd_id128_is_null(saved_machine_id)) {
                r = id128_read("/etc/machine-id", ID128_FORMAT_PLAIN | ID128_REFUSE_NULL, &saved_machine_id);
                if (r < 0)
                        return r;
        }

        if (ret)
                *ret = saved_machine_id;
        return 0;
}

but that makes no sense - in essence, all it does is read a world-readable file:

$ ls -l /etc/machine-id 
-r--r--r--. 1 root root 33 Jul 18 19:21 /etc/machine-id
$ cat /etc/machine-id 
ccbf9a653ac84fe6bc0d6b40a0a49167
$ ausearch -m avc | grep machine-id
$ 

Disabling SELinux didn't fix it either. /etc has the usual 0755 mode, so there should be no issue accessing that file, and there are no file ACLs.
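
The manual permission checks above can be sketched in Python by listing the mode and ownership of every path component leading to the file, since a read can be refused by any directory along the way. `path_chain` is a hypothetical helper name. The result only reflects the process that runs it: a service started by systemd may see a different view, so this narrows things down rather than proving anything about dbus-broker-launch.

```python
import os
import stat


def path_chain(path):
    """Return (path, mode string, uid, gid) for each level from / down to path."""
    current = os.path.abspath(path)
    levels = []
    while True:
        levels.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    rows = []
    for level in reversed(levels):
        st = os.lstat(level)  # lstat: report symlinks themselves, not their targets
        rows.append((level, stat.filemode(st.st_mode), st.st_uid, st.st_gid))
    return rows
```

Calling `path_chain("/etc/machine-id")` and printing each row gives roughly the same information as running `ls -ld` on `/`, `/etc` and the file itself. `os.access("/etc/machine-id", os.R_OK)` additionally tells whether the calling process could read it.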

I also looked into src/launch/service.c line 1011 (given that it was mentioned) and found nothing conclusive there either:

        r = sd_bus_call_method(launcher->bus_controller,
                               NULL,
                               "/org/bus1/DBus/Broker",
                               "org.bus1.DBus.Broker",
                               "AddName",
                               NULL,
                               NULL,
                               "osu",
                               object_path,
                               service->name,
                               service->data->uid);
        if (r < 0)
                return error_origin(r);

From searching around, "Transport endpoint is not connected" (ENOTCONN) seems to be a common error from this function when something further down the stack goes wrong.
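
Since the log reports both a message string and a negative number, a small Python sketch can translate between the two using the standard `errno` table. `describe_errno` is a name invented for this example. Note that errno numbering is platform-specific, so it should be run on the affected Linux system.

```python
import errno
import os


def describe_errno(code):
    """Map a (possibly negated) errno value to (symbolic name, message)."""
    number = abs(code)  # systemd-style APIs return errors as negative errno values
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)
```

On Linux, `describe_errno(-107)` gives `ENOTCONN` with the "Transport endpoint is not connected" message seen in the log. The "Permission denied" text is the message for `EACCES` (13), whereas `EPERM` (1) reads "Operation not permitted".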

At that point, I gave up.

Independently, I tried running the same playbook after setenforce 0. It at least went through all rules (it didn't stop on firewalld or NM), but networking was dead afterwards anyway, so that did not fix the issue.

SCAP Security Guide Version:

703fb11c94d60366da108e02f8bf21b7fae87a81

Operating System Version:

RHEL-10

comps commented 1 month ago

For the record - I did review the rules remediated by Ansible up to the point of it stopping for firewalld, but there was nothing obviously responsible - just some account password policy setting, /etc/login.defs, etc., no sysctls or anything.