service_nftables_disabled fails after remediation

marcusburghardt commented 1 year ago

Description of problem:

This rule was introduced by https://github.com/ComplianceAsCode/content/pull/10390. It fails after remediation when scanning with the CIS Server Level 2 profile.

SCAP Security Guide Version:

master branch as of 2023-04-01

Operating System Version:

RHEL8.8 and RHEL9.2

Steps to Reproduce:

./build_product rhel8
oscap xccdf eval --progress --remediate --profile xccdf_org.ssgproject.content_profile_cis --report /cis_remediate_report.html /ssg-rhel8-ds.xml
oscap xccdf eval --progress --profile xccdf_org.ssgproject.content_profile_cis --results cis-xccdf-results.xml --report cis.html /ssg-rhel8-ds.xml
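When iterating on this, re-evaluating only the affected rule may be quicker than rescanning the whole profile. The following is a minimal sketch, assuming the data stream path used above and the documented `oscap` exit status convention (0 when evaluation succeeds and everything is compliant, 2 when evaluation succeeds but the system is not compliant, 1 on error). The helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: re-evaluate only service_nftables_disabled within the CIS profile.
DS=${DS:-/ssg-rhel8-ds.xml}
PROFILE=xccdf_org.ssgproject.content_profile_cis
RULE=xccdf_org.ssgproject.content_rule_service_nftables_disabled

# Map an oscap exit status to a short result word.
# Assumption: 0 = compliant (pass), 2 = not compliant (fail), anything else = error.
oscap_status_to_result() {
    case "$1" in
        0) echo pass ;;
        2) echo fail ;;
        *) echo error ;;
    esac
}

# Evaluate the single rule and print pass/fail/error.
check_nftables_rule() {
    oscap xccdf eval --profile "$PROFILE" --rule "$RULE" "$DS" >/dev/null 2>&1
    oscap_status_to_result "$?"
}

# Only run the check when called with "run", so sourcing the file has no side effects.
if [ "${1:-}" = "run" ]; then
    check_nftables_rule
fi
```

For example, `bash check_rule.sh run` right after the remediation run and again after a reboot would show whether the result flips.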

Actual Results:

xccdf_org.ssgproject.content_rule_service_nftables_disabled - fail

Expected Results:

xccdf_org.ssgproject.content_rule_service_nftables_disabled - pass

Additional Information/Debugging Steps:

jan-cerny commented 1 year ago

It passes using AutoMatus:

[jcerny@thinkpad scap-security-guide{master}]$ python3 tests/automatus.py rule --libvirt qemu:///system ssgts_rhel8 service_nftables_disabled
Setting console output to log level INFO
INFO - The base image option has not been specified, choosing libvirt-based test environment.
INFO - Logging into /home/jcerny/work/git/scap-security-guide/logs/rule-custom-2023-04-05-1025/test_suite.log
INFO - xccdf_org.ssgproject.content_rule_service_nftables_disabled
INFO - Script service_disabled.pass.sh using profile (all) OK
INFO - Script service_enabled.fail.sh using profile (all) OK

So I will take a look at what happens in the context of the whole profile.

jan-cerny commented 1 year ago

I haven't reproduced this in a RHEL 8.8 VM. The rule passes after the remediation, and the nftables service is masked after the remediation. The rule is templated, so it should work the same way as all other service_disabled rules. @marcusburghardt I suspect that there could be a problem with the service itself, maybe something that prevents the service from being disabled?

marcusburghardt commented 1 year ago

Same here @jan-cerny. Yesterday I also ran a few profile tests locally and couldn't reproduce it. I also tested a system with the nftables service disabled during the rule implementation, and it worked fine. I am investigating the service to see whether any detail was missed.

jan-cerny commented 1 year ago

I suspect that the state might change during the reboot; I'm trying it.

jan-cerny commented 1 year ago

In this productization test, there is a reboot, and a scan is performed both before and after it. Before the reboot the service is masked, but after the reboot it is unmasked. I have tried to reproduce this locally on a virtual machine and on various remote machines, but I wasn't able to reproduce it outside the specific environment used for productization, which prevents me from debugging it. I suspect it may be something specific to the infrastructure.

For the time being, let's keep the issue open and observe the result of next week's productization test. Then we might solve it by adding a (temporary) waiver.
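The masked-before-reboot, unmasked-after-reboot observation could be captured by recording the unit state on each side of the reboot. A sketch, assuming a hypothetical log path; `systemctl is-enabled` prints `masked` (or `masked-runtime`) for a masked unit and exits non-zero in that case, so the script keeps the printed state regardless of the exit status.

```shell
#!/bin/bash
# Hypothetical log file used to compare unit state across a reboot.
LOG=${LOG:-/root/nftables-state.log}

# Succeed when a `systemctl is-enabled` state string means the unit is masked.
is_masked_state() {
    case "$1" in
        masked|masked-runtime) return 0 ;;
        *) return 1 ;;
    esac
}

# Append "<label> <state>" for nftables.service to the log and print a summary.
record_state() {
    local label=$1 state
    # is-enabled exits non-zero for masked/disabled units; keep the printed state anyway.
    state=$(systemctl is-enabled nftables.service 2>/dev/null || true)
    echo "$label ${state:-unknown}" >> "$LOG"
    if is_masked_state "$state"; then
        echo "$label: nftables.service is masked"
    else
        echo "$label: nftables.service is NOT masked (${state:-unknown})"
    fi
}

# Usage: "record before-reboot" after remediation, "record after-reboot" after boot.
if [ "${1:-}" = "record" ]; then
    record_state "${2:-snapshot}"
fi
```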

Mab879 commented 1 year ago

This is still an issue in the latest run.

jan-cerny commented 1 year ago

I'm going to revisit this issue this week.

mildas commented 1 year ago

Sanity/machine-hardening test is one of those where it fails. If you want, I can fairly quickly get you a machine where the test was run and the rule fails.

The issue doesn't seem to be an environment problem. It also fails in the kickstart test. The kickstart test installs a VM, hardens it via the Anaconda add-on, and scans the VM after its first boot. No beakerlib, no workarounds (except a few unselected rules so that the machine stays accessible): basically a freshly installed RHEL.

jan-cerny commented 1 year ago

I wasn't able to reproduce this. You can ping me off-list to get some details about my machines. I'm honestly giving up.

marcusburghardt commented 1 year ago

It would be great if you could provide me access to this machine, Milan.

marcusburghardt commented 1 year ago

Thanks for the efforts @jan-cerny. You provided valuable information with your tests. I will try to continue the investigation in this issue.

yuumasato commented 1 year ago

This still happens as of this week.

mildas commented 1 year ago

@marcusburghardt On the reserved machine, service_nftables_disabled passes by default. There, I ran the service_nftables_disabled check after every CIS Level 2 rule remediation, and it starts failing right after service_firewalld_enabled. So that's the collision.

I haven't investigated further: whether there's a service dependency, whether there are more problematic rules in CIS (and firewalld was just the first hit), or what's going on. I might look at it later, but I wanted to let you know in case you already know what's going on.
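The per-rule procedure described above can be sketched as a loop that remediates one rule at a time and re-checks service_nftables_disabled after each step, stopping at the first rule that breaks it. This assumes `oscap xccdf eval --remediate --rule` and that exit status 0 means the checked rule passed; the rule-list file and function names are hypothetical, and the remediate/check commands are passed in as parameters so the loop can be exercised without oscap.

```shell
#!/bin/bash
DS=${DS:-/ssg-rhel8-ds.xml}
PROFILE=xccdf_org.ssgproject.content_profile_cis
TARGET=xccdf_org.ssgproject.content_rule_service_nftables_disabled

# Remediate a single rule from the profile (its own result is ignored here).
remediate_rule() {
    oscap xccdf eval --remediate --profile "$PROFILE" --rule "$1" "$DS" >/dev/null 2>&1
}

# Succeed only if the target rule currently passes (exit status 0).
target_passes() {
    oscap xccdf eval --profile "$PROFILE" --rule "$TARGET" "$DS" >/dev/null 2>&1
}

# find_first_breaking_rule REMEDIATE_CMD CHECK_CMD RULE...
# Prints the first rule after whose remediation CHECK_CMD fails; returns 1 if none does.
find_first_breaking_rule() {
    local remediate=$1 check=$2 rule
    shift 2
    for rule in "$@"; do
        "$remediate" "$rule"
        if ! "$check"; then
            echo "$rule"
            return 0
        fi
    done
    return 1
}

# Usage: script.sh run rules.txt   (hypothetical file, one rule ID per line)
if [ "${1:-}" = "run" ] && [ -r "${2:-}" ]; then
    mapfile -t rules < "$2"
    find_first_breaking_rule remediate_rule target_passes "${rules[@]}"
fi
```

It is worth confirming that the target rule passes before the loop starts, otherwise the first rule in the list would be reported.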

marcusburghardt commented 1 year ago

@mildas I didn't find any relationship between service_nftables_disabled and service_firewalld_enabled. I also checked the template and the remediation. Everything seems to be ok.

Something weird is that the first scan should not pass. For some reason, the first scan seems unable to properly assess the systemd units: [Screenshot from 2023-05-15 17-35-33]

So the remediation is not applied. However, after the reboot, the systemd units are properly assessed: [Screenshot from 2023-05-15 17-37-12]

This second scan is correct, and it fails because the nftables.service unit is not masked. It would only be masked if the remediation had been applied. But since the initial scan does not properly discover the nftables.service state, it reports a false positive.

Do you have any idea on why the nftables.service state is not detected during the first scan?

mildas commented 1 year ago

I haven't found anything obvious; the services don't reveal anything, and the oscap devel log didn't help me understand why it doesn't see the unit on the first scan.

@jan-cerny Could you look into it? Or should we report it against the openscap project? See the last two messages, but basically openscap doesn't see that the nftables service is not masked until service_firewalld_enabled gets remediated.

jan-cerny commented 1 year ago

I can reproduce the situation that @marcusburghardt described, i.e., oscap can't read the state of the nftables service.

I have found that the reason is that oscap doesn't get any data about the nftables.service systemd unit from dbus.

However, even systemd doesn't show this unit.

This gives no output:

systemctl list-units --all | grep nftables

Also, this doesn't give any output:

dbus-send --system --print-reply --reply-timeout=2000 --type=method_call --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager.ListUnits | grep nftables

On the other hand, the status can be displayed:

[root@kvm-05-guest10 build]# systemctl status nftables
● nftables.service - Netfilter Tables
   Loaded: loaded (/usr/lib/systemd/system/nftables.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:nft(8)

I guess that this is some specific behavior of systemd/dbus that I don't understand.

marcusburghardt commented 1 year ago

@jan-cerny, in your test environment, could you try running the systemctl daemon-reload command before systemctl list-units --all | grep nftables, please? After a reboot it seems to work, so this command will probably help, but we need to confirm whether that is the case.

jan-cerny commented 1 year ago

Thanks! I will try it.

jan-cerny commented 1 year ago

The result is that systemctl daemon-reload doesn't change anything. After executing it, the systemctl list-units --all | grep nftables still returns nothing. Also the output of other commands is still the same as in my previous comment. I tried multiple times.

marcusburghardt commented 1 year ago

Ok. Thank you very much for this test @jan-cerny. I was hoping we might have a way to work around this issue without a reboot, but that doesn't seem to be the case.

marcusburghardt commented 1 year ago

@mildas and @jan-cerny, would you agree to move this issue to the scanner and waive this rule on the content side?

jan-cerny commented 1 year ago

@marcusburghardt This doesn't seem to be an issue on the content side. However, I'm not sure if it's an issue in the scanner. I don't know where exactly the issue is. In OpenSCAP, we only perform a dbus call of the org.freedesktop.systemd1.Manager.ListUnits method and parse the returned value. As shown in my comment above, in this situation this dbus call doesn't return any data about the nftables unit. So, one option is that OpenSCAP calls the wrong method. The other option is a problem with systemd or nftables itself.
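For comparison, the systemd D-Bus interface documentation describes Manager.ListUnits as returning the units currently loaded into memory, whereas ListUnitFiles and GetUnitFileState work from the unit files on disk. If that distinction is what is happening here, a file-state query should still report nftables.service on an affected machine. This is a diagnostic sketch, not what OpenSCAP does, and the helper names are hypothetical.

```shell
#!/bin/bash
# Extract the value from a `dbus-send --print-reply` reply holding a single string,
# i.e. a line such as:   string "disabled"
parse_string_reply() {
    sed -n 's/^[[:space:]]*string "\(.*\)"$/\1/p'
}

# Ask systemd for the on-disk enablement state of a unit file
# (for example enabled, disabled, or masked).
unit_file_state() {
    dbus-send --system --print-reply --reply-timeout=2000 --type=method_call \
        --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager.GetUnitFileState "string:$1" \
        | parse_string_reply
}

# Usage: script.sh run [unit]   (defaults to nftables.service)
if [ "${1:-}" = "run" ]; then
    unit_file_state "${2:-nftables.service}"
fi
```

Running this next to the ListUnits call on the affected machine would show whether the unit is known on disk while absent from the loaded-unit list.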

marcusburghardt commented 1 year ago

I see, that makes sense. So we need more investigation to pin down the source of the problem, and it sounds reasonable to keep this issue open here. It is also clear that the rule itself works as expected, so we should be fine to waive this failure in productization tests while this issue is open. Is that ok for you @mildas?

mildas commented 1 year ago

Sure, I will update the waivers. I suggest contacting someone from systemd to see if they could briefly check it; that could save us time. There's still some time before the next RHEL release, so if we act quickly, it could get fixed there. We just need to know whom to assign the Bugzilla to, and then create it.

vojtapolasek commented 1 year ago

I did my own investigation, and I suspect that our problem is caused by systemd handling unit files in a way we do not expect. My suspicion is that a unit file which has not been started (manually or automatically), and which is not referenced in another unit file that has been started (manually or automatically), does not appear in the list of units. Here is a reproducer:

cd /usr/lib/systemd/system
cp nftables.service myservice.service
systemctl daemon-reload
dbus-send --system --print-reply --reply-timeout=2000 --type=method_call --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager.ListUnits | grep myservice   # -> no output
systemctl daemon-reload
dbus-send --system --print-reply --reply-timeout=2000 --type=method_call --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager.ListUnits | grep myservice   # -> no output
systemctl start firewalld.service
# Edit firewalld.service and append "myservice.service" to its Conflicts= line.
dbus-send --system --print-reply --reply-timeout=2000 --type=method_call --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager.ListUnits | grep myservice   # -> no output
systemctl daemon-reload
dbus-send --system --print-reply --reply-timeout=2000 --type=method_call --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager.ListUnits | grep myservice   # -> object path "/org/freedesktop/systemd1/unit/myservice_2eservice"

evgenyz commented 1 year ago

The nftables.service is:

[Unit]
Description=Netfilter Tables
Documentation=man:nft(8)
Wants=network-pre.target
Before=network-pre.target

[Service]
Type=oneshot
ProtectSystem=full
ProtectHome=true
ExecStart=/sbin/nft -f /etc/sysconfig/nftables.conf
ExecReload=/sbin/nft 'flush ruleset; include "/etc/sysconfig/nftables.conf";'
ExecStop=/sbin/nft flush ruleset
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

The most important part of it is Type=oneshot: https://trstringer.com/simple-vs-oneshot-systemd-service/. This might be the reason it is not listed.
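The Type=oneshot hypothesis could be checked by surveying the Type= setting of installed service units and comparing units that do and don't show up in ListUnits. A rough sketch; the helper name is hypothetical, it reads only the main unit file (no drop-ins or line continuations), and an empty result means Type= is not set explicitly, so systemd's default applies.

```shell
#!/bin/bash
# Print the value of KEY from [SECTION] of a systemd unit file read on stdin.
# Simplified: ignores drop-ins and line continuations; the last occurrence wins.
unit_key() {
    awk -v section="[$1]" -v key="$2" '
        /^[[:space:]]*\[/ {
            line = $0
            gsub(/[[:space:]]/, "", line)   # tolerate stray whitespace around the header
            in_section = (line == section)
            next
        }
        in_section && index($0, key "=") == 1 { value = substr($0, length(key) + 2) }
        END { if (value != "") print value }
    '
}
```

For example: `for f in /usr/lib/systemd/system/*.service; do echo "$(unit_key Service Type < "$f") $f"; done`.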

evgenyz commented 1 year ago

When I call the method via D-Spy, I get ('nftables.service', 'Netfilter Tables', 'loaded', 'inactive', 'dead', '', '/org/freedesktop/systemd1/unit/nftables_2eservice', 0, '', '/').
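To compare the dbus-send output with what D-Spy shows, the ListUnits reply can be reduced to one line per unit. This sketch assumes each unit is printed by `dbus-send --print-reply` as a `struct {` ... `}` block whose fields follow the order of the tuple above (name, description, load state, active state, sub state, following, object path, job id, job type, job path); the function names are hypothetical.

```shell
#!/bin/bash
# Turn `dbus-send --print-reply ... Manager.ListUnits` output into lines of
# "<name> <load state> <active state> <sub state>".
list_units_from_reply() {
    awk '
        /^[[:space:]]*struct [{]/ { in_struct = 1; n = 0; next }
        in_struct && /^[[:space:]]*[}]/ { print f[1], f[3], f[4], f[5]; in_struct = 0; next }
        in_struct {
            n++
            v = $0
            sub(/^[^"]*"/, "", v)   # strip up to and including the first quote
            sub(/"[^"]*$/, "", v)   # strip the closing quote and anything after it
            f[n] = v
        }
    '
}

# Query systemd over D-Bus; print UNIT's line and succeed only if ListUnits reports it.
unit_listed() {
    dbus-send --system --print-reply --reply-timeout=2000 --type=method_call \
        --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager.ListUnits \
        | list_units_from_reply \
        | awk -v u="$1" '$1 == u { found = 1; print } END { exit !found }'
}
```

On an affected machine, `unit_listed nftables.service` would be expected to fail before firewalld is set up and succeed afterwards, if the behavior matches the reproducer above.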

marcusburghardt commented 1 year ago

This issue is not content related, but something related to D-BUS and systemd. @evgenyz, would you like to investigate if we can fix this on the scanner side?

mildas commented 1 year ago

Could you create a BZ against either openscap or dbus and close this issue? @evgenyz or @marcusburghardt

ggbecker commented 10 months ago

We need to retest this using the openscap that contains the fix https://github.com/OpenSCAP/openscap/pull/1980
