(These were originally part of a new audit-sample-rules test, https://github.com/RHSecurityCompliance/contest/issues/259, but I decided to split them off for easier discussion tracking.)

Going by commits, from oldest:

add util/backup.py for backup+restore of filesystem paths
This was for an earlier version of the test, which actually remediated the audit rules into /etc/audit. I first had a VM-using test, but realized we need to run it multi-arch, so I rewrote it without VMs, hence the backup/restore.

The current version of the test doesn't use this, but I think it would still be good to have it, rather than throw the code away. The early versions of Contest also had it: https://github.com/RHSecurityCompliance/contest/blob/bb2778816c463f9148395c8e566c7724ce60b28e/lib/backup.sh :smile:
start using PyYAML instead of hacking YAML syntax

From the commit message body:

This was likely a RHEL-7 limitation, plus we didn't want to add an unnecessary dependency on non-python-stdlib modules (like 'requests' or 'PyYAML').
However, we keep running into the need to parse YAML (e.g. playbooks or TMT metadata), and python3*-pyyaml seems to be widely available in RHEL (even python39-pyyaml on RHEL-8, for example), unlike 'rpm' or 'dnf', so we might as well use it.
The '/static-checks/ansible/allowed-modules' test already uses it without adding an RPM dependency, which hints at how widely available PyYAML is. Also, TMT probably depends on it too.
I specifically needed it in the audit test to parse ansible remediations and get .rules filepath / contents without having to sed/grep bash code.
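As a sketch of what this buys us: with PyYAML, pulling the destination path and contents of the .rules file out of a copy task becomes a dictionary lookup instead of regex surgery. The playbook below is a made-up stand-in, not a real remediation:

```python
import yaml  # provided by the python3*-pyyaml packages

playbook_text = """
- hosts: all
  tasks:
    - name: Install audit rules
      copy:
        dest: /etc/audit/rules.d/30-example.rules
        content: "-w /etc/passwd -p wa -k identity"
"""

plays = yaml.safe_load(playbook_text)

# walk the tasks, looking for the one that writes out a .rules file
for task in plays[0]['tasks']:
    if 'copy' in task and task['copy']['dest'].endswith('.rules'):
        rules_path = task['copy']['dest']
        rules_content = task['copy']['content']
```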
build CONTEST_CONTENT in a TMT plan prepare step
Per the commit:

This is because building takes a long time (10+ minutes) on some slow systems and that easily exceeds a 5/10 minute 'duration' of a small test that never anticipated the extra delay. Moving it to a plan moves it outside the 'duration' count.

Also, it's probably a better central place given that all tests always share the built content, so the ./build_product options have to be considered globally.
Finally, this is not a great fix given that we are still left with the 'build' argument to get_content(),

    def get_content(build=True):
        ...

and its SRPM-based building code, which still runs within test 'duration'. However, there is no easy solution for that (other than getting rid of SRPM support entirely), so this commit at least tries to bump things in the right direction.
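For context, a plan-level prepare step along these lines might look roughly like this in fmf metadata (only ./build_product comes from the text above; the step name and arguments are illustrative):

```yaml
# illustrative fmf plan fragment, not the actual Contest plan
prepare:
  - name: build content
    how: shell
    script: ./build_product rhel
```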
split off xml-parsing code from class Datastream
I originally wanted to redesign the whole XML parsing in class Datastream to have a more modular "what you want" data retrieval:
    class Datastream(enum.Flag):
        """
        Represent an OpenSCAP datastream XML definition and enable extracting
        selected values from it.

        This is done by the class definition having enum.Flag style variables,
        which get overwritten by a class instance to either None (if the given
        value was not chosen to be extracted) or the contents of the extracted
        data.

        Ie.

            ds = Datastream()
            ds.parse_xml('file.xml', Datastream.profiles | Datastream.rules)

            profiles = ds.profiles          # is a dict() of profile metadata
            rules = ds.rules                # is a dict() of rule metadata
            remediations = ds.remediations  # is None (was not requested)

        This is to preserve reasonable speed and low memory overhead for use
        cases that don't need all the details.

        Data structure example:

            ds.path = Path(file_the_datastream_was_parsed_from)

            ds.profiles = {
                'ospp': namespace(
                    .title = 'Some text',
                    .rules = set('audit_delete_success', 'service_firewalld_enabled', ...),
                    .values = set(('var_rekey_limit_size','1G'), ...),
                ),
                ... (all profiles in the datastream) ...
            }

            ds.rules = {
                'configure_crypto_policy': namespace(
                    .fixes = FixType.bash | FixType.ansible,
                ),
                'account_password_selinux_faillock_dir': namespace(
                    .fixes = FixType.bash,
                ),
                ... (all rules in the datastream) ...
            }

            ds.remediations = {
                'configure_crypto_policy': {
                    FixType.bash: '... bash code ...',
                    FixType.ansible: '... ansible playbook ...',
                },
                'account_password_selinux_faillock_dir': {
                    FixType.bash: '... bash code ...',
                },
                ... (all rules with at least one type of fix available) ...
            }

            (FixType is Datastream.FixType)
        """
However, mid-way through implementing it, I realized that when parsing out remediation code, we probably don't want to extract all 10s of MBs of code, just a few select rules.
So I went with what's in the commit - splitting off the batched XML parser, allowing the audit test to parse the DS XML on its own, extracting remediation code only for rules in the policy_rules Group.
I'm not against revisiting the class Datastream code in the future, but (despite the big +++ and ---), nothing has really changed inside, aside from one minor bug fix (elem.get to elements[-1].get).
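To make the abandoned flag-driven design concrete, here is a small self-contained sketch of the "parse only what you ask for" idea, using a toy XML schema; the names Want and parse_selected are mine, not Contest code:

```python
import enum
import io
import xml.etree.ElementTree as ET
from types import SimpleNamespace

class Want(enum.Flag):
    """Selector flags: unrequested sections end up as None."""
    PROFILES = enum.auto()
    RULES = enum.auto()
    REMEDIATIONS = enum.auto()

def parse_selected(xml_text, want):
    # pre-create dicts only for the requested sections
    profiles = {} if Want.PROFILES in want else None
    rules = {} if Want.RULES in want else None
    remediations = {} if Want.REMEDIATIONS in want else None
    # iterparse streams element-by-element, so skipping sections keeps
    # memory low even for very large documents
    for _event, elem in ET.iterparse(io.BytesIO(xml_text.encode())):
        if profiles is not None and elem.tag == 'profile':
            profiles[elem.get('id')] = SimpleNamespace(title=elem.get('title'))
        elif rules is not None and elem.tag == 'rule':
            rules[elem.get('id')] = SimpleNamespace(fixes=elem.get('fixes'))
        elif remediations is not None and elem.tag == 'fix':
            remediations.setdefault(elem.get('rule'), {})[elem.get('system')] = elem.text
    return SimpleNamespace(profiles=profiles, rules=rules, remediations=remediations)
```

Requesting only profiles and rules leaves .remediations as None, which is the speed/memory property the docstring above describes.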