Allow Remarks alongside profile entries and other Profile-related changes

Description of problem:

Thanks to @mab879's work on https://github.com/OpenSCAP/openscap/pull/1744, I was thinking it might be nice to expose this functionality in CaC. As a parallel to https://github.com/OpenSCAP/openscap/issues/1743 consider this:

When building profiles, we might want to give a reason (to the end-consumer of the CaC/content artifacts) why certain rules are either selected or not selected by default in this profile and how this selection impacts them. <xccdf:remark /> provides this capability.

I could see one of two design paths for this:

Modify the format of the profile YAML file. Either turn selection into an array of (optionally) a structure like:
```
 selection:
  - my_rule
  - rule: my_other_rule
    remark: selected because we need it 
```
The downside is this approach makes reading and parsing the profile harder. I have some tooling which assumes a certain structure of the YAML (grep/sed/awk) that would break. Small matter I guess :-) Maybe the SSG test suite would also break? I do not know!
Alternatively, stick this data somewhere else but close by. Either within the same .profile in a different section (remarks maybe, mapping rule_id into remark text? Or in a adjacent file (cis_level1_server.remarks). Either way could probably work.

I think the former would be the cleanest design, but maybe this breaks other assumptions in other places?

So I'd personally lean towards either of the designs in path 2 above.

Thoughts?

uh! this sounds awesome! We've been missing this from profiles and tailored profiles too.

Either approach looks fine to me, but slightly leaning towards approach 1.

The first approach keeps the remarks close to the rule selection, keeping spacial locality of data tighter. I think this helps maintenance of consistency. For example, when removing a rule it is easier to note the remark and remove it too.

I think the former would be the cleanest design, but maybe this breaks other assumptions in other places?

I don't think the change in the selection would affect the Test Suite. It will impact the compiled/resolved profiles, which are used by the profile stability tests. But it is probably just a matter of ignoring the remarks during the test.

The second approach makes it easy to share and add remarks to already existing profiles, but keeps it a separate "thing". In my opinion, this makes the remarks easier to forget and prone to being outdated.

Either approach looks fine to me, but slightly leaning towards approach 1.

Would you prefer the "new" format strictly enforced? Something like:

selection:
 - rule: my_rule
 - rule: some_other_rule
   state: disabled
 - rule: my_other_rule
   remark: selected because we need it 
 - var: var_other_var
   value: 3

Or would we still allow the "shorthand" and only break out the explicit keys on rules with remarks?

Edit: Additionally, remarks can take a lang attribute and there could be multiple remarks. Should we do:

 - shorthand_rule
 - rule: my_rule
   remark: My remark
 - rule: my_other_rule
   selected: false
   remarks:
     - One remark with American colors. 
     - lang: en-GB
       text: This remark has British colour!

That gets awfully verbose awfully quickly but it does show structure... Thoughts?

So having now seen the controls format, I'd like to propose augmenting it with an improved profile structure.

In particular:

A profile's selections key is a list.

Entries in the list are either strings or structs. Strings follow the existing shorthand (= for a variable, ! for a negated selection, or a plain identifier).

There are a couple of types of structs:

A group struct. It looks like:
```
group: group_id # required
selected: <bool> # optional, default true
```
Note that it can't have a remark child as IIRC selection works on a rule_id only and not an entire group. I might be wrong, if so, can add remark as a key here. (Edit: I was under the impression the existing shorthand profile already allowed selecting a group by id; this was all nested rule.yml in a folder with a group.yml. Same existing definition. Maybe it doesn't, but I've been wanting to use it a few times in some benchmarks so I think it'd be useful even if the current build system doesn't support it yet.)

A rule struct. It looks like:

rule: rule_id # required
selected: <bool> # optional, default true
remark: <text> # optional, default empty/None

A variable struct. It looks like:

var: variable_id # required
value: <value> # optional, default is default.
remark: <text> # optional

A section struct.
```
section: <text> # required; title of the section of the benchmark.
references:        # optional
    <ref>: <value> # required if referencers present. e.g,. cis@2004: 1.1.1.1
selections: <list> # required
```
Here a selections list is the same as the top-level (in the profile). references can then be interpreted by utilities as semantic meaning and all sub-rules within that section can then have that reference be automatically added.
A control struct.
```
control: identifier # required
selection: all # ??? -- not documented -- as existing, optional?
refinement: high # ??? - not documented -- as existing, optional? 
reference: yamlpath? # ??? - a way to select only stuff in a nested value somehow?
```
This mimics the existing control shorthand. selection/refinement would mirror the existing content. I'd prefer not to add it though, see below about checking into the source tree. If desired, auto-generated profiles could be imported via the profile statement instead.

~~A glob struct.~~ -- Probably too much

glob: <system> # required, system to glob against. e.g., CIS, CCE, &c.
value: <pattern> # required, value to match against
type: {exact,substr,regex} # whether to match using exact, substring, or regex matching.
selected: <bool> # optional, default true
~~This allows bulk-selection of relevant rules. It should only match when prodtype allows it. section's reference updating would not apply.~~

A profile struct.
```
profile: <other-profile>
```
This allows to create smaller profiles (e.g., CIS section 1) and combine them into a single section. Not sure how to handle conflicting variable values. Probably would need to be added explicitly in this profile.

This:

Retains backwards compatibility.
Retains product-specificness.
Adds semantic intent into the profile.
Retains ability to use control files when appropriate.
Allows relatively automated profile creation off of existing benchmarks.
When control files aren't appropriate, provides enough structure to create STIG & CIS per-product.

(If we remove control as a profile-level field): Additionally, I think we could consider checking-in profiles generated using control files. We could update the generator to retain semantic intent from the control file (placing it into the section struct) and users could use meld or similar utility to merge changes in from an auto-generated profile into a well-maintained product profile.

Thoughts?

@cipherboy

Here are some random thoughts.

I like that the proposal is backwards compatible.

I also like that it retains ability to use controls files. However, I feel that it can overlap with control files purpose.

I think the section structure is the same concept as the "control" in the controls files. Both represent a section in a benchmark. You can have the section ID or requirement ID there and map it to SCAP rules and add some metadata to it.

I like the remark idea. What is the intended use? I think it can contain information like: "We have chosen this rule because the policy requirement 5.7.6 says that we need integrity checking and aide is the best tool that we have for that in RHEL". Is that correct? I think we don't have this at the moment. But I also think that the control files can be extended with that.

This allows bulk-selection of relevant rules. It should only match when prodtype allows it. section's reference updating would not apply.

I think that having multiple ways of bulk selection (controls, globs, extensions) would make it more confusing.

Allows relatively automated profile creation off of existing benchmarks.

I think we can achieve relatively automated profile creation off of existing benchmarks by using control files as well. Easier creation of profiles from existing benchmarks was the reason why we invented the control files.

When control files aren't appropriate, provides enough structure to create STIG & CIS per-product.

The control files can work for per-product benchmarks (STIG and CIS) as well. You can have a file controls/cis_rhel8.yml and let the build system to generated the profile yaml file at the build time. I admit it would feel weird. But you would still maintain a single file which would be the controls file and the profile file would contain just a single line.

(If we remove control as a profile-level field): Additionally, I think we could consider checking-in profiles generated using control files. We could update the generator to retain semantic intent from the control file (placing it into the section struct) and users could use meld or similar utility to merge changes in from an auto-generated profile into a well-maintained product profile.

Would this "checking-in" happen at the build time? Would it happen automatically? Or would it be run manually and then results will be commited as the "profile" yaml file?

One of the benefits of the "control" files is that profiles are generated automatically. We have a single YAML file https://github.com/ComplianceAsCode/content/blob/master/controls/anssi.yml and from that we automatically generate 20 profiles at the build time. There are 4 anssi profiles in OL7, OL8, RHEL7, RHEL8, RHCOS4 all generated from a single source. It makes it easier to maintain ANSSI. If something changes in the requirements we don't have to edit and sync multiple files. And it makes it easier to add ANSSI for other products in the future.

Overall it's a great proposal. I'm thinking about how to solve the overlap with controls files that I see here.

@matejak @yuumasato Other thoughts?

Thanks for the thoughts, @jan-cerny!

And I would just like to mention, reading more about control files, I do like the idea (thanks @matejak!).

I think I mostly don't like the idea of auto-generating them on build instead of preserving them as a system artifact.

@jan-cerny said:

I think the section structure is the same concept as the "control" in the controls files. Both represent a section in a benchmark. You can have the section ID or requirement ID there and map it to SCAP rules and add some metadata to it.

That was the intent ;-)

Two reasons:

Profiles are already the product-specific control file. :-) They are what a control file outputs after all. They just lack some of the nice features that control files added.
If we switch to checking-in the results of building a control file, we can retain semantic section information using this mechanism.

Additionally, we can create one format (profiles -- whether built from a control file or manually), parse it, and add tooling to automatically update the relevant rules. This adds the semantic meaning we need for that.

@jan-cerny said:

I like the remark idea. What is the intended use? I think it can contain information like: "We have chosen this rule because the policy requirement 5.7.6 says that we need integrity checking and aide is the best tool that we have for that in RHEL". Is that correct? I think we don't have this at the moment. But I also think that the control files can be extended with that.

I've gotten a lot of questions about why x rule was selected or in what scenarios it applies in from our existing internal content.

Things like "does this work on $cloud"? Can we disable it but give the customer a reason why? &c.

If the tooling (end-to-end) supports remarks, I think it'd be a great feature.

@jan-cerny said:

@cipherboy said:

This allows bulk-selection of relevant rules. It should only match when prodtype allows it. section's reference updating would not apply. I think that having multiple ways of bulk selection (controls, globs, extensions) would make it more confusing.

I probably wouldn't implement glob on my first pass :) Just thought it might be an interesting feature.

@jan-cerny said:

@cipherboy said:

Allows relatively automated profile creation off of existing benchmarks. I think we can achieve relatively automated profile creation off of existing benchmarks by using control files as well. Easier creation of profiles from existing benchmarks was the reason why we invented the control files.

Ah sorry, I didn't expand well on this. There's two parts of automation:

Creation of the list of rules in a profile. Control helps to automate this for the product-independent case.
Automatically updating rules in a profile over time. New prodtypes, maintaining references &c. Control as a source format makes sense, but we either need to duplicate all our tooling (to take either a profile OR a control) -- or we treat profile as a "source" format for tooling and make control compile (smartly) to it.

I agree, for product-independent benchmarks, control does help a lot. However, there's still a lot of product-specificness in our core benchmarks (STIG and CIS).

@jan-cerny said:

@cipherboy said:

When control files aren't appropriate, provides enough structure to create STIG & CIS per-product.

The control files can work for per-product benchmarks (STIG and CIS) as well. You can have a file controls/cis_rhel8.yml and let the build system to generated the profile yaml file at the build time. I admit it would feel weird. But you would still maintain a single file which would be the controls file and the profile file would contain just a single line.

...but... why? To me, it almost reads like we'd be better off extending the profile format (to take in the nice features that control has added) and then allow product-independent profiles (putting them in linux_os/profiles maybe?) and updating the build system appropriately.

@jan-cerny said:

@cipherboy said:

(If we remove control as a profile-level field): Additionally, I think we could consider checking-in profiles generated using control files. We could update the generator to retain semantic intent from the control file (placing it into the section struct) and users could use meld or similar utility to merge changes in from an auto-generated profile into a well-maintained product profile.

Would this "checking-in" happen at the build time? Would it happen automatically? Or would it be run manually and then results will be commited as the "profile" yaml file?

Control has a great use in the non-specialized benchmarks. There's still a concern, with build-time generation, that incorrect guidance is added to other products. Think, Red Hat updates ANSSI benchmark and (while prodtype does allow it and is useful in some scenarios) SLES or Ubuntu's profile changes as a result --- even though we don't necessarily want that piece of content in that benchmark. We'd have to be watching the anssi control changes and going back and manually disabling rules not relevant in a curated compliance scenario.

Think also about RHEL: if you add a new rule to the ANSSI benchmark but have a bug in the existing content, rebasing to a new version could affect all products with that benchmark. If the control was instead a manual step (to compile it into a checked-in profile) -- you could have more confidence that the new release/... matches the existing profile +/- any intentional additions.

My 2c. but I'd like to see a way to manually curate the profile created from a control group.

Another example would be firewalld -- it is already possible to use on a Ubuntu host but it isn't the officially supported approach. If we or a community member adds prodtype: ubuntu2004 and we rebuild anssi, suddenly it could get picked up even though we wouldn't necessarily want it. Same goes for things like iptables (which already bears a RHEL identifier) but is the preferred solution on Ubuntu. If we add it to the anssi control, it would break your profiles. ;-)

The auto-generated nature of control is great. But not having review on the output and a way of comparing the changes for each product to make sure they're intentional isn't.

I'd propose:

Manual execution of control->profile updating.
For products which are OK with mostly-automated stuff, adding an override for build-time controls OR:
OR adding a test case to check for out of date control->profile templating.

So to concretely answer your question, yes to the last one. :)

@jan-cerny said:

It makes it easier to maintain ANSSI. If something changes in the requirements we don't have to edit and sync multiple files. And it makes it easier to add ANSSI for other products in the future.

I whole heartedly agree that we wouldn't need manual syncing to these files, but we should be cognizant of the changes and the other rules in our content system. Perhaps all of those products agree to fully automated updating. They're all very similar ;-) But if we add this for Ubuntu, we run the risk of picking up changes that are OK in some scenarios but not great all together.

And at review time of the ANSSI change, (e.g. Oracle) reviewers can concretely see what this ANSSI change meant for the specified product.

With a mapping of product, profile, and control (or another field in the profile (automated: true), I dunno) --- a single script execution could update all profiles.

My 2c but I think you gain a lot by not automating control->profile creation at build time but instead allow for manual review.

With profile import feature above, you could have the automated profile in a separate file from any manual overrides (like it is today) and automated building wouldn't impact the manually curated file.

Hope this helps :-)

Thank you very much for the detailed explanation. It helped me a lot to understand.

Hello, These are lot of changes, I am somehow consuming them gradually. I have a remark to introduction of "group" struct. Currently we are handling groups based on folder structure. Each folder is a group and rules are assigned to groups by being put into a folder. Moreover, we are using this mechanism to inherit applicability platforms (CPEs). So I would be careful with introduction of groups, cause it would mean that groups them selves would need to be defined somewhere as well, together with descriptions and applicability. The applicability is an area which should see some updates soon. I must say that I like the idea of remarks along rules, I see it as a benefit for content authors.

Hey @vojtapolasek!

Just wanted to say a group above meant a group in the existing sense. :-) Just another way of grabbing all (applicable, nested) rules in that group. No changing definition. Sorry if that wasn't clear, I'll update the text above.

(I was under the impression the existing shorthand already did that, but maybe not!)

ComplianceAsCode / content

Allow Remarks alongside profile entries and other Profile-related changes #6889

Description of problem: