Festo-se / cyclonedx-editor-validator

Tool for creating, modifying and validating CycloneDX SBOMs.
https://festo-se.github.io/cyclonedx-editor-validator/
GNU General Public License v3.0
18 stars 4 forks source link

feat: Rework amend command, add optional operations #160

Closed mmarseu closed 3 months ago

mmarseu commented 4 months ago

Overview

This PR originally only intended to add a new operation to the amend command. This addition triggered others and snowballed into a rather large overhaul of the command. I'll try to go into all changes in adequate detail.

  1. New operation DeleteAmbigiousLicenses which deletes licenses identified only by name, with no other context that provides more insight about the license's content. This is not very contentious and requires little further explanation. See #144 for more.
  2. New command-line option to select operations to run.
  3. Breaking change: InferCopyright no longer runs by default.
  4. Breaking change: Broken up ProcessLicense
  5. Breaking change: InferSupplier no longer adds supplier information to components which already have comparable fields.
  6. Breaking change: Compositions now sets aggregate unknown instead of incomplete
  7. Improved documentation of design constraints in cdxev/amend/operations.py.
  8. Remove semi-integration tests from tests/test_amend.py

2. New command-line option to select operations to run

Because the added DeleteAmbigiousLicenses command can be dangerous, it shouldn't run by default. This necessitates a new mechanism for the user to select the operations to run.

This change introduces the --operations command-line option. It takes a list of operation names to run. If the option is not provided, a default set of operations is run.

3. Breaking change: InferCopyright no longer runs by default

The set of default operations includes all pre-existing operations with one exception: InferCopyright is extremely dangerous and its outright removal from the tool is currently in debate. Pending any decision, this PR makes it an opt-in operation.

4. Breaking change: Broken up ProcessLicense

4.1 Removed feature to delete licenses with name "unknown"

The ProcessLicense operation previously included a feature that deleted any licenses whose name includes the word "unknown" and that doesn't also specify the full text. This has been been removed for the following reasons:

If the feature is nevertheless desired, it can always be re-added as a separate operation. In that case, it shouldn't be added to the default set.

4.2 New operation LicenseNameToId

This operation runs by default and takes over the part of ProcessLicense that translates well-known names to SPDX ids.

4.3 New operation AddLicenseText

This operation takes over the part of ProcessLicense that adds full text to licenses with only a name.

It doesn't run by default because it needs a parameter to be specified. In the interest of usability, we shouldn't force users to provide a command-line option if they don't want to use the feature.

There are a few important changes:

5. Breaking change: InferSupplier is now more selective

In my effort to make the operation documentation fit for printing in the CLI, I discovered that InferSupplier was unnecessarily zealous. Given the inherent uncertainty in the information it generates (we can't really be sure that the generated supplier is valid), it probably should only operate on the components where it is really required. That's why it now no longer adds supplier information to components that already contain comparable fields (e.g., author, manufacturer, etc.).

6. Breaking change: Compositions now sets aggregate unknown instead of incomplete

Discussion: #161 Besides the aggregate, this PR also changes the behavior regarding the metadata component. Previously, this component was treated the same as any other. With this change, the metadata component keeps its original aggregate, it is not set to "unknown", unless it was "unknown" in the input.

7. Improved documentation of design constraints

The fact that individual operations are now exposed on the CLI for manual selection by the user leads to a few new design considerations for operations which have been documented in the module docstring in cdxev/amend/operations.py. The upshot is: keep it simple, keep it small, and be damn sure you don't run dangerous operations by default.

8. Remove semi-integration tests from tests/test_amend.py

Some tests have been removed in preparation of #157, where they will be grouped together in one module.

github-actions[bot] commented 4 months ago

Coverage

Coverage Report โ€ข
FileStmtsMissCoverMissing
__main__.py3101595%205–206, 223, 623–624, 628–633, 635, 638, 650–651
amend
   command.py340100% 
   license.py130100% 
   operations.py210597%386, 391, 429, 455, 523
TOTAL15836595% 

Tests Skipped Failures Errors Time
283 2 :zzz: 0 :x: 0 :fire: 4.252s :stopwatch:
mmarseu commented 4 months ago

@mmarseu I did some tests and seems like most of the commands are working as expected, so after doing the changes, we should be able to merge this one (as the comments are mostly regarding grammar/comments)

However, some more remarks:

  • Maybe you should rename this MR as it does more than described and is more a rework than just feat.

Fair enough.

  • I'm fine with "3. Breaking change: InferCopyright no longer runs by default", now it is up to anybody to decide whether they want to do this or not. Maybe a print to stdout with an INFO would be useful, like it is already done in class LicenseNameToId and AddLicenseText, would even add some consistency and then the changes are traceable ๐Ÿ˜‰.

Done

  • In "4. Breaking change: Broken up ProcessLicense" you mention that "It no longer adds an empty string as text to licenses where no text has been found. That intentionally makes the SBOM worse than it was before", however if I only have one ambiguous licenses in the licenses-array, then --operation delete-ambiguous-licenses leads to an empty array, i.e. licenses: []. Should not we, in this case, remove the key licenses completely, as an empty array does not add any (information) value?

We can. I'll add it.

Why the "one" is bold: It seems that --operation delete-ambiguous-licenses only deletes the first license with license.name, e.g.

"licenses": [
                {"license": {"id": "Apache-2.0"}},
                {"license": {"name": "Some license"}},
                {"licence": {"name": "blub"}}
            ],

leads to

"licenses": [
                {"license": {"id": "Apache-2.0"}},
                {"licence": {"name": "blub"}}
            ],

This is something you do not have included in your tests, there you only have the case one id and one name.

I've added a test for this. Weirdly, it works perfectly fine for me ๐Ÿ˜•

  • Do we need a test for the case that license.name is similar to an SPDX-ID and the delete-ambiguous-licenses operation is used? Just to make sure that it maps it to a license.id and does not delete it.

I don't think so. The user is free to make sure that they run license-name-to-id before delete-ambiguous-licenses. If they don't, they might run into this problem. In practice, this shouldn't be very frequent, because delete-ambiguous-licenses doesn't run by default. Most people outside Festo wouldn't ever use it. Internally, we need to make sure our pipeline templates invoke operations in the correct order.

  • The reason why we wanted supplier, which you now changed with "5. Breaking change: InferSupplier is now more selective", was the current tool landscape: Most of the tools (e.g. DependencyTrack and BlackDuck) do not understand e.g. publisher, mostly they either only supported supplier or, additionally, author.

I understand. I'll revert this particular change.

mmarseu commented 3 months ago

@mmarseu please resolve the merge conflicts and comment how to proceed with the year in NOTICE. But otherwise I do not have any further comments ๐Ÿ˜ƒ

I've added the year and opened #179 to discuss options for automation.