Closed mmarseu closed 3 months ago
Coverage Report โข
File Stmts Miss Cover Missing __main__.py 310 15 95% 205–206, 223, 623–624, 628–633, 635, 638, 650–651 amend command.py 34 0 100% license.py 13 0 100% operations.py 210 5 97% 386, 391, 429, 455, 523 TOTAL 1583 65 95%
Tests | Skipped | Failures | Errors | Time |
---|---|---|---|---|
283 | 2 :zzz: | 0 :x: | 0 :fire: | 4.252s :stopwatch: |
@mmarseu I did some tests and seems like most of the commands are working as expected, so after doing the changes, we should be able to merge this one (as the comments are mostly regarding grammar/comments)
However, some more remarks:
- Maybe you should rename this MR as it does more than described and is more a rework than just feat.
Fair enough.
- I'm fine with "3. Breaking change: InferCopyright no longer runs by default", now it is up to anybody to decide whether they want to do this or not. Maybe a print to stdout with an
INFO
would be useful, like it is already done inclass LicenseNameToId
andAddLicenseText
, would even add some consistency and then the changes are traceable ๐.
Done
- In "4. Breaking change: Broken up ProcessLicense" you mention that "It no longer adds an empty string as text to licenses where no text has been found. That intentionally makes the SBOM worse than it was before", however if I only have one ambiguous licenses in the
licenses
-array, then--operation delete-ambiguous-licenses
leads to an empty array, i.e.licenses: []
. Should not we, in this case, remove the keylicenses
completely, as an empty array does not add any (information) value?
We can. I'll add it.
Why the "one" is bold: It seems that
--operation delete-ambiguous-licenses
only deletes the first license withlicense.name
, e.g."licenses": [ {"license": {"id": "Apache-2.0"}}, {"license": {"name": "Some license"}}, {"licence": {"name": "blub"}} ],
leads to
"licenses": [ {"license": {"id": "Apache-2.0"}}, {"licence": {"name": "blub"}} ],
This is something you do not have included in your tests, there you only have the case one id and one name.
I've added a test for this. Weirdly, it works perfectly fine for me ๐
- Do we need a test for the case that
license.name
is similar to anSPDX-ID
and thedelete-ambiguous-licenses
operation is used? Just to make sure that it maps it to alicense.id
and does not delete it.
I don't think so. The user is free to make sure that they run license-name-to-id
before delete-ambiguous-licenses
. If they don't, they might run into this problem. In practice, this shouldn't be very frequent, because delete-ambiguous-licenses
doesn't run by default. Most people outside Festo wouldn't ever use it. Internally, we need to make sure our pipeline templates invoke operations in the correct order.
- The reason why we wanted
supplier
, which you now changed with "5. Breaking change: InferSupplier is now more selective", was the current tool landscape: Most of the tools (e.g. DependencyTrack and BlackDuck) do not understand e.g.publisher
, mostly they either only supportedsupplier
or, additionally,author
.
I understand. I'll revert this particular change.
@mmarseu please resolve the merge conflicts and comment how to proceed with the year in
NOTICE
. But otherwise I do not have any further comments ๐
I've added the year and opened #179 to discuss options for automation.
Overview
This PR originally only intended to add a new operation to the amend command. This addition triggered others and snowballed into a rather large overhaul of the command. I'll try to go into all changes in adequate detail.
DeleteAmbigiousLicenses
which deletes licenses identified only by name, with no other context that provides more insight about the license's content. This is not very contentious and requires little further explanation. See #144 for more.InferCopyright
no longer runs by default.ProcessLicense
InferSupplier
no longer adds supplier information to components which already have comparable fields.Compositions
now sets aggregate unknown instead of incompletecdxev/amend/operations.py
.tests/test_amend.py
2. New command-line option to select operations to run
Because the added
DeleteAmbigiousLicenses
command can be dangerous, it shouldn't run by default. This necessitates a new mechanism for the user to select the operations to run.This change introduces the
--operations
command-line option. It takes a list of operation names to run. If the option is not provided, a default set of operations is run.3. Breaking change:
InferCopyright
no longer runs by defaultThe set of default operations includes all pre-existing operations with one exception:
InferCopyright
is extremely dangerous and its outright removal from the tool is currently in debate. Pending any decision, this PR makes it an opt-in operation.4. Breaking change: Broken up
ProcessLicense
4.1 Removed feature to delete licenses with name "unknown"
The
ProcessLicense
operation previously included a feature that deleted any licenses whosename
includes the word "unknown" and that doesn't also specify the full text. This has been been removed for the following reasons:If the feature is nevertheless desired, it can always be re-added as a separate operation. In that case, it shouldn't be added to the default set.
4.2 New operation
LicenseNameToId
This operation runs by default and takes over the part of
ProcessLicense
that translates well-known names to SPDX ids.4.3 New operation
AddLicenseText
This operation takes over the part of
ProcessLicense
that adds full text to licenses with only a name.It doesn't run by default because it needs a parameter to be specified. In the interest of usability, we shouldn't force users to provide a command-line option if they don't want to use the feature.
There are a few important changes:
If this feature is nevertheless desired, we can add it as a separate, non-default operation.
5. Breaking change:
InferSupplier
is now more selectiveIn my effort to make the operation documentation fit for printing in the CLI, I discovered that
InferSupplier
was unnecessarily zealous. Given the inherent uncertainty in the information it generates (we can't really be sure that the generated supplier is valid), it probably should only operate on the components where it is really required. That's why it now no longer adds supplier information to components that already contain comparable fields (e.g.,author
,manufacturer
, etc.).6. Breaking change:
Compositions
now sets aggregate unknown instead of incompleteDiscussion: #161 Besides the aggregate, this PR also changes the behavior regarding the metadata component. Previously, this component was treated the same as any other. With this change, the metadata component keeps its original aggregate, it is not set to "unknown", unless it was "unknown" in the input.
7. Improved documentation of design constraints
The fact that individual operations are now exposed on the CLI for manual selection by the user leads to a few new design considerations for operations which have been documented in the module docstring in
cdxev/amend/operations.py
. The upshot is: keep it simple, keep it small, and be damn sure you don't run dangerous operations by default.8. Remove semi-integration tests from
tests/test_amend.py
Some tests have been removed in preparation of #157, where they will be grouped together in one module.