CycloneDX / cyclonedx-node-module

creates CycloneDX Software-Bill-of-Materials (SBOM) from node-based projects
https://cyclonedx.org/
Apache License 2.0
124 stars 37 forks source link

Issue with using this library when it finds a license that isn't in the list #42

Closed DarthHater closed 2 years ago

DarthHater commented 4 years ago

We are using this library in auditjs to generate an SBOM to send to Nexus IQ Server.

We've run into an issue with a few libraries where the license is presented as something that isn't in your current list.

An example is:

https://github.com/substack/node-optimist/blob/master/package.json#L35

That license is declared as MIT/X11 when in reality it should either be MIT X (not in your list) or X11

When the sbom is created, a license section with a Name is created, just no ID, and this fails validation in Nexus IQ Server.

I'm curious if we should add these kinda odd license types to your list, or if a PR of some sort where if it can't find an ID it adds something that indicates the license is Non-Conforming or something akin?

Thanks!

DarthHater commented 4 years ago

I suppose another alternative is to provide a flag that just ignores licensing data (which is hyper specific to our case), but I don't personally like that.

stevespringett commented 4 years ago

The node-optimist library contains an invalid license in package.json. It is a requirement that the license defined in package.json be a valid SPDX license ID or a valid SPDX expression. This library is so old, that this requirement likely wasn't enforced when this component was last published.

https://spdx.org/licenses/MIT.html https://spdx.org/licenses/X11.html

It appears this library is dual licensed which should be declared in the package.json, but it's not.

Unresolved licenses like this will appear as a license 'name' in CycloneDX rather than a license ID. License ID's only resolve to valid SPDX license IDs and are not required by the spec. If Nexus IQ requires a license id, then it is violating the spec. This is an issue you'll need corrected by Sonatype.

DarthHater commented 4 years ago

It's actually not dual licensed when you read about MIT/X11, it's just improperly declared. X11 is MIT with an extra sentence.

stevespringett commented 4 years ago

Thanks for clarifying. MIT/X11 is still an invalid SPDX license ID, therefore the license name is used instead of the id.

stevespringett commented 4 years ago

See also: https://github.com/CycloneDX/specification/blob/master/schema/bom-1.1.xsd#L105 https://github.com/CycloneDX/specification/blob/master/schema/bom-1.1.xsd#L220

DarthHater commented 4 years ago

Took a look at the spec and you are right (didn't expect you to be wrong, by the way).

I'm trying to think of a middle ground, personally.

            if (spdxLicenses.some(v => { return l === v; })) {
                licenseContent.id = l;
            } else {
                licenseContent.id = 'UNKNOWN';
                licenseContent.name = l;
            }

Something like that seems to accomplish it, and indicate to someone they are using an Unknown license, which is I think accurate in this case.

DarthHater commented 4 years ago

Optimist is FAR from the only package where I saw this coming up, and being a lot of these projects are dead or predate some sort of standard, trying to finagle a middle ground.

DarthHater commented 4 years ago

The side benefit for someone using this to generate a SBOM is that they'd then know that whatever they are using has an old license and should probably be looked at with a bit more scrutiny?

stevespringett commented 4 years ago

Whatever value is defined in the license id field will be validated against https://cyclonedx.org/schema/spdx.xsd which is derived from https://github.com/spdx/license-list-data.

'UNKNOWN' is not a valid SPDX license ID. If you create a BOM with this value, it will fail to validate.

License strings which do not resolve to valid SPDX license id's, will always use the license name field. That's what the spec defines and that's what all official CycloneDX implementations produce.

DarthHater commented 4 years ago

So given license data is optional via the cyclonedx spec, can we provide a flag to just say "don't provide this data"? That seems like a good middle ground, and on spec.

stevespringett commented 4 years ago

Roughly half of all ecosystems do not enforce SPDX license ID's. Java (Maven) and .NET (Nuget) are prime examples of this. This is a very common scenario and certainly not unique to this library. The overwhelming majority of libraries in Maven Central and NuGet are not using SPDX license IDs.

SBOMs are a statement of fact. Removing facts goes against the primary objectives of the spec. The only exception that's been made is license text, which can greatly influence the size of the resulting BOM.

You're free to fork the project and create your own version which excludes facts. However, I still recommend solving the problem in Nexus IQ itself.

DarthHater commented 4 years ago

I work for Sonatype, if that wasn't clear, and I'm here because Nexus IQ requires an ID and that's a hard requirement. Forking a library is a pretty poor excuse for not working with the author to make it more flexible, and I don't see a reason why it shouldn't be, given the spec itself marks this entire section as not required. If someone wants to throw it away, they'd be making an intentional choice to do so.

stevespringett commented 4 years ago

Asking an independent open source author to make changes to every CycloneDX implementation to exclude details so that a commercial vendor can more easily do their job, is not a proper ask.

Dependency-Track attempts to resolve license names and does a decent job of it for certain licenses. If the license name cannot be resolved, then the license is discarded (as if it was never defined in the bom).

DarthHater commented 4 years ago

I am not asking you to make changes, I've already got multiple permutations of this fired up on my side. I'm asking you to accept changes I'm willing to make, because they seem to make sense, looking at the spec itself, and the quality of the data I can expect to get from the npm ecosystem. I've presented an array of potential solutions to the problem I'm encountering, and I feel like I'm thinking as much as I can about the spec, and reality that I'm running in to. I'm unwilling to fork this repo longterm because:

I get what you mean by spirit of the SBOM, and I'm sympathetic, but if something in the spec is marked as optional, giving someone the option to discard it seems fine, as it is done by intent, and doesn't violate the spec.

Also I have no clue what you mean by dropping the license text in your earlier comment, as I'm unaware of that option, and it doesn't appear to be documented. Please share that because it does indeed cause huge bloat (we are seeing 2-3mb SBOMs, and we don't need that data at all).

I'll drop this convo after this, but I feel like adding an option to discard the license data, by intention doesn't violate any part of the spec, and just adds additional flexibility to someone like me, the consumer.

ButterB0wl commented 4 years ago

So adding a flag to strip out non-required fields is out of the question? We can write the PR if you would accept it we just don't want to strip it after the fact in our module, but we will as this is currently breaking for us.

stevespringett commented 4 years ago

Excluding license text is a common ask and it's one that doesn't really take away from the facts, only the supporting evidence. It has been implemented in a few official implementations, but I don't think it's made its way to the node version yet. But it will.

Also keep in mind that there are numerous unofficial implementations as well, which include:

For example, a container security vendor that produces a BOM or ORT which produces a BOM are both capable of generating BOMs with npm modules. This is only a one implementation, but there isn't a guarantee that users will use this version. ORT for example, is extremely popular. I personally use it in some cases. That project comes specifically from a licensing background.

So even if this (or all official) implementation did have an option to exclude license, it doesn't solve the underlying issue. CycloneDX is a specification which just happens to have some implementations. There are many other ways to generate them, and many more currently in development, both publicly and internally at various organizations.

DarthHater commented 4 years ago

I'm aware of quite a few unofficial implementations, as I've crafted a few already (and been meaning to get in touch with you over if you want me to make them more generic and inherit them). I've got Golang and Conda specific ones at current time, but a bit bastardized just because I had to make things work in non traditional ways. The conda one is here if you wanted to take a gander:

https://github.com/sonatype-nexus-community/jake/tree/master/jake/cyclonedx/v1_1

Yeah best I can tell excluding license text hasn't made it here (I could add that in a heartbeat for you, if you'd like).

I'd really rather work with you to provide flexibility to this library rather than roll my own, and a few other reasons why:

DarthHater commented 4 years ago

Popping back in to summarize:

As well, I packaged it all up on my own fork of this project:

https://github.com/DarthHater/cyclonedx-node-module/tree/TypeScript

If you want I can send you a PR, and if you want I can strip out the contentious license data removal piece (although I obviously still think it's beneficial). The PR itself I think would be beneficial as it:

While I was at it I wired up an index.ts that will create a cli app like the previous one in the bin, I couldn't figure out how that was created, so I've generated a cli app with similar flags, etc... or shown examples where I thought a flag wasn't needed.

Cheers, and thanks for providing this library, in general!

stevespringett commented 4 years ago

Thanks for the summary and the work you and the rest of the team has done with implementing it in your own ecosystems.

The project is always looking for contributions, and it sounds like you've made numerous improvements, both architecturally and in overall quality.

And although I cannot accept a PR which excludes license ID or name (but can accept exclusion of license text), I am certainly open to reviewing a PR. It would also likely be much easier to keep in sync if necessary.

There has been requests to support Bower (#35) so the architectural changes you made may make that possible in a future release.

jkowalleck commented 2 years ago

reviewed the issue thwth the original MIT/X11 license of the given example. cyclonedx-node-module currently detects the license and writes it as a non-SPDX-license correctly according to CycloneDX spec 1.3 behavior is as expected. Therefore this issue will be closed.