eclipse-dash / dash-licenses

Extract license information from content.
http://projects.eclipse.org/projects/technology.dash
Eclipse Public License 2.0

Support for using an SBOM as input #191

Open sophokles73 opened 1 year ago

sophokles73 commented 1 year ago

With the recent uptake of SBOMs with respect to governance checks, I wonder if the dash tool should also support doing its work based on a BOM created by popular SBOM tools like CycloneDX.

waynebeaton commented 1 year ago

Seems like a reasonable thing to do. While I have investigated generating SBOMs from the Eclipse Dash License Tool, it hadn't occurred to me to do the reverse. This will require some investigation.

Tools already exist to parse SBOMs, so that shouldn't be a big problem... The fundamental problem is that of turning references to third party content in an SBOM into a format the license tool understands.

waynebeaton commented 1 year ago

The tool can now interpret purl IDs. These are used in SPDX and CycloneDX to specify references to libraries (I believe that this is consistently true).
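For readers unfamiliar with purl IDs, here is a minimal Python sketch of how one breaks down into type, namespace, name, and version. It is an illustration only, not the tool's parser: `Purl` and `parse_purl` are hypothetical names, qualifiers and subpaths are simply dropped, and it assumes a literal `@` inside a name is percent-encoded as `%40`, as the purl format expects.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Purl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Split a package URL into its main parts (qualifiers and subpath are dropped)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    body = purl[len("pkg:"):]
    # Drop the optional '#subpath' and '?qualifiers' suffixes.
    body = body.split("#", 1)[0].split("?", 1)[0]
    # The version follows the last '@'; scoped npm names encode their '@' as %40.
    path, _, version = body.rpartition("@") if "@" in body else (body, "", "")
    ptype, _, rest = path.strip("/").partition("/")
    if not ptype or not rest:
        raise ValueError(f"purl has no type or name: {purl!r}")
    # Everything before the last '/' is the namespace (e.g. a Maven group ID).
    namespace, _, name = rest.rpartition("/")
    return Purl(ptype.lower(), unquote(namespace) or None,
                unquote(name), unquote(version) or None)
```

With this sketch, `pkg:maven/org.apache.commons/commons-lang3@3.12.0` yields the namespace `org.apache.commons`, while `pkg:npm/lodash@4.17.21` has no namespace at all.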

Implementing an SBOM file reader is going to require a little restructuring of how we handle files. Currently, we decide what file reader to use based on the file name. Since SBOM formats don't make use of a consistent file name, we'll have to add a switch or something that tells the tool how to interpret the file.
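One way to picture the "switch" is a sketch like the following, written in Python for brevity. Every name in it (the reader labels, `choose_reader`, the `explicit_format` parameter) is hypothetical and only illustrates the idea that an explicit format setting takes precedence over file-name detection; it does not reflect how the tool is actually structured.

```python
import os
from typing import Optional

# Hypothetical reader labels; the tool's real reader classes are not shown in this thread.
READERS_BY_FILENAME = {
    "package-lock.json": "npm-lock-reader",
    "yarn.lock": "yarn-lock-reader",
}
READERS_BY_FORMAT = {
    "cyclonedx": "cyclonedx-reader",
    "spdx": "spdx-reader",
}
DEFAULT_READER = "flat-file-reader"


def choose_reader(path: str, explicit_format: Optional[str] = None) -> str:
    """Pick a reader: an explicit format switch wins, else fall back to the file name."""
    if explicit_format is not None:
        try:
            return READERS_BY_FORMAT[explicit_format.lower()]
        except KeyError:
            raise ValueError(f"unknown input format: {explicit_format!r}") from None
    # Existing behaviour: decide from the file name alone.
    return READERS_BY_FILENAME.get(os.path.basename(path), DEFAULT_READER)
```

Because SBOM files can be named anything, `choose_reader("build/mySBOM.json", "cyclonedx")` needs the explicit switch, whereas `choose_reader("web/yarn.lock")` still works from the name alone.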

In the meantime, an ugly workaround is to grep the SBOM:

$ cat mySBOM.json | grep -Poh 'pkg:(?<type>[^\/]+)\/(?<group>[^\/]+)\/(?<name>[^@]+)@(?<version>[^?]+)(?=.*")' \
| sort | uniq | \
java -jar org.eclipse.dash.licenses-<version>.jar -
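If the regular expression proves brittle, a similar extraction can be sketched in Python by parsing the JSON instead of grepping it. This is an untested-against-the-tool sketch, assuming a JSON SBOM that stores package URLs under keys named `purl` (as CycloneDX JSON does); `iter_purls` and `unique_purls` are hypothetical helper names. Like the grep pattern, it cuts each purl at the first `?`, and it also picks up the purl of the project itself if one is recorded in the metadata.

```python
import json
from typing import Any, Iterator, List


def iter_purls(node: Any) -> Iterator[str]:
    """Yield every string stored under a "purl" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "purl" and isinstance(value, str):
                # Mirror the grep pattern: keep everything before the qualifiers.
                yield value.split("?", 1)[0]
            else:
                yield from iter_purls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_purls(item)


def unique_purls(sbom_text: str) -> List[str]:
    """Return the sorted, de-duplicated purls found in a JSON SBOM (like sort | uniq)."""
    return sorted(set(iter_purls(json.loads(sbom_text))))
```

Printing the result one purl per line, for example with `print("\n".join(unique_purls(open("mySBOM.json").read())))`, gives output that can be piped into the tool the same way as above.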

sbernard31 commented 9 months ago

I'm currently exploring ways to check dependencies for vulnerabilities (https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/3949).

Looking at this, I see there is some effort to define standards for SBOMs and the tooling to generate them. I also understand that SPDX and CycloneDX are the most popular standards.

Thinking about dash-licenses, I came up with the same idea as both of you. :point_up:

I will go a bit further (maybe too far :sweat_smile:). Ideally, could we imagine that scanning/searching dependencies should not even be in the dash-licenses scope? We would then get something like:

            ┌───────────────┐
            │code repository│
            └──────┬────────┘
                   │  Specific tooling
                   │  for different languages
                   ▼
                 ┌────┐
                 │SBOM│  CycloneDX or SPDX format
                 └─┬──┘
 security scanner  │    dash-licenses
        ┌──────────┴─────────────┐
        ▼                        ▼
┌──────────────────────┐  ┌───────────────┐
│Vulnerabilities report│  │licenses report│
└──────────────────────┘  └───────────────┘

The dash-licenses effort could then be focused only on:

Tools for scanning a repository to generate an SBOM already exist for many languages, but all of this seems pretty new, so some ecosystems are not yet well covered. It would therefore be "better" to identify the projects that the Eclipse community needs and contribute to them than to improve dash-licenses scanning.

I warned you that I might be going too far with this idea :sweat_smile:

sophokles73 commented 9 months ago

@sbernard31 I am not sure if I understand where you are going beyond what is already discussed in this issue. My intention had been to do exactly what you propose as well: let the dash tool read an SBOM and check the license info of all 3rd party deps declared in the SBOM.

> Ideally, could we imagine that scanning/searching dependencies should not even be in the dash-licenses scope?

FMPOV this has never been in the scope of the dash tool. Instead, it relies on arbitrary (language/build system specific) mechanisms to create a list of deps which it then processes. The only thing to be done is adding the ability to read an SBOM that has been created by another tool upfront.

So, I guess we all want the same. As usual, the only thing left to do is creating a PR ;-)

sbernard31 commented 9 months ago

> My intention had been to do exactly what you propose as well: let the dash tool read an SBOM and check the license info of all 3rd party deps declared in the SBOM.

Yep I agree with you :+1:

> I am not sure if I understand where you are going beyond what is already discussed in this issue.

OK, I'll try to explain it better. :slightly_smiling_face:

> FMPOV this (scanning/searching dependencies) has never been in the scope of the dash tool.

Maybe you consider it wrong to say that "dash-licenses scans/searches dependencies" because it relies on different tooling to do that? What I wanted to say is that specific code is currently needed to support the specifics of each language's tooling. E.g. dash-licenses has to know about "pom.xml", "yarn.lock" and more (see https://github.com/eclipse/dash-licenses/issues/10).

My point is maybe:
dash-licenses should focus only on standard, language-agnostic formats as input (SPDX and CycloneDX), and it's up to each programming language ecosystem to have its own SPDX/CycloneDX SBOM generator. If an ecosystem used by the Eclipse community doesn't have an SPDX/CycloneDX generator (or some features are missing), then the Eclipse community should help with that generator.
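To make the "language-agnostic input" idea concrete, here is a rough Python sketch of reading purls from either format once the JSON is loaded. It assumes CycloneDX JSON (top-level `bomFormat` and `components`) and SPDX 2.x JSON (top-level `spdxVersion`, with purls stored as `externalRefs` of type `purl`); `purls_from_sbom` is a hypothetical name, and nested CycloneDX components are ignored for brevity.

```python
from typing import List


def purls_from_sbom(document: dict) -> List[str]:
    """Return the purls declared in a parsed CycloneDX or SPDX 2.x JSON document."""
    if document.get("bomFormat") == "CycloneDX":
        # CycloneDX: each component may carry a "purl" field directly.
        return [c["purl"] for c in document.get("components", []) if "purl" in c]
    if "spdxVersion" in document:
        # SPDX 2.x: purls appear as external references on each package.
        return [
            ref["referenceLocator"]
            for package in document.get("packages", [])
            for ref in package.get("externalRefs", [])
            if ref.get("referenceType") == "purl"
        ]
    raise ValueError("unrecognized SBOM format")
```

Everything downstream of this function would then be independent of the build system that produced the SBOM.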

Let me know if it's clearer. (or if you think I misunderstood something :pray:)

sbernard31 commented 8 months ago

(Probably obvious, but the cyclonedx-core-java library should probably be used to add support for CycloneDX. It is used by cyclonedx-maven-plugin.)

waynebeaton commented 8 months ago

> (Probably obvious, but the cyclonedx-core-java library should probably be used to add support for CycloneDX. It is used by cyclonedx-maven-plugin.)

Yup. There's no point in reinventing this.

By leveraging this library's ability to read existing SBOMs and write new ones, we might even be able to optionally output an equivalent SBOM with licence information taken from our sources.

sbernard31 commented 8 months ago

What do you mean by:

> we might even be able to optionally output an equivalent SBOM with licence information taken from our sources.

?

waynebeaton commented 8 months ago

@sbernard31 The tools that generate SBOMs grab licence information directly from the content. The Maven plug-in, for example, grabs licence information from the pom.xml files of dependencies. This licence information is frequently missing, specified inconsistently, or just plain wrong. This is one of the reasons why I've been pushing committers to ensure that their license information is specified consistently in metadata (e.g., in the pom.xml).

What I'm thinking is that we can walk through an SBOM and either add or replace the licence information for the various dependencies (and the project content) with our own.

We could then either overwrite the existing SBOM or generate a new one.
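As a rough illustration of that post-processing step, the Python sketch below walks the top-level components of a parsed CycloneDX JSON document and replaces the licence entry wherever vetted data is available. The `vetted` mapping from purl to SPDX expression is a stand-in for whatever the tool's sources would provide, and `apply_vetted_licenses` is a hypothetical name; returning a copy leaves the choice between overwriting and writing a new file to the caller.

```python
import copy
from typing import Dict


def apply_vetted_licenses(bom: dict, vetted: Dict[str, str]) -> dict:
    """Return a copy of a CycloneDX JSON document with vetted licences applied."""
    result = copy.deepcopy(bom)  # leave the caller's document untouched
    for component in result.get("components", []):
        expression = vetted.get(component.get("purl"))
        if expression is not None:
            # Add or replace the licence data with the vetted SPDX expression.
            component["licenses"] = [{"expression": expression}]
    return result
```

Components without a vetted entry keep whatever the SBOM generator recorded for them.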

This is just a thought at this point.

sbernard31 commented 8 months ago

I think I get it, but it sounds strange to me that dash-licenses would update the SBOM files. At first sight, I think this is the SBOM generator's responsibility.

But maybe, rather than updating SBOM files, it could do some checks and raise an error if the SBOM doesn't contain the expected values for an Eclipse project (e.g. license information is missing or not recognized)?
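Such a check could look roughly like the Python sketch below, which only covers the "missing" case: it lists the top-level components of a parsed CycloneDX JSON document that carry no licence expression, id, or name. `components_missing_license` is a hypothetical name, and deciding whether a licence is "recognized" would additionally need a list of accepted licences, which is not shown here.

```python
from typing import List


def components_missing_license(bom: dict) -> List[str]:
    """List the components of a CycloneDX JSON document that declare no licence."""
    missing = []
    for component in bom.get("components", []):
        entries = component.get("licenses") or []
        has_license = any(
            entry.get("expression")
            or entry.get("license", {}).get("id")
            or entry.get("license", {}).get("name")
            for entry in entries
        )
        if not has_license:
            # Identify the component by purl when available, else by name.
            missing.append(component.get("purl") or component.get("name", "<unnamed>"))
    return missing
```

A build could then fail, or merely warn, whenever the returned list is not empty.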

waynebeaton commented 8 months ago

I don't think that it's strange. We'd effectively be post-processing.

Another option is to sort out how to extend the SBOM generators to use our licence information.

sbernard31 commented 8 months ago

I understand it is possible to set the right information in build configuration files (e.g. pom.xml or package.json). So if dash-licenses validates that, then projects can fix their build configuration.

Or maybe you are not talking about the project's own license information, but about the license information of its dependencies, which is not well set either?

waynebeaton commented 8 months ago

Yes. I'm mostly concerned with dependencies. We can coach our own project teams to get the metadata right. Moving forward, it looks like Sonatype is doing a better job of getting folks to specify good metadata before accepting content on Maven Central. I have no idea what sort of rigour is applied when adding stuff to npmjs. Regardless, there is still a lot of content already on these software repositories that has licence information that is missing, inconsistently specified, or wrong.

sbernard31 commented 8 months ago

OK, I get your point now. Maybe SBOM generators could also warn, to get folks to specify good metadata?

(This way, not only the Eclipse community could improve the quality of its products.)

waynebeaton commented 8 months ago

> OK, I get your point now. Maybe SBOM generators could also warn, to get folks to specify good metadata?

The biggest challenge here is that many of the libraries that end up in a dependency graph are old, and telling the person assembling an SBOM for their own content that the vast array of dependencies over which they have no control should use better metadata isn't all that helpful.

What would be helpful, I think, is to work with the folks creating the SBOM generators to make them pluggable so that the developer can leverage ClearlyDefined (or the Dash License Tool) to get vetted license content.

But... we've diverged considerably from the focus of this issue and should probably have this conversation somewhere else. I'll think about where that is and open an issue later today.

waynebeaton commented 5 months ago

I played around with this a bit tonight.

The CycloneDX folks produce a CLI Tool that can convert an SBOM into various formats, including CSV.

This seems to work:

$ cyclonedx-linux-x64 convert --input-file stuff-cyclonedx.json --output-file stuff.csv --output-format csv
$ cat stuff.csv | awk -F, '{print $14}' | tail -n +2 | java -jar org.eclipse.dash.licenses-1.1.1-SNAPSHOT.jar -
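One caveat with `awk -F,` is that it splits on every comma, including commas inside quoted fields, so column 14 can shift for rows whose other fields contain a comma. A hedged alternative is to pick the column by header name with a real CSV parser, as in this Python sketch. The header name `Purl` is an assumption about the CLI's CSV output and should be checked against the file actually produced; `purls_from_csv` is a hypothetical name.

```python
import csv
import io
from typing import List


def purls_from_csv(csv_text: str, column: str = "Purl") -> List[str]:
    """Return the non-empty values of the purl column from converted SBOM CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:  # empty input
        return []
    # Match the header case-insensitively; "Purl" is an assumed column name.
    lookup = {name.strip().lower(): name for name in reader.fieldnames}
    key = lookup.get(column.lower())
    if key is None:
        raise ValueError(f"no {column!r} column in header: {reader.fieldnames}")
    return [row[key] for row in reader if row[key]]
```

The resulting list can be printed one purl per line and piped into the tool exactly as in the command above.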