Open pombredanne opened 1 year ago
Hey, thanks for reaching out. I'm happy to help. I'm currently working on some research related to SBOMs, so I've pinned the versions of the SBOM generators I use to get comparable results.
I've implemented a web tool to compare the results and see the generation details. (Currently it's very slow because that thing runs on a potato. I'm working on adding some caching to make it faster.) https://sbom.seclab.cs.hm.edu/#/
Here is a link to the Jenkins project view, where you can see how ScanCode has performed compared to other projects and phases: https://sbom.seclab.cs.hm.edu/#/project/43/dependencies
The website is just a little side hustle of mine, but I hope it helps. If there are questions on how this site works, please feel free to reach out.
What made it very hard for me to use Scancode is that it consumes lots of resources and takes a long time to scan a big project like Keycloak.
Here are the semantics of the command I used for generation:
scancode -clpi -n 10 --cyclonedx /path/to/output.json /path/to/sources
Version information
ScanCode version: 32.0.6
ScanCode Output Format version: 3.0.0
SPDX License list version: 3.21
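For readers less familiar with the short flags, here is a minimal Python sketch that spells out the command quoted above using the long option names. The helper name `build_scancode_command` is made up for illustration, and the paths are the same placeholders used in the comment; the sketch only builds and prints the argument list and does not run a scan.

```python
import shlex


def build_scancode_command(sources, output, processes=10):
    """Build the argv list for the scan quoted above (hypothetical helper)."""
    return [
        "scancode",
        "--copyright",                  # -c: detect copyright statements
        "--license",                    # -l: detect licenses
        "--package",                    # -p: detect package manifests
        "--info",                       # -i: collect basic file information
        "--processes", str(processes),  # -n: number of parallel processes
        "--cyclonedx", output,          # write a CycloneDX JSON file
        sources,                        # directory or file to scan
    ]


# Placeholder paths, as in the original command.
cmd = build_scancode_command("/path/to/sources", "/path/to/output.json")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the actual scan, which on a large codebase can take a long time, as noted above.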
@Mariuxdeangelo thanks for replying and joining the discussion!
https://sbom.seclab.cs.hm.edu/#/project/43/dependencies is awesome!
Some background:
ScanCode toolkit (SCTK) is somewhat unique as this is the only tool that is doing some extensive license and copyright detection and not only a package manifest scan ... this is an expensive operation alright.
Now we also have ScanCode.io (SCIO), where we script complex scans (including full Docker images) as pipelines, and this is better suited for images alright! It embeds SCTK.
SCTK and SCIO do things differently: SCTK is a CLI-only tool that just trucks and grinds through a codebase in memory, while SCIO performs things as needed and stores them all in the backing DB, following a script.
In SCIO, you can scan docker://jenkins/jenkins:latest
and get better results than with anything in SCTK.
In addition, we have PurlDB at https://github.com/nexB/purldb/, where we can do matching against indexed FOSS packages. It is being rolled into the SCIO pipelines.
BTW, here is an example SCIO screenshot of a Docker pipeline running on your Jenkins image:
and the corresponding CDX JSON from SCIO: scancodeio_jenkins_jenkins_latest_results-2023-09-28-13-45-11.cdx.json.txt
and full JSON results using the ScanCode format: scancodeio_jenkins_jenkins_latest.json.txt
Not perfect yet but getting there.
This does not include any matching against the PurlDB though
PS: you may not know that purls started in ScanCode ;)
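To get a quick feel for a CycloneDX JSON result like the ones attached above, one could count its components by purl type. The following is a minimal sketch, not part of ScanCode: the function names and the tiny inline sample document are invented for illustration, and it only looks at the top-level `components` array (nested components are ignored).

```python
import json
from collections import Counter

# Hypothetical, hand-written sample; a real file would be loaded with json.load().
SAMPLE_CDX = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "name": "example-lib", "version": "1.0.0",
     "purl": "pkg:maven/org.example/example-lib@1.0.0"},
    {"type": "library", "name": "example", "version": "2.0",
     "purl": "pkg:deb/debian/example@2.0"},
    {"type": "file", "name": "unknown-file"}
  ]
}
"""


def purl_type(purl):
    """Return the type segment of a purl (the part after 'pkg:'), or None."""
    if not purl.startswith("pkg:"):
        return None
    return purl[len("pkg:"):].split("/", 1)[0].lower()


def count_by_purl_type(bom):
    """Count top-level components by purl type; 'no-purl' if missing or malformed."""
    counts = Counter()
    for component in bom.get("components", []):
        purl = component.get("purl")
        kind = purl_type(purl) if purl else None
        counts[kind or "no-purl"] += 1
    return counts


summary = count_by_purl_type(json.loads(SAMPLE_CDX))
for kind, number in sorted(summary.items()):
    print(kind, number)
```

Running the same count on the output of two different generators gives a rough first comparison of how many packages each one identified per ecosystem.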
Thanks for the insights. I will look into that as soon as I can.
Scanning all files of a project is definitely a cool idea and, of course, uses some resources. You're not only working on SBOMs; there are other use cases where you use that data. It was only an issue for me because I was running Scancode on over 100 fairly large projects with limited resources. I still have some ideas on my bucket list for what I want to do with Scancode.
See this article by @Mariuxdeangelo https://mariuxdeangelo.gitlab.io/website/#/post/20230924-SBOM-dependency-semantics-SPDX-and-CycloneDx
The scancode results are not great. We can do better!
@Mariuxdeangelo would you mind sharing the URLs of the image and archive you used? Also, which version of ScanCode did you use, and was it the Toolkit or ScanCode.io? Thanks!