aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

Support SPDX JSON output format #3698

Open goneall opened 7 months ago

goneall commented 7 months ago

Short Description

Add support for SPDX JSON as an output format

Possible Labels

Enhancement

Select Category

Describe the Update

Add support for SPDX JSON. The tools-python library already supports JSON, so this may be relatively straightforward to add. Adding YAML support in the same change would take very little extra work.

How This Feature will help you/your organization

In a recent review of SBOM tools ("Quality Assessment of SBOM Generation Tools and Standards on Open Source Projects"), I noticed that the ScanCode SPDX output could not be used in the evaluation because it lacks JSON support. JSON has become one of the most popular serialization formats for SPDX 2.3 documents. Adding direct support would likely benefit adoption of both ScanCode and SPDX.

Possible Solution/Implementation Details

From the requirements.txt file, it looks like we are already using version 0.8.1 of spdx-tools, which supports JSON output.
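To double-check which spdx-tools release is actually installed in a given environment (rather than relying only on the pin in requirements.txt), the standard library's `importlib.metadata` is enough. This is a small sketch assuming Python 3.8+; `installed_version` is a hypothetical helper name, not part of ScanCode:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # tools-python is published on PyPI under the distribution name "spdx-tools".
    print(installed_version("spdx-tools"))
```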

It looks like the changes are mostly (if not completely) localized to output_spdx.py plus any related documentation.

The code would need to be slightly refactored to use an enumeration rather than a Boolean to describe the output format choice for SPDX.
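A minimal sketch of that refactor, assuming the current code chooses between tag/value and RDF with a single Boolean flag. The names `SpdxFormat`, `from_legacy_flag`, `WRITERS`, and `serialize` are hypothetical (not ScanCode's or spdx-tools' API), and the two writers are toy stand-ins for the real spdx-tools serializers:

```python
import json
from enum import Enum


class SpdxFormat(Enum):
    """Hypothetical enum replacing a tag/value-vs-RDF Boolean flag."""
    TAG_VALUE = "tv"
    RDF = "rdf"
    JSON = "json"
    YAML = "yaml"

    @classmethod
    def from_legacy_flag(cls, as_tagvalue: bool) -> "SpdxFormat":
        # Map the old Boolean onto the enum so existing callers keep working.
        return cls.TAG_VALUE if as_tagvalue else cls.RDF


def write_tag_value(doc: dict) -> str:
    # Toy rendering: one "key: value" line per field. Real SPDX tag/value and
    # JSON use different field names, which a real writer (spdx-tools) handles.
    return "\n".join(f"{key}: {value}" for key, value in doc.items())


def write_json(doc: dict) -> str:
    return json.dumps(doc, indent=2)


# Dispatch table: adding a format means adding one entry here.
# RDF and YAML are deliberately left out of this sketch.
WRITERS = {
    SpdxFormat.TAG_VALUE: write_tag_value,
    SpdxFormat.JSON: write_json,
}


def serialize(doc: dict, fmt: SpdxFormat) -> str:
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise NotImplementedError(f"no writer registered for {fmt.name}") from None
    return writer(doc)
```

Because enum members can be looked up by value, a command-line choice such as `"json"` maps directly to `SpdxFormat("json")`, while `from_legacy_flag` keeps any Boolean-based callers working during the transition.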

Can you help with this Feature

Python is not my primary language, but I could provide a pull request if it helps (although the review cycle may take a bit longer than it would for someone more experienced with Python).

pombredanne commented 7 months ago

@goneall Thanks... should we drop RDF then?

goneall commented 7 months ago

should we drop RDF then?

I would definitely prioritize JSON ahead of RDF, but I don't see a reason to drop RDF since it is already supported. Although less popular, it may be in use in some environments, and dropping it could be a breaking change for those users.