The current metadata.json model carries limited information: enough to describe the project itself, but it could be extended to give compliance and scanning tools a better basis for assessment.
Fields such as license, description, upstream source, and CPE string are the minimum that existing tooling like ORT or ScanCode needs in order to have something to process.
There is also a case for making this information required rather than optional, since it is needed to generate legal documentation (e.g. SPDX).
An example of a desirable metadata.json layout (using zlib from the Bazel registry as the example):
{
    "homepage": "https://zlib.net",
    "maintainers": [
        {
            "email": "bazel-dev@googlegroups.com",
            "name": "The Bazel Team"
        }
    ],
    "description": "zlib is designed to be a free, general-purpose, legally unencumbered -- that is, not covered by any patents -- lossless data-compression library...",
    "license": "ZLib",
    "versions": [
        {
            "version": "1.2.11",
            "source_url": "https://zlib.net/zlib-1.2.11.tar.gz",
            "cpeId": "cpe:2.3:a:gnu:zlib:1.2.11:*:*:*:*:*:*:*"
        }
    ],
    "yanked_versions": {}
}
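As a rough illustration of what "required instead of optional" could mean for tooling, here is a minimal Python sketch that reports which of the proposed fields are missing from a metadata.json document. The field names are taken from the example layout above; the `missing_fields` helper and the two tuples are hypothetical and not part of any existing registry tooling.

```python
# Hypothetical lists of the newly proposed fields, named as in the example layout.
# This is an assumption for illustration, not an official schema.
PROPOSED_TOP_LEVEL = ("description", "license")
PROPOSED_PER_VERSION = ("source_url", "cpeId")


def missing_fields(metadata: dict) -> list:
    """Return the paths of proposed fields that are absent or empty."""
    missing = [key for key in PROPOSED_TOP_LEVEL if not metadata.get(key)]
    for index, entry in enumerate(metadata.get("versions") or []):
        for key in PROPOSED_PER_VERSION:
            if not entry.get(key):
                missing.append(f"versions[{index}].{key}")
    return missing


# A deliberately incomplete document: no description, no cpeId.
example = {
    "homepage": "https://zlib.net",
    "maintainers": [{"email": "bazel-dev@googlegroups.com", "name": "The Bazel Team"}],
    "license": "ZLib",
    "versions": [
        {"version": "1.2.11", "source_url": "https://zlib.net/zlib-1.2.11.tar.gz"}
    ],
    "yanked_versions": {},
}

print(missing_fields(example))  # ['description', 'versions[0].cpeId']
```

A check like this could run in presubmit, so that an entry without the minimal compliance data is flagged before it reaches the scanners.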
A brief explanation of the new elements:
description - The most self-explanatory field, but it is expected as part of the minimal metadata requirements.
license - A mandatory field that serves as the starting point for scanning; it is treated as the declared license. Scanners then compare it against the concluded licenses. A reference list of OSS licenses can be found here: SPDX License Database
source_upstream - The source used in a build is often stored somewhere other than the upstream origin. Companies and individuals frequently keep a local copy of the source in case access to the external network is limited. Keeping the build source in /source.json separate from the upstream source is therefore the correct split. For reference, compliance automation processes routinely detect both the code actually used and the code referenced, and treat this as normal.
cpeid - One of the most important fields: without it, CVE security scanning tools cannot easily determine the software version and release to check against known vulnerabilities. For reference about CPE strings: CPE Mitre
Because the build info is stated in \<version>/source.json, the cpeid can later be used to create a company-specific variant; without it, there is no reference from which to create one.
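To show what a scanner can read out of the cpeid, here is a small Python sketch that splits a CPE 2.3 formatted string into its eleven named attributes. The `parse_cpe23` helper is hypothetical, and it makes a simplifying assumption: no attribute value contains an escaped colon, which a complete parser would have to handle.

```python
# The eleven attributes of a CPE 2.3 formatted string, in order.
CPE23_ATTRIBUTES = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into named attributes.

    Simplification (assumption): values never contain an escaped colon.
    """
    fields = cpe.split(":")
    # "cpe" + "2.3" + eleven attributes = thirteen colon-separated fields.
    if len(fields) != 13 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_ATTRIBUTES, fields[2:]))


cpe = parse_cpe23("cpe:2.3:a:gnu:zlib:1.2.11:*:*:*:*:*:*:*")
print(cpe["vendor"], cpe["product"], cpe["version"])  # gnu zlib 1.2.11
```

The vendor, product, and version attributes are what a vulnerability lookup would typically key on, which is why a missing cpeid leaves the scanner guessing.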
@heliocastro I would rather see purl than cpeid supported here. PURLs can be generated and have been adopted by various open source communities; for example, OSV uses purl in its schema to uniquely identify packages and their known security vulnerabilities.
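To illustrate the "can be generated" point, here is a minimal Python sketch that builds a purl from a module name and version. The `make_purl` helper is hypothetical, and using the `generic` purl type is an assumption made for illustration; which type the registry should use is an open question. Namespace, qualifiers, and subpath are left out for brevity.

```python
from urllib.parse import quote


def make_purl(name: str, version: str, purl_type: str = "generic") -> str:
    """Build a simple purl of the form pkg:<type>/<name>@<version>.

    Assumption: no namespace, qualifiers, or subpath are needed.
    """
    # Percent-encode name and version so reserved characters stay unambiguous.
    return f"pkg:{purl_type}/{quote(name, safe='')}@{quote(version, safe='')}"


print(make_purl("zlib", "1.2.11"))  # pkg:generic/zlib@1.2.11
```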