mrutkows opened 1 year ago
Interesting proposal.
I really like the implied outcome and benefits. But I doubt that your solution covers it properly. And here is why:
So you want to link to external documents via `externalReference`.
These documents might not be under your control and might change without notice.
These external documents might be bound to an XML/JSON schema, which the documents usually announce internally (for example via `$schema`). I find it a BAD idea to announce the schema of a foreign document externally: downstream users need to download/read the document anyway to check whether the claim in the SBOM was correct.
From a technical perspective, it is impossible to point an XML document to exactly one schema.
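To illustrate the two points above: a JSON document typically names its own schema in `$schema`, while an XML instance can declare several schemas at once through `xsi:schemaLocation`, a whitespace-separated list of namespace/location pairs. This is a minimal sketch using only the Python standard library; the documents, namespaces and URLs are made-up examples.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical JSON document that announces its own schema internally.
json_doc = json.loads('{"$schema": "https://example.com/card.schema.json", "name": "demo"}')

# Hypothetical XML document mixing two namespaces, each bound to its own schema.
xml_doc = ET.fromstring(
    '<card xmlns="urn:example:card" xmlns:ext="urn:example:ext" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="urn:example:card https://example.com/card.xsd '
    'urn:example:ext https://example.org/ext.xsd">'
    '<name>demo</name><ext:extra>1</ext:extra></card>'
)

def json_schema_of(doc):
    """Return the schema a JSON document claims for itself, if any."""
    return doc.get("$schema")

def xml_schemas_of(root):
    """Map each namespace to the schema location the XML document declares for it."""
    # ElementTree stores prefixed attributes under "{namespace-uri}localname".
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # xsi:schemaLocation is a flat list of (namespace, location) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))

print(json_schema_of(json_doc))
print(xml_schemas_of(xml_doc))
```

The XML document above is bound to two schemas at once, so a single "schema" value in the SBOM could only ever describe part of it.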
From a technical perspective, XML/JSON schemas are extensible. I could create a superset of your schema and still be compliant with your schema. Still, I would announce my schema, not yours. So if you are scanning a BOM for `externalReference`s that apply to your schema, you would not find my document, even though it complies with your schema: a false negative.
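The false negative can be sketched in a few lines. This assumes the `schema` field proposed in this issue (it does not exist in CycloneDX today) and, for brevity, models each schema as a set of required keys; both schema URLs and the document are hypothetical.

```python
# Hypothetical: "schema" is the field *proposed* in this issue, not an existing
# CycloneDX property. Schemas are modelled as sets of required keys for brevity.
YOUR_SCHEMA = "https://example.com/schema/card.schema.json"
MY_SCHEMA = "https://example.org/schema/card-extended.schema.json"

REQUIRED_KEYS = {
    YOUR_SCHEMA: {"name"},
    # Extends YOUR_SCHEMA: anything valid here is also valid for YOUR_SCHEMA.
    MY_SCHEMA: {"name", "extra"},
}

# The referenced document, as it would be fetched from the url.
referenced_doc = {"name": "demo", "extra": 42}

bom = {
    "externalReferences": [
        {"type": "other", "url": "https://example.org/card.json", "schema": MY_SCHEMA},
    ]
}

def complies(doc, schema_url):
    """Toy validity check: the document carries every key the schema requires."""
    return REQUIRED_KEYS[schema_url].issubset(doc)

def refs_declaring(bom, schema_url):
    """Scan a BOM by the schema URL each reference claims."""
    return [r for r in bom.get("externalReferences", []) if r.get("schema") == schema_url]

print(complies(referenced_doc, YOUR_SCHEMA))  # -> True: the document is valid for YOUR_SCHEMA...
print(len(refs_declaring(bom, YOUR_SCHEMA)))  # -> 0: ...but the scan by claimed schema misses it
```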
Let me reverse-engineer your thought process and start with the requirements engineering. (A friendly reminder: it might really help if you could describe constraints and desired capabilities, not just give some reasons why a proposed solution is sufficient for a set of users.)
So your use case is enabling BOM users to understand the explicit usage of an `externalReference`d document, right?
`externalReference` already has the properties `url`, `type`, `comment` and `hashes`.
You claim that `type` is not enough.
If I understand your original proposed solution correctly, the idea of linking a schema comes from the following aspects:

- The values of `externalReference.type` are insufficient for you.
- You want `externalReference` to store the use case(s) of a referenced document.
- The value range of this "use case" is about to expand, which is why you want to use arbitrary strings instead of an ENUM.
- The value should be free. The value should also be universal enough that a kind of standard exists.

Then here is my alternative proposal for a solution:
Add a new property `externalReference.purposes` that is an ENUM managed outside the CycloneDX schema.
The document that manages the purposes can be short-lived and can be extended/updated independently of the CycloneDX schema and its life cycle, just as the SPDX license IDs are already managed in CycloneDX.
The following code is untested...
Example schema
```jsonc
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://cyclonedx.org/schema/bom-1.5.schema.json",
  "definitions": {
    "externalReference": {
      "type": "object",
      "title": "External Reference",
      // the existing things ...
      "properties": {
        // the existing things ...
        "purposes": {
          "type": "array",
          "items": {
            "allOf": [
              {
                "type": "string",
                "title": "Purpose"
              },
              {
                "$ref": "http://cyclonedx.org/schema/externalReference-purposes.schema.json"
              }
            ]
          }
        }
      }
    }
  }
}
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning"
           xmlns:bom="http://cyclonedx.org/schema/bom/1.5"
           xmlns:externalReferencePurposes="http://cyclonedx.org/schema/externalReferencePurposes">
  <xs:import namespace="http://cyclonedx.org/schema/externalReferencePurposes"
             schemaLocation="http://cyclonedx.org/schema/externalReferencePurposes.xsd"/>
  <xs:complexType name="externalReference">
    <xs:sequence>
      <!-- the existing things ... -->
      <xs:element name="purposes" minOccurs="0" maxOccurs="1">
        <xs:complexType>
          <xs:sequence minOccurs="0" maxOccurs="unbounded">
            <xs:element name="purpose" type="externalReferencePurposes:values"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```
Example `externalReference-purposes.schema.json`

```jsonc
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://cyclonedx.org/schema/externalReference-purposes.schema.json",
  "$comment": "v1.0",
  "enum": [
    "Foo"
    // some values here
  ]
}
```
Example `externalReferencePurposes.xsd`

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://cyclonedx.org/schema/externalReferencePurposes"
           version="1.0">
  <xs:simpleType name="values">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Foo"/>
      <!-- some values here -->
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```
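The point of managing `purposes` outside the BOM schema is that the allowed values live in their own short-lived, versioned document. Here is a tooling-side sketch, assuming the BOM carries the proposed (not yet existing) `purposes` array and the tool holds a copy of the example `externalReference-purposes.schema.json` (inlined here rather than fetched; `"Foo"` is the placeholder value from the example, `"Bar"` is a made-up unknown value).

```python
import json

# Inlined copy of the example externalReference-purposes.schema.json (v1.0).
# A real tool would fetch it from its $id; this sketch does not.
PURPOSES_SCHEMA = json.loads("""
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://cyclonedx.org/schema/externalReference-purposes.schema.json",
  "$comment": "v1.0",
  "enum": ["Foo"]
}
""")

def unknown_purposes(external_reference, purposes_schema):
    """Return the purposes of one externalReference that the external enum does not list."""
    allowed = set(purposes_schema["enum"])
    return [p for p in external_reference.get("purposes", []) if p not in allowed]

ref = {"type": "other", "url": "https://example.com/data.json", "purposes": ["Foo", "Bar"]}
print(unknown_purposes(ref, PURPOSES_SCHEMA))
```

Because the enum lives in its own document, accepting a value such as `"Bar"` would only require publishing a new version of the purposes document; neither the BOM schema nor this check would change.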
> But I doubt that your solution covers it properly. And here is why:
@jkowalleck No, my proposal clearly does not cover it properly, as this was an ad-hoc idea from a live ML workstream WG call that Steve asked me to open an issue for ;) The initial thought was to allow references to schemas (disassociated from the actual reference) for optional use by tooling, without necessarily forcing schema validation against those external schemas.
My simple goals were:

- `type` values where the corresponding `url` MAY be validatable data

To be fair, `externalReference` has effectively been treated/discussed as a link (behavior) in many contexts (e.g., discussions in work groups); however, right now its "types" have little programmatic value and are more descriptive (as you effectively noted). If it indeed has a "schema", then it can be inferred (but could be made explicit) that it is a type that can be validated... or we could use the existing "type" field more clearly by creating a "data" (or even "data-validatable") type, where "schema" would be used as the document for such.
After doing further research since posting yesterday... in any case, we may want to look at actually implementing the concept of "schema" as its own object (definition), which not only allows us to reuse it but may also let us take advantage of dynamic anchors in the future. From my basic reading, the dynamism in recent schema revisions would allow us to replace our definition of "schema" with an actual external schema reference (and be recognized by off-the-shelf validators). I need to read more, but it is my understanding that OpenAPI uses this in its v3.1 spec, though it has caused some issues, which are resolved in versions of JSON Schema after draft 7 (this is problematic for the many validators that do not implement the interim revisions).
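A rough sketch of the "schema as its own object (definition)" idea: one `schema` definition referenced via `$ref` from two places, plus a tiny resolver for same-document JSON Pointer references. The layout, the `dataset` definition and the `url` property are hypothetical (only `schemaType` comes from this issue), and the resolver deliberately handles neither remote references nor `$dynamicRef`; that is exactly the part an off-the-shelf validator would have to do.

```python
# Hypothetical schema fragment: "schema" is defined once and reused via $ref.
BOM_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "definitions": {
        "schema": {
            "type": "object",
            "title": "Schema",
            "properties": {
                "url": {"type": "string", "format": "iri-reference"},
                "schemaType": {"type": "string"},
            },
            "required": ["url"],
        },
        "externalReference": {
            "type": "object",
            "properties": {"schema": {"$ref": "#/definitions/schema"}},
        },
        "dataset": {  # hypothetical second consumer of the same definition
            "type": "object",
            "properties": {"schema": {"$ref": "#/definitions/schema"}},
        },
    },
}

def resolve_local_ref(root, ref):
    """Follow a same-document JSON Pointer reference such as '#/definitions/schema'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled: {ref}")
    node = root
    for token in ref[2:].split("/"):
        # RFC 6901 unescaping: '~1' becomes '/', then '~0' becomes '~'.
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node

defs = BOM_SCHEMA["definitions"]
a = resolve_local_ref(BOM_SCHEMA, defs["externalReference"]["properties"]["schema"]["$ref"])
b = resolve_local_ref(BOM_SCHEMA, defs["dataset"]["properties"]["schema"]["$ref"])
print(a is b)  # both uses point at the single shared definition
```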
Here is the JSON Schema revision for polymorphism (schema reuse), along with an article that I will need to re-read multiple times; I also need to allot time for my own testing...
Per the comments above, in the long term I would effectively like to leverage dynamism in the actual schema (JSON Schema; I am not sure of the current state of XSD), so that off-the-shelf validators would be able to identify the schema associated with the content pointed to by the external reference, and to minimize the creation of CycloneDX-only fields that only custom CycloneDX tools (validators) would understand.
Again, these are goals vs. the ability to represent them in current JSON or XML schema... and, as noted, I need to find time to research and test. However, this is the kind of dialog I wanted.
Just wanted to get the concept documented as quickly as possible...
This needs to be fleshed out more. Moving to v1.6.
Moving to v1.7. This needs to be fleshed out more.
In discussions within the ML work group around "model cards", we acknowledged that the in-progress CycloneDX schema for describing ML models has, at best, ad-hoc standards to draw common data from. Specifically, we have found various structured and unstructured data being presented that is specific to ML service providers (e.g., Google, AWS, IBM) and to ML catalogs such as Hugging Face and projects like TensorFlow. Therefore, our approach is to adopt basic data objects and fields that capture the commonality we have found, while allowing providers to reference descriptors (e.g., schemas) for their specific model information (data inputs, outputs, statistical analysis, imaging, etc.) using `"externalReference"` objects. It would be helpful for automation (e.g., for validation or visualization) to know whether the referenced data has a published schema (e.g., in XSD or JSON Schema) to apply against the information/data pointed to by the `"url"` field.

As we support new XBOM types (e.g., Crypto CBOM, Machine Learning MLBOM), we will encounter more and more domain-specific data (hopefully with structured schemas), and I was encouraged to put forward my proposal to add a `"schema"` field to `"externalReference"` to make the reference more meaningful in those cases.

Applicability to existing reference types:
- General types
  - `"bom"`
  - `"build-meta"`
- Specific types
Hopefully, you can see from these few examples (and perhaps suggest new "type" values for the enum field) the value of adding the `"schema"` field with an associated data type that reflects a URI (or the more general IRI). This might appear in the `"ExternalReference"` definition as follows:

Note: we may also want to add a complementary field to disambiguate which schema version (draft) of either XML Schema or JSON Schema was used... shown above as `"schemaType"`.
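From the tooling side, the benefit described in this issue could look roughly like the following: given the proposed `schema` field and the complementary `schemaType` field, automation can decide whether and how the data behind `url` could be validated. Both fields are still only a proposal; the `schemaType` values and validator descriptions below are made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical mapping from proposed "schemaType" values to validator families.
# Neither field exists in CycloneDX today; these values are illustrative only.
VALIDATORS = {
    "json-schema-draft-07": "JSON Schema (draft-07) validator",
    "json-schema-2020-12": "JSON Schema (2020-12) validator",
    "xsd-1.1": "XML Schema 1.1 validator",
}

def plan_validation(external_reference):
    """Decide how (or whether) tooling could validate the data behind `url`."""
    schema = external_reference.get("schema")
    if schema is None:
        return "no schema declared: treat url as an opaque link"
    if not urlparse(schema).scheme:
        return "schema is not an absolute URI: cannot fetch it"
    validator = VALIDATORS.get(external_reference.get("schemaType"))
    if validator is None:
        return f"schema {schema} declared, but its schema language is unknown"
    return f"fetch {schema} and check url with a {validator}"

refs = [
    {"type": "other", "url": "https://example.com/card.json"},
    {"type": "other", "url": "https://example.com/card.json",
     "schema": "https://example.com/card.schema.json", "schemaType": "json-schema-draft-07"},
]
for r in refs:
    print(plan_validation(r))
```

Without `schemaType`, a tool would have to guess the schema language from the fetched schema itself, which is the ambiguity the note above aims to remove.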