CycloneDX / specification

OWASP CycloneDX is a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction. SBOM, SaaSBOM, HBOM, AI/ML-BOM, CBOM, OBOM, MBOM, VDR, and VEX
https://cyclonedx.org/
Apache License 2.0

Add schema reference to "externalReference" object #185

Open mrutkows opened 1 year ago

mrutkows commented 1 year ago

In discussions within the ML work group around "model cards", we acknowledge that the in-progress CycloneDX schema to describe ML models has, at best, ad-hoc standards to draw common data from. Specifically, we have found various structured and unstructured data being presented specific to ML service providers (e.g., Google, AWS, IBM, etc.), ML catalogs such as HuggingFace, and projects like TensorFlow. Therefore, our approach is to adopt basic data objects and fields that make sense of the commonality we have found, while allowing providers to reference descriptors (e.g., schemas) for their specific model information (data inputs, outputs, statistical analysis, imaging, etc.) using the "externalReference" objects. It would be helpful for automation (e.g., for validation or visualization purposes) to know whether the referenced data has a published schema (e.g., in XSD or JSON Schema) to apply against the information/data pointed to by the "url" field.

As we support new xBOM types (e.g., Crypto CBOM, Machine Learning MLBOM, etc.), we will encounter more and more domain-specific data (hopefully with structured schemas). I was encouraged to bring forward my proposal to add a "schema" field to the "externalReference" object to make the reference more meaningful in those cases.

Applicability to existing reference types:

Hopefully, you can see from these few examples (and perhaps suggest new "type" values for the enum field) the value of adding the "schema" field with an associated data type that reflects a URI (or its more generalized form, the IRI). This might appear in the "externalReference" definition as follows:

 "externalReference": {
      "type": "object",
      "title": "External Reference",
      "description": "Specifies an individual external reference",
      "required": [
        "url",
        "type"
      ],
      "additionalProperties": false,
      "properties": {
        "url": {
          "type": "string",
          "title": "URL",
          "description": "The URL to the external reference",
          "format": "iri-reference"
        },
        "schema": {
           "type": "string",
           "title": "Schema",
           "description": "Reference to a document that defines the schema (e.g., elements, attributes and data types) for the document referenced by the URL.",
           "format": "iri-reference",
           "examples": ["http://csrc.nist.gov/ns/oscal/1.0"]
        },
        "schemaType": {
           "type": "string",
           "title": "Schema Type",
           "description": "Reference to the versioned schema format specification",
           "format": "iri-reference",
           "examples": ["http://json-schema.org/draft-07/schema#"]
        },
        "comment": {
          ...
        },
        "type": {
          "type": "string",
          "title": "Type",
          "enum": [ "vcs", "issue-tracker", "website", ...  // etc.
          ]
        },
        "hashes": {
             ...
        }
      }
    },

Note: we may also want to add a complementary field to disambiguate which schema format and version (draft) of either XML Schema or JSON Schema was used; this is shown above as "schemaType".
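
To make the proposal concrete, here is a minimal sketch in Python of how tooling might consume the two proposed fields. Only the field names "schema" and "schemaType" and the draft-07 identifier come from the definition above; the example.com URLs, the helper name pick_validator_family, and the mapping from "schemaType" values to validator families are assumptions made purely for illustration.

```python
# Hypothetical externalReference instance using the proposed "schema" and
# "schemaType" fields. Both example.com URLs are placeholders.
external_reference = {
    "url": "https://example.com/models/card.json",
    "type": "other",
    "schema": "https://example.com/schemas/model-card.schema.json",
    "schemaType": "http://json-schema.org/draft-07/schema#",
}

# Assumed mapping from schema-format identifiers to a validator family.
SCHEMA_FAMILIES = {
    "http://json-schema.org/draft-07/schema#": "json-schema",
    "http://www.w3.org/2001/XMLSchema": "xsd",
}


def pick_validator_family(ref):
    """Return the validator family hinted at by a reference, or None.

    Both fields are optional in the proposal, so tooling has to cope with
    references that carry neither of them.
    """
    if "schema" not in ref:
        return None
    return SCHEMA_FAMILIES.get(ref.get("schemaType"))


print(pick_validator_family(external_reference))
```

A reference without a "schema" field yields None, so existing BOMs would be unaffected by such tooling.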

jkowalleck commented 1 year ago

Interesting proposal.

I really like the implied outcome and benefits. But I doubt that your solution covers it properly. Here is why:


So you want to link to external documents via externalReference. These documents might not be under your control and might change without notice.

These external documents might be bound to an XML/JSON schema, which they usually announce themselves internally.

I find it a bad idea to announce the schema of a foreign document externally. Downstream users need to download and read the document anyway to check whether the claim in the SBOM is correct.

From a technical perspective, it is impossible to point an XML document to exactly one schema.

From a technical perspective, XML/JSON schemas are extensible. I could create a superset of your schema and still be compliant with your schema. Still, I would announce my schema, not yours. So if you are scanning a BOM for externalReferences that apply to your schema, you would not find my document, even though it is compliant with your schema: a false negative.
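
The false-negative case can be illustrated with a small Python sketch. It assumes the "schema" field from the original proposal; the two schema URIs, the document URLs, and the function name are made up for illustration.

```python
# Hypothetical schema identifiers: the superset schema extends the base one,
# so a document valid against the superset is assumed to also satisfy the base.
BASE_SCHEMA = "https://example.com/schemas/base.schema.json"
SUPERSET_SCHEMA = "https://example.org/schemas/superset.schema.json"

bom = {
    "externalReferences": [
        {"url": "https://example.com/a.json", "type": "other", "schema": BASE_SCHEMA},
        # Complies with BASE_SCHEMA too, but announces its own schema.
        {"url": "https://example.org/b.json", "type": "other", "schema": SUPERSET_SCHEMA},
    ]
}


def find_by_announced_schema(bom, schema_uri):
    """Return URLs of references whose announced schema equals schema_uri."""
    return [
        ref["url"]
        for ref in bom.get("externalReferences", [])
        if ref.get("schema") == schema_uri
    ]


matches = find_by_announced_schema(bom, BASE_SCHEMA)
print(matches)  # only a.json is found; b.json is the false negative
```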


Let me reverse-engineer your thought process and start with the requirements engineering. (A friendly reminder: it would really help if you could describe constraints and desired capabilities, not just give reasons why a proposed solution is sufficient for a set of users.)

So your use case is enabling BOM users to understand the explicit usage of an externally referenced document, right? externalReference already has the properties url, type, comment and hashes. You claim that type is not enough.

If I understand your original proposed solution correctly, the idea of linking a Schema comes from the following aspects:


Then here is my alternative proposal for a solution:

Add a new property externalReference.purposes that is an enum managed outside the CycloneDX schema. The document that manages the purposes can be short-lived and be extended or updated independently of the CycloneDX schema and its life cycle, just as the SPDX license ID list used in CycloneDX is already managed.

The following code is untested...

Example schema

Example externalReference-purposes.schema.json

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://cyclonedx.org/schema/externalReference-purposes.schema.json",
  "$comment": "v1.0",
  "enum": [
    "Foo"
    // some values here
  ]
}
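
As a rough illustration of how a consumer might use the JSON variant of such an externally managed list, the Python sketch below loads an enum document and checks a reference's purposes against it. It assumes that "purposes" is an array of strings, which the proposal does not specify. The "Foo" value comes from the example above, while "Bar", the example.com URL, and the function name are hypothetical. The placeholder comment is dropped because plain JSON does not allow comments.

```python
import json

# Stand-in for the externally managed enum document (normally fetched or
# bundled); same shape as externalReference-purposes.schema.json above.
PURPOSES_DOCUMENT = """
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://cyclonedx.org/schema/externalReference-purposes.schema.json",
  "$comment": "v1.0",
  "enum": ["Foo"]
}
"""


def unknown_purposes(ref, purposes_document):
    """Return the purposes on ref that are not in the managed enum."""
    allowed = set(json.loads(purposes_document)["enum"])
    return [p for p in ref.get("purposes", []) if p not in allowed]


ref = {
    "url": "https://example.com/model-card.json",
    "type": "other",
    "purposes": ["Foo", "Bar"],  # assumed array form; "Bar" is not registered
}
print(unknown_purposes(ref, PURPOSES_DOCUMENT))  # ['Bar']
```

Because the list lives in its own document, adding a value would only require updating that document, not the CycloneDX schema itself.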

Example externalReferencePurposes.xsd

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://cyclonedx.org/schema/externalReferencePurposes"
           version="1.0">
  <xs:simpleType name="values">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Foo"/>
      <!-- some values here -->
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

mrutkows commented 1 year ago

But I doubt that your solution covers it properly. And here is why:

@jkowalleck no, my proposal clearly does not cover it properly as this was an ad-hoc idea during a live ML workstream WG call that Steve asked me to open an issue for ;) The initial thought was to allow reference to schemas (disassociated from the actual reference) for optional use by tooling and not necessarily force schema validation using those external schemas.

My simple goals were:

To be fair, externalReference has effectively been treated and discussed as a link (a behavior) in many contexts (e.g., discussions in work groups); however, right now its "types" have little value programmatically and are more descriptive (as you effectively noted). If a reference does have a "schema", then it can be inferred (or made explicit) that it is a type that can be validated. Alternatively, we could use the existing "type" field more clearly by creating a "data" (or even "data-validatable") value, where "schema" would be used as the document for such validation.

After doing further research since posting yesterday: in any case, we may want to look at actually implementing the concept of "schema" as its own object (definition), which not only allows us to reuse it but may also allow us to take advantage of dynamic anchors in the future. From my basic reading, the dynamism in recent JSON Schema revisions would allow us to replace our definition of schema with an actual external schema reference (and have it recognized by off-the-shelf validators). I need to read more, but it is my understanding that OpenAPI uses this in their v3.1 spec, and that it has caused some issues which are resolved in versions of JSON Schema after draft 7 (which is problematic for the many validators that do not implement interim revisions).

Here is the JSON revision for polymorphism (schema reuse) along with an article that I will need to re-read multiple times and allot time to do my own testing...

mrutkows commented 1 year ago

Per the comments above, and in the long term, I would effectively like to leverage dynamism in the actual schema (JSON Schema; I am not sure of the current state of XSD), so that off-the-shelf validators would be able to identify the schema associated with the content pointed to by the external reference. This would minimize the creation of CycloneDX-only fields that only custom CycloneDX tools (validators) would understand.

Again, goals vs. ability to represent in current JSON or XML schema... and as noted, I need to find time to research and test. However, this is the kind of dialog I wanted.

Just wanted to get the concept documented as quickly as possible...

stevespringett commented 1 year ago

This needs to be fleshed out more. Moving to v1.6.

stevespringett commented 5 months ago

Moving to v1.7. This needs to be fleshed out more.