CycloneDX / specification

OWASP CycloneDX is a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction. SBOM, SaaSBOM, HBOM, AI/ML-BOM, CBOM, OBOM, MBOM, VDR, and VEX
https://cyclonedx.org/
Apache License 2.0

Add field for `Components.modelCard.parentModel` #342

Open bardenstein opened 7 months ago

bardenstein commented 7 months ago

Proposal: Separate from base models, a model can have a long lineage or multiple parent models. The practitioners we spoke with thought it would be helpful to allow listing one or more parent models for a given model, which may be distinct from the base model.

Details
FieldName: Components.modelCard.parentModel
FieldType: Array. Each list item is a parent model and contains the sub-fields Name, Version, and Source.
Required: No

Example snippet

"modelCard": {
  "parentModel": [
    {
      "name": "Stable-diffusion-xlarge",
      "version": "1.1",
      "source": "https://huggingface.co/models/stable-diffusion-xlarge1.1"
    },
    {
      "name": "daniels-custom-model",
      "version": "2",
      "source": "https://huggingface.co/models/daniels-custom-model-v2"
    }
  ]
}

stevespringett commented 7 months ago

The proposed use case can already be achieved with CycloneDX. Adding another way to describe parent models will complicate tools that will then need to support both methods.

"components": [
  {
    "type": "machine-learning-model",
    "name": "my model",
    "modelCard": { "$comment": "model card data goes here" },
    "pedigree": {
      "ancestors": [
        {
          "type": "machine-learning-model",
          "name": "Stable-diffusion-xlarge",
          "version": "1.1",
          "externalReferences": [
            {
              "type": "distribution",
              "url": "https://huggingface.co/models/stable-diffusion-xlarge1.1"
            }
          ]
        },
        {
          "type": "machine-learning-model",
          "name": "daniels-custom-model",
          "version": "2",
          "externalReferences": [
            {
              "type": "distribution",
              "url": "https://huggingface.co/models/daniels-custom-model-v2"
            }
          ]
        }
      ]
    }
  }
]
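
To make the equivalence concrete, here is a minimal sketch of how the entries of the proposed `parentModel` array could be mapped onto `pedigree.ancestors`. The helper name `parent_models_to_ancestors` is hypothetical, and the choice of the `distribution` external reference type simply mirrors the example above; this is an illustration under those assumptions, not part of any CycloneDX tooling.

```python
from __future__ import annotations

from typing import Any


def parent_models_to_ancestors(parent_models: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map proposed parentModel entries (name, version, source) to ancestor components.

    Hypothetical helper: 'parentModel' is the field proposed in this issue,
    not an existing CycloneDX field.
    """
    ancestors = []
    for parent in parent_models:
        ancestor: dict[str, Any] = {
            "type": "machine-learning-model",
            "name": parent["name"],
        }
        # Version and source are optional in this sketch.
        if "version" in parent:
            ancestor["version"] = parent["version"]
        if "source" in parent:
            ancestor["externalReferences"] = [
                {"type": "distribution", "url": parent["source"]}
            ]
        ancestors.append(ancestor)
    return ancestors


proposed = [
    {
        "name": "Stable-diffusion-xlarge",
        "version": "1.1",
        "source": "https://huggingface.co/models/stable-diffusion-xlarge1.1",
    },
    {
        "name": "daniels-custom-model",
        "version": "2",
        "source": "https://huggingface.co/models/daniels-custom-model-v2",
    },
]

component = {
    "type": "machine-learning-model",
    "name": "my model",
    "pedigree": {"ancestors": parent_models_to_ancestors(proposed)},
}
```

Each proposed sub-field has a counterpart in the existing structure: Name and Version map directly, and Source becomes an external reference on the ancestor component.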

bardenstein commented 7 months ago

@stevespringett in this setup, how can I tell the order of the lineage?

From the research I did with AI/ML experts, there's a scenario where a model evolves like this: Base model > fine-tuned to Custom Model 1.0 > fine-tuned to Custom Model 1.1 > and so on.

The base model tells me information about the basic workings, architecture, and trustworthiness of the core model (e.g. did it start with GPT4 or Llama2), but the parent model tells me information about the most immediate changes like training data sets.

So in this CycloneDX list, how can I distinguish the direct "ancestor" from the original (foundation) "ancestor"?

stevespringett commented 4 months ago

In that scenario, you'd end up with something like this:

"components": [
  {
    "type": "machine-learning-model",
    "name": "fine-tuned to Custom Model",
    "version": "1.1",
    "modelCard": { "$comment": "model card data goes here" },
    "pedigree": {
      "ancestors": [
        {
          "type": "machine-learning-model",
          "name": "fine-tuned to Custom Model",
          "version": "1.0",
          "modelCard": { "$comment": "model card data goes here" },
          "pedigree": {
            "ancestors": [
              {
                "type": "machine-learning-model",
                "name": "Base model",
                "modelCard": { "$comment": "model card data goes here" }
              }
            ]
          }
        }
      ]
    }
  }
]
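
With the nested layout above, the order of the lineage is implied by nesting depth: the direct parent sits in the component's own `ancestors` list, and the foundation model is the innermost ancestor. As a rough sketch of how a consumer might recover that order, the function below walks `pedigree.ancestors` recursively. The name `lineage_chains` and the "name version" label format are assumptions made for illustration only.

```python
from __future__ import annotations

from typing import Any


def lineage_chains(component: dict[str, Any]) -> list[list[str]]:
    """Return every chain from the component back to a root ancestor.

    Each chain lists the component first, then its direct parent, and so on
    down to the model with no further ancestors (the base model).
    Hypothetical helper, not part of any CycloneDX library.
    """
    label = component["name"]
    if "version" in component:
        label += " " + component["version"]

    ancestors = component.get("pedigree", {}).get("ancestors", [])
    if not ancestors:
        return [[label]]

    chains = []
    for ancestor in ancestors:
        # A model with several parents yields one chain per parent branch.
        for chain in lineage_chains(ancestor):
            chains.append([label] + chain)
    return chains


model = {
    "type": "machine-learning-model",
    "name": "fine-tuned to Custom Model",
    "version": "1.1",
    "pedigree": {
        "ancestors": [
            {
                "type": "machine-learning-model",
                "name": "fine-tuned to Custom Model",
                "version": "1.0",
                "pedigree": {
                    "ancestors": [
                        {"type": "machine-learning-model", "name": "Base model"}
                    ]
                },
            }
        ]
    },
}

chains = lineage_chains(model)
direct_parent = chains[0][1]   # "fine-tuned to Custom Model 1.0"
foundation = chains[0][-1]     # "Base model"
```

In each chain, index 1 is the direct parent and the last element is the original (foundation) ancestor, which is the distinction asked about earlier in the thread.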