DependencyTrack / dependency-track

Dependency-Track is an intelligent Component Analysis platform that allows organizations to identify and reduce risk in the software supply chain.
https://dependencytrack.org/
Apache License 2.0

BOM Validation: add component name to the error #3776

Open setchy opened 1 month ago

setchy commented 1 month ago

Current Behavior

With the new BOM validation, the HTTP 400 response could be improved to make it easier to pinpoint the failing component.

For example, this is an API response we recently received:

{
  "status": 400,
  "title": "The uploaded BOM is invalid",
  "detail": "Schema validation failed",
  "errors": [
    "$.components[928].externalReferences[1].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference",
    "$.components[928].externalReferences[1].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference",
    "$.components[928].externalReferences[1].url: does not match the regex pattern ^urn:cdx:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/[1-9][0-9]*$",
    "$.components[928].externalReferences[1].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference",
    "$.components[928].externalReferences[1].url: does not match the regex pattern ^urn:cdx:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/[1-9][0-9]*#.+$",
    "$.components[984].externalReferences[1].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference",
    "$.components[984].externalReferences[1].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference",
    "$.components[984].externalReferences[1].url: does not match the regex pattern ^urn:cdx:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/[1-9][0-9]*$",
    "$.components[984].externalReferences[1].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference",
    "$.components[984].externalReferences[1].url: does not match the regex pattern ^urn:cdx:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/[1-9][0-9]*#.+$"
  ]
}
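The response above repeats the same path several times, which makes it harder to see that only two components are affected. As a rough client-side sketch (not something Dependency-Track does), the `errors` array could be grouped by path with exact duplicate messages dropped. The function name `group_errors` is hypothetical, and the code assumes each entry has the form `path: message` as in the example.

```python
def group_errors(errors):
    """Group 'path: message' strings by path, dropping exact duplicate messages.

    Assumes the first ': ' in each entry separates the path from the message,
    as in the example response. Insertion order of paths is preserved.
    """
    grouped = {}
    for entry in errors:
        path, _, message = entry.partition(": ")
        messages = grouped.setdefault(path, [])
        if message not in messages:
            messages.append(message)
    return grouped


if __name__ == "__main__":
    sample = [
        "$.components[928].externalReferences[1].url: does not match the iri-reference pattern",
        "$.components[928].externalReferences[1].url: does not match the iri-reference pattern",
        "$.components[984].externalReferences[1].url: does not match the iri-reference pattern",
    ]
    for path, messages in group_errors(sample).items():
        print(path)
        for message in messages:
            print("  " + message)
```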

Proposed Behavior

Include the component name, or the line number in the uploaded JSON, in the error.

nscuro commented 1 month ago

Based on the documentation of the JSON schema validator used by cyclonedx-core-java, we won't get much more information than shown in the response you posted: https://github.com/networknt/json-schema-validator?tab=readme-ov-file#results-and-output-formats

In any case, enrichment with additional information would need to be implemented in cyclonedx-core-java: https://github.com/CycloneDX/cyclonedx-core-java/blob/d11b16eb6f97dd95b0d28fef567b2058d1b76b92/src/main/java/org/cyclonedx/parsers/JsonParser.java#L174-L181

esnible commented 1 week ago

For me, the UI message is unhelpful: it merely tells me that schema validation failed. The detailed message from the API server is "$.components: the items in the array must be unique", which is correct but not helpful.

nscuro commented 1 week ago

There's a discussion to be had about how helpful the response really needs to be (see for example https://github.com/DependencyTrack/dependency-track/issues/3218#issuecomment-1925452668).

The core intent for DT is to prevent invalid documents from being ingested, or from ingestion even being attempted. It uses the official CycloneDX JSON and XML schemas to perform this task. In fact, that is exactly what the schemas are supposed to be used for.

One could argue that validation should be done even before attempting to upload the BOM to DT, using tools that are made for this purpose, and can provide additional context for errors. https://github.com/CycloneDX/sbom-utility is one of those tools. What's more, ideally the BOM generators themselves should validate their output.

nscuro commented 1 week ago

Also note that validation errors for JSON appear to contain a valid JSONPath to the invalid element.

Granted, for "$.components: the items in the array must be unique", that might be less helpful. But for $.components[928].externalReferences[1].url, you can use jq to inspect the value (jq paths start with "." rather than "$."):

jq '.components[928].externalReferences[1].url' bom.json
jq '.components[928].name' bom.json

This could potentially be done by DT as well, but I'm not sure how to determine which context would be helpful. What if the component's name is not unique within the BOM? What if the name is missing, and that's why validation failed? What if the BOM has deep nesting of components? Up to what level do we go when collecting context information? What if the validation failed for other elements of the BOM, such as services, annotations, or attestations? As a human you can easily decide what you're interested in. To automate this, we'd need to make assumptions that may just turn out to be wasted computation most of the time.

The JSONPath output, while not necessarily human-friendly, at least is both precise and concise.
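For anyone who wants the component name without running jq by hand, here is a minimal client-side sketch of the lookup described above. It is an illustration only, not DT or cyclonedx-core-java behaviour. The function name `component_context` is hypothetical, it only handles errors that start with `$.components[N]` (top-level components, no nesting), and it returns `None` for the name when the component has none, which is exactly the ambiguity raised above.

```python
import json
import re

# Matches the leading "$.components[N]" of a validation error path.
COMPONENT_PREFIX = re.compile(r"^\$\.components\[(\d+)\]")


def component_context(bom, error):
    """Return (index, name) of the top-level component an error points at.

    Returns None when the error does not start with "$.components[N]"
    (for example "$.components: ..." or errors under "$.services"),
    or when the index is out of range for the given BOM.
    """
    match = COMPONENT_PREFIX.match(error)
    if match is None:
        return None
    index = int(match.group(1))
    components = bom.get("components", [])
    if index >= len(components):
        return None
    component = components[index]
    # The name itself may be missing; report None rather than guessing.
    name = component.get("name") if isinstance(component, dict) else None
    return index, name


def load_bom(path):
    """Load a CycloneDX JSON BOM from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With the BOM from the report, `component_context(load_bom("bom.json"), error)` would be called once per entry in the `errors` array; nested components and non-component elements would still need the manual inspection described above.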