Santandersecurityresearch / cryptobom-forge

Tools and utilities needed to parse GitHub Multi-Repository Variant Analysis output
MIT License

Query on input requirement #2

mtcolman closed 9 months ago

mtcolman commented 10 months ago

Hello, more of a query than an issue. The instructions say:

The parameter is versatile, accepting either:

A path to a single CodeQL output file, or A directory path containing multiple CodeQL outputs.

But I'm unsure what this equates to. A .sarif file or a .bqrs file? And does a particular query have to have been run to produce it?

I've tried running with .sarif files but get:

code_snippet = codeql_result['locations'][0]['physicalLocation']['contextRegion']['snippet']['text']
KeyError: 'contextRegion'

My .sarif file is SARIF 2.1.0:

{
  "$schema" : "https://json.schemastore.org/sarif-2.1.0.json",
  "version" : "2.1.0",
  "runs" : [ {
    "tool" : {
      "driver" : {
        "name" : "CodeQL",
        "organization" : "GitHub",
        "semanticVersion" : "2.15.1",

I then regenerated my .sarif file with the --sarif-add-snippets flag and ran the command again:

python3.10/site-packages/cyclonedx/model/bom.py", line 535, in register_dependency
ref=target.bom_ref,
AttributeError: 'NoneType' object has no attribute 'bom_ref'

Running on a directory containing .bqrs files, I get the following, which looks rather empty:

$ cat cbom.json
{
    "metadata": {
        "timestamp": "2023-12-07T11:17:16.309580+00:00",
        "tools": [
            {
                "externalReferences": [
                    {
                        "type": "build-system",
                        "url": "https://github.com/CycloneDX/cyclonedx-python-lib/actions"
                    },
                    {
                        "type": "distribution",
                        "url": "https://pypi.org/project/cyclonedx-python-lib/"
                    },
                    {
                        "type": "documentation",
                        "url": "https://cyclonedx.github.io/cyclonedx-python-lib/"
                    },
                    {
                        "type": "issue-tracker",
                        "url": "https://github.com/CycloneDX/cyclonedx-python-lib/issues"
                    },
                    {
                        "type": "license",
                        "url": "https://github.com/CycloneDX/cyclonedx-python-lib/blob/main/LICENSE"
                    },
                    {
                        "type": "release-notes",
                        "url": "https://github.com/CycloneDX/cyclonedx-python-lib/blob/main/CHANGELOG.md"
                    },
                    {
                        "type": "vcs",
                        "url": "https://github.com/CycloneDX/cyclonedx-python-lib"
                    },
                    {
                        "type": "website",
                        "url": "https://cyclonedx.org"
                    }
                ],
                "name": "cyclonedx-python-lib",
                "vendor": "CycloneDX",
                "version": "4.2.2"
            }
        ]
    },
    "serialNumber": "urn:uuid:5cb9d639-0a96-48e8-af23-e4e5471ea6ce",
    "version": 1,
    "$schema": "https://raw.githubusercontent.com/IBM/CBOM/main/bom-1.4-cbom-1.0.schema.json",
    "bomFormat": "CBOM",
    "specVersion": "1.4-cbom-1.0"
}

Thanks in advance.

mtcolman commented 10 months ago

Hi, after further testing with an output.sarif file I've found that this error:

python3.10/site-packages/cyclonedx/model/bom.py", line 535, in register_dependency
ref=target.bom_ref,
AttributeError: 'NoneType' object has no attribute 'bom_ref'

is related to not having versionControlProvenance in the output.sarif file. I'm not sure how to get this into the file, though.

Relevant code items are:

main.py

def _read_file(query_file, exclusion_pattern=None):
    with open(query_file) as query_output:
        query_output = json.load(query_output)['runs'][0]

        if file_count < 2:
            if version_control_details := query_output.get('versionControlProvenance'):
                root_component = metadata.get_root_component_info(version_control_details=version_control_details[0])
                cbom.metadata.component = root_component
            for tool in metadata.get_tool_info(tool_info=query_output['tool']):
                cbom.metadata.tools.add(tool)

metadata.py

def get_root_component_info(version_control_details):
    path = version_control_details['repositoryUri'].split('https://github.com/')[1]
    external_reference = ExternalReference(url=version_control_details['repositoryUri'], type=ExternalReferenceType.SCM)

    return Component(
        bom_ref=path,
        name=path.split('/')[-1],
        type=ComponentType.APPLICATION,
        external_references=[external_reference]
    )

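This means that when versionControlProvenance is missing, cbom.metadata.component is never set. When the if not (existing_component := ...) branch in algorithm.py is then taken, cbom.metadata.component is None in the call to cbom.register_dependency, which results in the error.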

algorithm.py

def parse_algorithm(cbom, codeql_result):
    crypto_properties = _generate_crypto_component(codeql_result)
    if (padding := crypto_properties.algorithm_properties.padding) not in [Padding.OTHER, Padding.UNKNOWN]:
        name = f'{crypto_properties.algorithm_properties.variant}-{padding.value.upper()}'
    else:
        name = crypto_properties.algorithm_properties.variant

    algorithm_component = Component(
        bom_ref=f'cryptography:algorithm:{uuid.uuid4()}',
        name=name,
        type=ComponentType.CRYPTO_ASSET,
        crypto_properties=crypto_properties
    )

    if not (existing_component := _is_existing_component_overlap(cbom, algorithm_component)):
        cbom.components.add(algorithm_component)
        cbom.register_dependency(cbom.metadata.component, depends_on=[algorithm_component])
    else:
        algorithm_component = _update_existing_component(existing_component, algorithm_component)

emilejq commented 10 months ago

Hi,

The input needs to be SARIF, either a single file or split across several. The tool is intended to parse the output of queries that identify crypto assets. The Python ones, for example, can be seen here, or for instructions on using them with GitHub Actions, look at this.
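As a rough illustration of the "single file or directory" input, the sketch below gathers the SARIF files a path refers to. The helper name collect_sarif_files and the assumption that outputs carry a .sarif suffix are hypothetical and not taken from the project's code.

```python
from pathlib import Path


def collect_sarif_files(input_path):
    """Return the SARIF files that input_path refers to (hypothetical helper)."""
    path = Path(input_path)
    if path.is_file():
        # A single CodeQL output file was given.
        return [path]
    # Assumption: outputs in a directory are identified by a .sarif suffix.
    return sorted(p for p in path.iterdir() if p.is_file() and p.suffix == ".sarif")
```

Under that assumption, files with other extensions, such as .bqrs, would simply be skipped.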

The code snippets are used to gather additional information about crypto assets when adding them to the CBOM, so yeah, you should be using --sarif-add-snippets.
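To check a SARIF file before running the tool, something like the following sketch could report which results lack the snippet that the KeyError above points at. The function name results_missing_snippets is hypothetical. It only inspects the first location of each result, which matches the lookup shown in the traceback.

```python
def results_missing_snippets(sarif):
    """Return (run index, result index) pairs whose first location has no
    contextRegion snippet text. Hypothetical helper, not part of the tool."""
    missing = []
    for run_index, run in enumerate(sarif.get("runs", [])):
        for result_index, result in enumerate(run.get("results", [])):
            # Treat a result with no locations as missing its snippet.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            snippet = physical.get("contextRegion", {}).get("snippet", {})
            if "text" not in snippet:
                missing.append((run_index, result_index))
    return missing
```

Passing the dictionary returned by json.load on the SARIF file should give an empty list when the file was generated with --sarif-add-snippets.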

The expected format of the input SARIF has so far been based on some sample output from the queries that has always included this information, but it looks like versionControlProvenance is not a standard key according to this.

I've now updated it accordingly so that the application name, which is a required field when generating a root component, can optionally be passed in via --application-name, or if not, a default component will be created with the name root. Try again with version 1.0.1.
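The fallback described above can be pictured roughly as follows. This is a sketch and not the code in 1.0.1: the helper name resolve_root_name is hypothetical, and the order of precedence between an explicit name and versionControlProvenance is an assumption.

```python
def resolve_root_name(run, application_name=None):
    """Pick a name for the root component (hypothetical helper)."""
    if application_name:
        # Assumption: an explicitly supplied name takes precedence.
        return application_name
    provenance = run.get("versionControlProvenance") or []
    if provenance and "repositoryUri" in provenance[0]:
        # Same idea as the metadata.py excerpt: use the last path segment.
        return provenance[0]["repositoryUri"].rstrip("/").split("/")[-1]
    # Default name mentioned in the comment above.
    return "root"
```

With a name always available, the root component is never None, so the register_dependency call no longer fails.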

Thanks!

mtcolman commented 9 months ago

Thank you. I've tried 1.0.1 and was able to use an output.sarif file that did not contain versionControlProvenance.