CycloneDX / cdxgen

Creates CycloneDX Bill of Materials (BOM) for your projects from source and container images. Supports many languages and package managers. Integrate in your CI/CD pipeline with automatic submission to Dependency Track server. Slack: https://cyclonedx.slack.com/archives/C04NFFE1962
https://cyclonedx.github.io/cdxgen/
Apache License 2.0

[Python] License scanning for Python projects changes components in specific circumstances #929

Open johennin opened 3 months ago

johennin commented 3 months ago

During some internal testing, it was discovered that running cdxgen with license scanning such as:

FETCH_LICENSE=true cdxgen --type python --output sbom.json /path/to/project

produces a different result than a run without the license scan when the project has the same name as an existing Python package.

For example, take a local Python project named after an existing Python package. Running cdxgen without license scanning on a project with the following pyproject.toml file:

[project]
name = "typing-extensions"
...

produces the following component:

...
"components": [
        {
            "group": "",
            "name": "typing-extensions",
            "version": "latest",
            "purl": "pkg:pypi/typing-extensions@latest",
            "type": "library",
            "bom-ref": "pkg:pypi/typing-extensions@latest",
            "evidence": {
                "identity": {
                    "field": "purl",
                    "confidence": 1,
                    "methods": [
                        {
                            "technique": "instrumentation",
                            "confidence": 1,
                            "value": "/tmp/cdxgen-venv-2GQ1QR"
                        }
                    ]
                }
            }
        },
...

and a different component with license scanning activated:

...
"components": [
        {
            "author": "\"Guido van Rossum, Jukka Lehtosalo, Łukasz Langa, Michael Lee\" <levkivskyi@gmail.com>",
            "group": "",
            "name": "typing-extensions",
            "version": "latest",
            "description": "Backported and Experimental Type Hints for Python 3.8+",
            "licenses": [
                {
                    "license": {
                        "id": "PSF-2.0",
                        "url": "https://opensource.org/licenses/PSF-2.0"
                    }
                }
            ],
            "purl": "pkg:pypi/typing-extensions@latest",
            "type": "library",
            "bom-ref": "pkg:pypi/typing-extensions@latest",
            "evidence": {
                "identity": {
                    "field": "purl",
                    "confidence": 1,
                    "methods": [
                        {
                            "technique": "instrumentation",
                            "confidence": 1,
                            "value": "/tmp/cdxgen-venv-FnAVau"
                        }
                    ]
                }
            },
...

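I would say this is problematic: the component now carries the author, description, and license of the published package instead of the local project, which changes the component that the SBOM describes.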
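The difference between the two outputs above can be made explicit with a small comparison script. This is only a sketch for illustration, not part of cdxgen: the function name `added_fields` is hypothetical, and the two dictionaries are trimmed copies of the components shown above (the `evidence` block and the license URL are left out for brevity).

```python
def added_fields(bom_without: dict, bom_with: dict) -> dict:
    """Map each bom-ref to the fields that appear only in bom_with."""
    before = {c["bom-ref"]: c for c in bom_without.get("components", [])}
    report = {}
    for comp in bom_with.get("components", []):
        base = before.get(comp["bom-ref"])
        if base is None:
            continue  # component exists only in the second BOM
        extra = sorted(set(comp) - set(base))
        if extra:
            report[comp["bom-ref"]] = extra
    return report


# Trimmed copy of the component produced without license scanning.
plain = {
    "components": [
        {
            "group": "",
            "name": "typing-extensions",
            "version": "latest",
            "purl": "pkg:pypi/typing-extensions@latest",
            "type": "library",
            "bom-ref": "pkg:pypi/typing-extensions@latest",
        }
    ]
}

# Trimmed copy of the component produced with license scanning.
enriched = {
    "components": [
        {
            **plain["components"][0],
            "author": '"Guido van Rossum, Jukka Lehtosalo, Łukasz Langa, Michael Lee" <levkivskyi@gmail.com>',
            "description": "Backported and Experimental Type Hints for Python 3.8+",
            "licenses": [{"license": {"id": "PSF-2.0"}}],
        }
    ]
}

print(added_fields(plain, enriched))
# {'pkg:pypi/typing-extensions@latest': ['author', 'description', 'licenses']}
```

The same purl and bom-ref appear in both runs, so the extra fields are attached to the local project's component even though they come from the published package's metadata.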

The problem goes away if the project does not reuse the name of a published Python package. However, it can be abused: an attacker who knows the names of local projects that a company produces SBOMs for can create and publish a Python package with the same name as one of those projects, and in that way inject false information into the SBOM's component.

Thank you in advance!

prabhu commented 3 months ago

@johennin, I will keep this issue open. While I do not agree that cdxgen must deal with dependency confusion attacks, it could at least add more properties to describe the source file it started the analysis from.
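As a rough sketch of what such a property could look like, the snippet below attaches a CycloneDX `properties` entry (a list of name/value pairs) to a component. The helper `with_source_property` and the property name `SrcFile` are assumptions made for illustration; this thread does not say which name or format cdxgen would use.

```python
import copy


def with_source_property(component: dict, src_file: str, name: str = "SrcFile") -> dict:
    """Return a copy of the component with a property naming its source file."""
    tagged = copy.deepcopy(component)  # leave the caller's component untouched
    tagged.setdefault("properties", []).append({"name": name, "value": src_file})
    return tagged


component = {
    "group": "",
    "name": "typing-extensions",
    "version": "latest",
    "purl": "pkg:pypi/typing-extensions@latest",
    "type": "library",
    "bom-ref": "pkg:pypi/typing-extensions@latest",
}

# Hypothetical path, following the placeholder used in the command above.
tagged = with_source_property(component, "/path/to/project/pyproject.toml")
print(tagged["properties"])
```

A consumer of the SBOM could then tell that the component was derived from a local pyproject.toml file, even when its name matches a published package.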