anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Python packages: name normalization #3064

Closed Mikcl closed 1 month ago

Mikcl commented 1 month ago

syft does not normalize Python package names in the PyPI components it generates.

According to the Python packaging documentation, the following names all refer to the same package:

    friendly-bard (normalized form)
    Friendly-Bard
    FRIENDLY-BARD
    friendly.bard
    friendly_bard
    friendly--bard
    FrIeNdLy-._.-bArD (a terrible way to write a name, but it is valid)

Python packaging tools such as pip therefore treat all of these names as the same package.
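For illustration, here is a minimal sketch of the normalization rule as described in the Python packaging documentation: runs of `-`, `_` and `.` collapse to a single `-`, and the result is lowercased. The function name `normalize_name` is my own, not something from syft.

```python
import re


def normalize_name(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


# The example names listed above.
names = [
    "friendly-bard",
    "Friendly-Bard",
    "FRIENDLY-BARD",
    "friendly.bard",
    "friendly_bard",
    "friendly--bard",
    "FrIeNdLy-._.-bArD",
]

# Every spelling maps to the same normalized form.
normalized = {normalize_name(n) for n in names}
print(normalized)
```

All seven spellings end up as the single value `friendly-bard`.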

If syft SBOMs applied this normalization, querying for packages would no longer depend on the particular display name a project happens to use. Without it, consumers of syft SBOMs risk looking up a package under the wrong spelling and silently skipping any checks that depend on a match.
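A small sketch of the failure mode I mean. The `packages` list is a hypothetical, simplified stand-in for SBOM entries (it is not syft's actual output schema), and `find_package` is an illustrative helper:

```python
import re


def normalize_name(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical SBOM entries carrying the display name as found on disk.
packages = [
    {"name": "Friendly_Bard", "version": "1.0.0"},
    {"name": "other.package", "version": "2.0.0"},
]


def find_package(entries, wanted):
    """Return entries whose normalized name equals the normalized query."""
    key = normalize_name(wanted)
    return [p for p in entries if normalize_name(p["name"]) == key]


# An exact string comparison misses the package entirely...
exact = [p for p in packages if p["name"] == "friendly-bard"]

# ...while comparing normalized names finds it.
matches = find_package(packages, "friendly-bard")
print(exact, matches)
```

Today every consumer has to remember to do the normalized comparison themselves; if the SBOM already carried the normalized name, the exact comparison would be enough.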

What you expected to happen:

I would expect Python package names to be normalized according to the packaging documentation referenced above.

Additionally, name normalization is mentioned in the PURL documentation: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#pypi

So I would expect normalization to be applied to both the purl and the package name.
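Roughly what I have in mind for the purl, as a sketch only. `pypi_purl` is a hypothetical helper, and it assumes the same normalization as the Python packaging documentation is applied to the name segment; it also skips percent-encoding of the version for brevity:

```python
import re


def normalize_name(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_purl(name: str, version: str) -> str:
    """Build a pkg:pypi purl using the normalized package name.

    Simplification: the version is inserted as-is, without percent-encoding.
    """
    return f"pkg:pypi/{normalize_name(name)}@{version}"


# Different spellings of the same package yield the same purl.
print(pypi_purl("Friendly_Bard", "1.0.0"))
print(pypi_purl("friendly.bard", "1.0.0"))
```

Both calls produce `pkg:pypi/friendly-bard@1.0.0`, so the purl and the name field would agree regardless of how the package was spelled when it was installed.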


Is this something the syft team is willing to consider adopting? Are there any consumers of syft that explicitly require the denormalized form and would not work with the normalized form?