Perl-Toolchain-Gang / CPAN-DistnameInfo

Extract information from a CPAN distribution name
http://search.cpan.org/dist/CPAN-DistnameInfo/

New feature: Package URL support (e.g. a "purl" method?) #6

Open sjn opened 1 year ago

sjn commented 1 year ago

Hi!

Would it be sensible for this package to support the creation of a Package URL for distros?

For example, a package identified as IBMTORDB2/DBD-DB2-0.99.tar.bz2 might get a PURL like this: pkg:cpan/IBMTORDB2/DBD-DB2@0.99?ext=tar.bz2.
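As an illustration of that mapping only (not a proposed API for this module), here is a minimal Python sketch. The function name `distname_to_purl`, the suffix list, and the naive name/version split are all assumptions made for the example; CPAN::DistnameInfo's own parsing handles many more edge cases.

```python
import re

# Archive suffixes assumed for this sketch; real CPAN uploads use others too.
_SUFFIXES = ("tar.gz", "tar.bz2", "tgz", "zip")


def distname_to_purl(path: str) -> str:
    """Turn 'AUTHOR/Dist-Name-1.23.tar.gz' into a cpan-type Package URL."""
    parts = path.split("/")
    if len(parts) < 2:
        raise ValueError(f"expected AUTHOR/filename, got {path!r}")
    # Works for both 'AUTHOR/file' and 'A/AU/AUTHOR/file' style paths.
    author, filename = parts[-2], parts[-1]

    for suffix in _SUFFIXES:
        if filename.endswith("." + suffix):
            stem, ext = filename[: -len(suffix) - 1], suffix
            break
    else:
        raise ValueError(f"unrecognised archive suffix in {filename!r}")

    # Naive split: the version is the last hyphen-separated chunk that
    # starts with a digit (optionally prefixed with 'v').
    match = re.fullmatch(r"(.+)-(v?\d[^-]*)", stem)
    if match is None:
        raise ValueError(f"cannot find a version in {stem!r}")
    dist, version = match.groups()

    return f"pkg:cpan/{author}/{dist}@{version}?ext={ext}"


print(distname_to_purl("IBMTORDB2/DBD-DB2-0.99.tar.bz2"))
```

In Perl, the same string could presumably be assembled from the `cpanid`, `dist`, `version` and `extension` values the module already extracts.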

The purpose of this is to have canonical names for packages that work across ecosystems and are suitable for use in SBOM (Software Bill of Materials) documents.

Leont commented 1 year ago

> For example, a package identified as IBMTORDB2/DBD-DB2-0.99.tar.bz2 might get a PURL like this: pkg:cpan/IBMTORDB2/DBD-DB2@0.99?ext=tar.bz2.

I'm not sure I understand the difference between these two. In particular the reasoning behind it.

sjn commented 1 year ago

The purpose is to create a standard way to refer to dependencies across packaging systems.

For example, with a common scheme like this, an SBOM can easily list dependencies from whatever multitude of sources were used to put together the application it is meant to describe. Maybe there are a few CPAN packages downloaded from the company DarkPAN, a bunch from CPAN, and a few that were already installed with the system perl package (which might come from an RPM repo)... The idea is to make it possible to represent all these sources in a standard and common way.

There may be cases where package URLs can't offer enough nuance (e.g. how do we specify that a DEB package came from an internal mirror?), but I guess this is something we can figure out later.

My hope with this is to open the door for a conversation on how to represent software packages across ecosystems, and the Package URL spec seems to be a good place to start.

The spec helps clarify the problem:

When tools, APIs and databases process or store multiple package types, it is difficult to reference the same software package across tools in a uniform way.

For example, these tools, specifications and API use relatively similar approaches to identify and locate software packages, each with subtle differences in syntax, naming and conventions:

sjn commented 1 year ago

For example, CycloneDX (one SBOM standard worth exploring) needs some standard way to refer to packages when it links security advisories with whatever is installed. PURL is one way, and Software Identification (SWID) tags are another (defined in ISO/IEC 19770-2:2015).

https://cyclonedx.org/use-cases/#known-vulnerabilities

sjn commented 1 year ago

Thinking a little more about how to refer to internal CPAN repos; this may be possible to do with a repository_url=hostname parameter...

pkg:cpan/IBMTORDB2/DBD-DB2@0.99?ext=tar.bz2&repository_url=cpan.org # default repository_url can be skipped
pkg:cpan/IBMTORDB2/DBD-DB2@0.99?ext=tar.bz2&repository_url=internacpan.mycompany.example

(I guess this may require some new input to the new method to work)
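To make the "new input" concrete, here is a rough Python sketch of a builder that takes the repository as an extra argument and omits it when it matches the default. The name `cpan_purl`, its parameters, and the choice of `cpan.org` as the default are assumptions for illustration, not anything this module or the spec defines for CPAN. Qualifiers are emitted sorted by key, which is how the PURL spec describes the canonical form.

```python
from urllib.parse import quote

# Assumed default for this sketch, following the example above.
DEFAULT_REPOSITORY = "cpan.org"


def cpan_purl(author, dist, version=None, ext=None, repository_url=None):
    """Build a cpan-type Package URL; version and qualifiers are optional."""
    purl = f"pkg:cpan/{author}/{dist}"
    if version is not None:
        purl += "@" + quote(version, safe="")

    qualifiers = {}
    if ext:
        qualifiers["ext"] = ext
    # Skip the qualifier when it only restates the default repository.
    if repository_url and repository_url != DEFAULT_REPOSITORY:
        qualifiers["repository_url"] = repository_url

    if qualifiers:
        # Qualifiers sorted by key; values are percent-encoded.
        pairs = (f"{key}={quote(value, safe='')}"
                 for key, value in sorted(qualifiers.items()))
        purl += "?" + "&".join(pairs)
    return purl


print(cpan_purl("IBMTORDB2", "DBD-DB2", "0.99", "tar.bz2", "cpan.org"))
print(cpan_purl("IBMTORDB2", "DBD-DB2", "0.99", "tar.bz2",
                "internacpan.mycompany.example"))
```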

Source: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst

Leont commented 1 year ago

I should rephrase that. Why /IBMTORDB2/DBD-DB2@0.99?ext=tar.bz2 instead of IBMTORDB2/DBD-DB2-0.99.tar.bz2? I would prefer a one-to-one mapping unless there's a reason to do otherwise. Filenames are unique on CPAN, if that's the requirement.

sjn commented 1 year ago

> Why /IBMTORDB2/DBD-DB2@0.99?ext=tar.bz2 instead of IBMTORDB2/DBD-DB2-0.99.tar.bz2?

Mostly because the spec says the version number is optional. Also, there are a few oddities around version numbers on CPAN, so having the version clearly and unambiguously delimited is useful.

I'm thinking this is so the same URL can be used to both specify individual versioned objects and the projects themselves (which basically would mean "download the latest from here", I guess).
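A small Python sketch of why the explicit delimiter helps: with `@` marking the version, a consumer can tell a versioned release from a bare project reference without guessing where the distribution name ends. `parse_cpan_purl` and the `CpanPurl` tuple are hypothetical names for this example, and the parser is deliberately simplified (it ignores the optional `#subpath` component, for instance).

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qsl, unquote


class CpanPurl(NamedTuple):
    author: str
    dist: str
    version: Optional[str]  # None means "the project itself, no release"
    qualifiers: dict


def parse_cpan_purl(purl: str) -> CpanPurl:
    """Split a cpan-type Package URL into its parts (simplified)."""
    prefix = "pkg:cpan/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a cpan purl: {purl!r}")

    rest, _, query = purl[len(prefix):].partition("?")
    path, has_version, version = rest.partition("@")
    author, _, dist = path.partition("/")
    if not author or not dist:
        raise ValueError(f"expected AUTHOR/DIST in {purl!r}")

    return CpanPurl(
        author=author,
        dist=dist,
        version=unquote(version) if has_version else None,
        qualifiers=dict(parse_qsl(query)),
    )


release = parse_cpan_purl("pkg:cpan/IBMTORDB2/DBD-DB2@0.99?ext=tar.bz2")
project = parse_cpan_purl("pkg:cpan/IBMTORDB2/DBD-DB2")
print(release.version, project.version)
```

Both URLs yield the same author and distribution name; only the version differs, which a filename-based identifier such as DBD-DB2-0.99.tar.bz2 cannot express without re-parsing the hyphens.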