Feature idea: add CPE/PURL as fields to `projects`

antonleviathan commented 1 month ago

While working with anitya to set up a bot to bump package versions for [Stageˣ], I had the thought that it would be nice to have two additional fields which would help the users figure out if they are exposed to CVEs as well as construct SBOMs:

cpe: CPE
purl: PURL

The intuition came from the fact that the project mappings in anitya allow us to take project names and map them onto our desired names, which gives us a shared point of reference where we can add metadata easily.

arthurzam commented 1 month ago

As a Gentoo developer, I think this might be nice and has great potential for sync work on generating SBOMs across distros.

For example, in Gentoo we started not long ago to add CPE metadata to packages, but our progress is slow. Mapping Gentoo packages to anitya was faster (thanks to github/pypi/... mappings), so if this is implemented, we could upload our information, download the remote mapping, and enrich both ends. If more distros join this, we could all help each other quite fast.

Zlopez commented 1 month ago

This sounds like an interesting idea. Could you specify how the two new fields should work and look?

antonleviathan commented 1 month ago

> This sounds like an interesting idea. Could you specify how the two new fields should work and look?

Of course, here is an example of what a package might look like when retrieved from api/v2/projects?name=gnupg2:

{
  "backend": "custom",
  "created_on": 1412174985,
  "ecosystem": "https://gnupg.org/download/",
  "homepage": "https://gnupg.org/download/",
  "cpe_2.3": "cpe:2.3:a:gnupg:gnupg:-:*:*:*:*:*:*:*",
  "cpe_2.2": "cpe:/a:gnupg:gnupg:-",
  "purl": "scheme:type/namespace/name@version?qualifiers#subpath",
  "id": 1215,
  "name": "gnupg2",
  "regex": "(?i)gnupg(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\\s]+?)(?:[-_](?:minsrc|src|source|asc|release))?\\.(?:tar|t[bglx]z|tbz2|zip)",
  "stable_versions": [
    "2.5.1",
    "2.4.5",
    ...
  ]
}

This is what it may look like in the UI: [screenshot of the gnupg2 project page in Anitya]

When retrieved as a distro mapping (api/v2/packages/?distribution=[Stageˣ]&name=gpg):

{
  "distribution": "[Stage\u02e3]",
  "ecosystem": "https://gnupg.org/ftp/gcrypt/gnupg/",
  "name": "gnupg2",
  "project": "gpg",
  "stable_version": "2.5.1",
  "version": "2.5.1",
  "cpe_2.3": "cpe:2.3:a:gnupg:gnupg:-:*:*:*:*:*:*:*",
  "cpe_2.2": "cpe:/a:gnupg:gnupg:-",
  "purl": "scheme:type/namespace/name@version?qualifiers#subpath"
}

Note that cpe_2.3 and cpe_2.2 are only the "base" of the identifier, with wildcards used for the rest of the values. This allows the end consumer to modify the version themselves easily. In this case the two strings for the current gnupg2 version would be:

"cpe_2.3": "cpe:2.3:a:gnupg:gnupg:2.5.1:*:*:*:*:*:*:*"
"cpe_2.2": "cpe:/a:gnupg:gnupg:2.5.1"
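As a rough illustration of how a consumer might fill in the version themselves, here is a minimal Python sketch. The function names `set_version_23` and `set_version_22` are made up for this example and are not part of anitya. It assumes no component contains an escaped colon (`\:`) and that the version needs no CPE escaping; a real implementation would have to handle both.

```python
def set_version_23(base: str, version: str) -> str:
    """Return a CPE 2.3 formatted string with the version field replaced."""
    parts = base.split(":")
    # A CPE 2.3 formatted string has 13 colon-separated fields:
    # cpe, 2.3, part, vendor, product, version, update, edition,
    # language, sw_edition, target_sw, target_hw, other
    if len(parts) != 13 or parts[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {base!r}")
    parts[5] = version
    return ":".join(parts)


def set_version_22(base: str, version: str) -> str:
    """Return a CPE 2.2 URI with the version component replaced."""
    if not base.startswith("cpe:/"):
        raise ValueError(f"not a CPE 2.2 URI: {base!r}")
    parts = base.split(":")
    # Fields are: cpe, /part, vendor, product, version, ...
    # Trailing components may be omitted in 2.2, so pad up to the version.
    while len(parts) < 5:
        parts.append("")
    parts[4] = version
    return ":".join(parts)


if __name__ == "__main__":
    print(set_version_23("cpe:2.3:a:gnupg:gnupg:-:*:*:*:*:*:*:*", "2.5.1"))
    print(set_version_22("cpe:/a:gnupg:gnupg:-", "2.5.1"))
```

With the base strings from the example above and version 2.5.1, this yields the two strings shown.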

I'm not as familiar with PURL, but it would be good for us to figure out if it makes sense to include it. From what I've gathered, PURL is inferred rather than given like CPE, so it's easier to work with and can more readily be converted to CPE 2.3. CPE 2.2 is not well structured, so that's just a legacy thing we may consider including.

In terms of the implementation, I would make these nullable, indexed strings (assuming that won't cause issues with over-indexing on the db, which from what I see in the current model it shouldn't):

# Attribute names use underscores because "cpe_2.3" is not a valid Python identifier
cpe_2_3 = sa.Column(sa.String(200), nullable=True, index=True)
cpe_2_2 = sa.Column(sa.String(200), nullable=True, index=True)
purl = sa.Column(sa.String(200), nullable=True, index=True)
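To show what "nullable and indexed" means at the database level, here is a small self-contained sketch. It uses Python's built-in `sqlite3` module instead of anitya's SQLAlchemy models so that it runs without extra dependencies. The table layout, index names, the second project row, and the helper names are assumptions made for this example, not anitya's actual schema.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with nullable, indexed identifier columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " cpe_2_3 TEXT,"  # no NOT NULL constraint, so the column is nullable
        " purl TEXT)"
    )
    # One index per identifier so lookups by CPE or PURL avoid a full scan.
    conn.execute("CREATE INDEX ix_projects_cpe_2_3 ON projects (cpe_2_3)")
    conn.execute("CREATE INDEX ix_projects_purl ON projects (purl)")
    conn.executemany(
        "INSERT INTO projects (name, cpe_2_3, purl) VALUES (?, ?, ?)",
        [
            ("gnupg2", "cpe:2.3:a:gnupg:gnupg:-:*:*:*:*:*:*:*", None),
            ("no-identifiers-yet", None, None),  # made-up project without identifiers
        ],
    )
    return conn


def find_by_cpe(conn: sqlite3.Connection, cpe: str) -> list:
    """Return the names of projects whose base CPE matches exactly."""
    rows = conn.execute(
        "SELECT name FROM projects WHERE cpe_2_3 = ? ORDER BY name", (cpe,)
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_demo_db()
    print(find_by_cpe(db, "cpe:2.3:a:gnupg:gnupg:-:*:*:*:*:*:*:*"))
```

Projects that have no identifier yet simply store NULL, so existing rows would not need to be backfilled when the columns are added.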

We may consider grouping these into identifiers or something like that, but I don't know if it's necessary and it complicates the model.

As @arthurzam mentioned, this may make it easier for the current users of anitya to work together to essentially crowdsource getting these additional identifiers up and available for those who already consume data from it.

I'm going to tag @pombredanne here in hopes he has the time to give us input regarding supporting PURL. Philippe, anitya is used to help distribution maintainers stay on top of new software releases. The basic idea is that for a given package, different distributions add a mapping to their distribution and can then use the anitya (https://release-monitoring.org) API to check for updates, or subscribe to the Fedora Messaging bus. My current impression is that PURL is focused on things that are already packaged, and we are the ones who actually do the packaging, so I'm wondering if PURL fits in here or not. For example, above we have gnupg2 - before it's packaged, would PURL be able to support this?

Zlopez commented 1 month ago

It doesn't seem difficult to implement, just adding a few new fields to the UI and database.

antonleviathan commented 1 month ago

Yep, it's straightforward. I would like to figure out if it's worthwhile to add PURL. Any thoughts on that @Zlopez? As of right now it looks like it may not be the thing we want, but I'm not sure.

Zlopez commented 1 month ago

@antonleviathan I don't know enough about PURL to say whether it would be helpful for release-monitoring.org users. CPE sounds like a good addition, I'm just not sure why we need both cpe_2_3 and cpe_2_2.

antonleviathan commented 1 month ago

I agree cpe_2_3 is probably enough.

antonleviathan commented 1 month ago

Shall I put together a PR @Zlopez?

Zlopez commented 1 month ago

PRs are welcome :-) Feel free to put one together.

antonleviathan commented 1 month ago

Btw it looks like PURL would work well too. There is a generic tag that would address my concern: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#generic
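For reference, here is a rough Python sketch of what a `generic` PURL for an upstream project could look like. The function name `build_generic_purl` is invented for this example, the tarball file name is only illustrative, and the percent-encoding is simplified; a real implementation would more likely rely on an existing library such as packageurl-python rather than hand-rolled string building.

```python
from typing import Optional
from urllib.parse import quote


def build_generic_purl(
    name: str,
    version: Optional[str] = None,
    download_url: Optional[str] = None,
) -> str:
    """Build a pkg:generic PURL (simplified; namespace and subpath omitted)."""
    purl = "pkg:generic/" + quote(name, safe="")
    if version:
        purl += "@" + quote(version, safe="")
    if download_url:
        # download_url is a qualifier listed for the generic type in PURL-TYPES.
        purl += "?download_url=" + quote(download_url, safe=":/")
    return purl


if __name__ == "__main__":
    print(build_generic_purl("gnupg", "2.5.1"))
    # Illustrative file name under the upstream download directory.
    print(
        build_generic_purl(
            "gnupg",
            "2.5.1",
            "https://gnupg.org/ftp/gcrypt/gnupg/gnupg-2.5.1.tar.bz2",
        )
    )
```

As with the CPE fields, storing only the unversioned base (`pkg:generic/gnupg`) would let consumers append the version they care about.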

fedora-infra / anitya

Feature idea: add CPE/PURL as fields to `projects` #1821