Allow reports to have stable IDs

stigtsp commented 8 months ago

Example: in https://github.com/briandfoy/cpan-security-advisory/commit/3b63755027f2016a7d3c96b8c1a8270f3c97d17d, CPANSA-CPAN-2023-01 was changed to CPANSA-CPAN-2023-31484 after CVE-2023-31484 was added to the cves list.

This makes it hard to provide a stable reference to the record pre-CVE. Is it possible to keep the previous reference as an alias, or not change the id?

briandfoy commented 8 months ago

It's possible, but let me think about how that might work.

Can you explain why you are keeping old IDs around so I can understand your use case? Are you ignoring some IDs because they aren't being fixed, for example?

stigtsp commented 8 months ago

My specific use case is providing references to a record when requesting a CVE number. Since the CPANSA identifier is not stable, I need to provide links to the repo pinned by commit hash like this:

https://github.com/briandfoy/cpan-security-advisory/blob/9374f98bef51e1ae887f293234050551c079776f/cpansa/CPANSA-Plack-Middleware-XSRFBlock.yml#L2-L15

There would be other similar use cases as well, such as someone referring to a vuln in their workflow or discussion.

briandfoy commented 8 months ago

Why would you refer to this repo to request a CVE instead of using the primary sources in the links portion of a report in this repo? I'd say the same thing about anyone who wants to reference a vulnerability: use the primary sources instead of something someone has to click through to get to the primary sources.

That doesn't mean that we can't have stable IDs for other reasons though.

stigtsp commented 8 months ago

> Why would you refer to this repo to request a CVE instead of using the primary sources..

I'm also using the primary sources, but CPANSA entries are still relevant references for context.

robrwo commented 8 months ago

For the time being, we could have an optional "moved_from" key to indicate that the reference was renamed.

But I don't see a reason to rename the CPANSA entry just because we learn the CVE number later.

I imagine in the future that the CPAN Security people may assign a reference number, and we may want to use those.
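
For illustration, here is a minimal sketch of how an optional moved_from key could let tools resolve a renamed ID. The record layout (id, cves, and moved_from as a list of former IDs) and the helper names build_alias_index and resolve are assumptions made up for this example, not the repository's actual schema or tooling. The first record reuses the rename from the top of this thread.

```python
# Hypothetical records: the field names "id", "cves", and "moved_from" are
# assumptions for illustration, not the repository's actual schema.
RECORDS = [
    {
        "id": "CPANSA-CPAN-2023-31484",
        "cves": ["CVE-2023-31484"],
        "moved_from": ["CPANSA-CPAN-2023-01"],
    },
    {"id": "CPANSA-Foobar-1", "cves": []},
]


def build_alias_index(records):
    """Map every current and former ID to the record's current ID."""
    index = {}
    for record in records:
        current = record["id"]
        for name in [current, *record.get("moved_from", [])]:
            # Refuse ambiguous data: one name must not point at two records.
            if name in index and index[name] != current:
                raise ValueError(f"{name} is claimed by {index[name]} and {current}")
            index[name] = current
    return index


def resolve(report_id, index):
    """Return the current ID for report_id, or None if it is unknown."""
    return index.get(report_id)


ALIASES = build_alias_index(RECORDS)
```

With this index, resolve("CPANSA-CPAN-2023-01", ALIASES) returns the renamed ID, so a reference made before the CVE was assigned would still find the record.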

stigtsp commented 8 months ago

> I imagine in the future that the CPAN Security people may assign a reference number, and we may want to use those.

Our initial plan was to set up our own CPANSEC-XXXX-YYYY namespace, but I've suggested using CVEs instead to avoid the complexity.

briandfoy commented 8 months ago

I've been thinking about this. Even if there were a stable ID, how would you link to that through this repo in a way that's different than what you are doing right now? The files are still going to change and possibly move around, especially considering that the data format right now needs some updates. I think you are back to the same problem.

I've been renaming the IDs with the CVE because I think that's the most useful thing for the end user. They see the report ID in the output, and it's the info that gives them the shortest path to what they probably want to look at or use in other communication. It doesn't have to be that way, but I think it's better than carrying through some other identifier special to this database.

And, at the moment, I'm thinking through if anything should depend on the primary key of a record. Typically that's not good practice, but I haven't come up with an answer for that.

sjn commented 8 months ago

Suggestion: let's look at this through a database normalization perspective. How about having another table/file with CVEs as primary keys, each referring to a CPANSA id? That way we get a stable primary key for the CPANSA id space (which we'll need in order to predictably be able to refer to them), and we make it easier to find these if one only has a CVE id available...

stigtsp commented 8 months ago

> [..] Even if there were a stable ID, how would you link to that through this repo in a way that's different than what you are doing right now? The files are still going to change and possibly move around, especially considering that the data format right now needs some updates. I think you are back to the same problem.

As long as there is some stable primary key for the records, (hyper) linking to them would be a solvable separate problem, imho.

There are already some tools that consume the data, like Test::CVE, and I think setting up a website that provides stable links to these vulns might also be a good idea. We can probably do that.

I find it useful to be able to reference e.g. "CPANSA-Foobar-1" instead of "that Foobar vulnerability from yesterday affecting $something" in discussions as well.

Note: We have a feed set up on https://cpan-security.github.io/cpansa-feed/ that Test::CVE uses.

briandfoy commented 8 months ago

> As long as there is some stable primary key for the records, (hyper) linking to them would be a solvable separate problem, imho.

"would be"? It sounds like there is no stable way that you are using right now. That was what I was asking about. This started because you did not like deep links into a particular commit where the URL had to refer to line numbers. I don't see how you'd be able to deep link into this repository without doing that, even if there were stable identifiers.

> I find it useful to be able to reference e.g. "CPANSA-Foobar-1" instead of "that Foobar vulnerability from yesterday affecting $something" in discussions as well.

I think this is a bit specious. If I were in a discussion and wanted to refer to something, my first thought would not be to refer to the relative date. I'd say something like "the buffer overflow in foo() (CPANSA-Foobar-1)". Maybe that ID changes, but it's still easy for someone to find the report based on its content. It's also a much better reference when everyone has forgotten the ID, whatever it is, or has forgotten what the report was about.

briandfoy commented 8 months ago

> Suggestion: let's look at this through a database normalization perspective. How about having another table/file with CVEs as primary keys, each referring to a CPANSA id? That way we get a stable primary key for the CPANSA id space (which we'll need in order to predictably be able to refer to them), and we make it easier to find these if one only has a CVE id available...

If we were going to stick to stable IDs, there would still be a cves array in each record. Something else can invert that, but that would be a generated table. The best way to go here is simple files that humans can read and work on. Higher-order uses come from digesting these files as a data source.

We don't need to make this more complicated because that makes it harder for casual contributions.
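
A rough sketch of what that generated, inverted table could look like, assuming each record carries an id and a cves array. The function name invert_cves and the sample records are made up for illustration: CPANSA-Foobar-1 is the placeholder ID used earlier in this thread and CVE-0000-00001 is a dummy value, not a real CVE.

```python
from collections import defaultdict

# Hypothetical sample data. Only the CPANSA-CPAN-2023-01 / CVE-2023-31484
# pairing comes from this thread; the second record is a placeholder.
RECORDS = [
    {"id": "CPANSA-CPAN-2023-01", "cves": ["CVE-2023-31484"]},
    {"id": "CPANSA-Foobar-1", "cves": ["CVE-0000-00001"]},
    {"id": "CPANSA-Foobar-2", "cves": []},
]


def invert_cves(records):
    """Build a CVE -> [report IDs] lookup from each record's cves array."""
    index = defaultdict(list)
    for record in records:
        for cve in record.get("cves", []):
            index[cve].append(record["id"])
    # Sort keys and values so the generated table is reproducible.
    return {cve: sorted(ids) for cve, ids in sorted(index.items())}


CVE_INDEX = invert_cves(RECORDS)
```

Because the lookup is derived on demand, the hand-edited files stay the single source of truth and contributors never have to maintain a second table.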

briandfoy commented 8 months ago

Going forward, such as in #137, we will not update any of the values in the id keys.

However, it is still not a good idea to link into this repo, as in https://nvd.nist.gov/vuln/detail/CVE-2022-48623. As I read through CVEs, I'm inundated with many, many links, and most of them are just references to each other with almost no original content. Don't add to that problem with another set of circular references.

sjn commented 8 months ago

> Going forward, such as in #137, we will not update any of the values in the id keys.

Thanks! :grinning:

> However, it is still not a good idea to link into this repo, as in https://nvd.nist.gov/vuln/detail/CVE-2022-48623. As I read through CVEs, I'm inundated with many, many links, and most of them are just references to each other with almost no original content. Don't add to that problem with another set of circular references.

Something for your consideration:

1) As long as CPANSA has any kind of authoritative information in it (which I assume it has, and will continue to have in the foreseeable future), then external entities need to be able to refer to it directly, using stable URLs and stable IDs. With your promise above, I think we're good! :partying_face:
2) Linking to a specific commit in the CPANSA is actually meaningful when there's a need to refer to specific line number(s) in a file as they are found at a specific time (assuming the file at any point later may have lines added above the ones being referred to). So I guess this should be expected?
3) Circular references aren't really an issue here, are they? As long as we have stable endpoints to link to and we're clear about their meaning and purpose, that ought to be enough to let users get an idea if the vulnerability is relevant for them?

In any case, thanks! :-D

briandfoy commented 8 months ago

You can refer to information directly by referring to it directly. Those are the references in each advisory. This project is not authoritative and does not contain authoritative information that isn't in primary resources. This project does not confirm, test, or otherwise develop information that isn't already available in primary sources.
I can't stop you from linking to anything, but Stig's complaint was that he didn't want to deep link into the repo.
Circular references make things much harder, so please stop adding to the noise. It wastes the time of the people like me who collate information and it wastes the time of the people who use the advisories. An entry in this repo provides no additional information beyond that in the primary sources.

Allow reports to have stable IDs #136