aboutcode-org / vulnerablecode

A free and open vulnerabilities database and the packages they impact. And the tools to aggregate and correlate these vulnerabilities. Sponsored by NLnet https://nlnet.nl/project/vulnerabilitydatabase/ for https://www.aboutcode.org/ Chat at https://gitter.im/aboutcode-org/vulnerablecode Docs at https://vulnerablecode.readthedocs.org/
https://public.vulnerablecode.io
Apache License 2.0

RFC: Design new Advisory -> Package -> Vulnerability models relationships #1393

Open pombredanne opened 9 months ago

pombredanne commented 9 months ago

Context

In our current approach, we aggregate data from multiple advisories into a single Vulnerability, which is unique based on its aliases.

Problem

An Advisory may be strictly about a given Package ecosystem, and provide a score just for that ecosystem. Therefore, if we merge and combine everything in a single Vulnerability, we can end up with misleading or messy data. In some other cases, we historically mixed importing and improving, which led to performance problems and confusion.

For instance we have these issues:

Solution

The relationship should not be between a Package and a Vulnerability but rather a Package and an Advisory, and an Advisory to a Vulnerability. Similarly, scores, categories and references may be specific to an Advisory and not about all the packages subject to a Vulnerability.
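To make the proposed chain concrete, here is a minimal sketch in plain Python dataclasses rather than Django models. All class, field and function names are hypothetical and chosen for illustration only; they are not the actual VulnerableCode models. The point is only that packages, scores and references hang off the Advisory, and the Vulnerability is reached through it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vulnerability:
    # Hypothetical fields: an internal id plus known aliases (CVE, GHSA, ...).
    vulnerability_id: str
    aliases: tuple[str, ...] = ()


@dataclass(frozen=True)
class Severity:
    # A score as published by one advisory, e.g. system="cvssv3.1".
    system: str
    value: str


@dataclass
class Advisory:
    # Packages, severities and references belong to the Advisory,
    # not directly to the Vulnerability.
    source: str
    vulnerability: Vulnerability
    affected_purls: list[str] = field(default_factory=list)
    severities: list[Severity] = field(default_factory=list)
    references: list[str] = field(default_factory=list)


def advisories_for_package(advisories, purl):
    """Return the advisories that explicitly list this package."""
    return [a for a in advisories if purl in a.affected_purls]


def vulnerabilities_for_package(advisories, purl):
    """Reach vulnerabilities only through the advisories about the package."""
    return {
        a.vulnerability.vulnerability_id
        for a in advisories_for_package(advisories, purl)
    }


def severities_for_package(advisories, purl):
    """Return only the scores published by advisories about this package."""
    return [s for a in advisories_for_package(advisories, purl) for s in a.severities]
```

With this shape, a score published by a distro advisory is only returned for the distro package it lists, and never leaks to an upstream package that shares the same Vulnerability.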

In this design we would essentially adopt a structure similar to that of VulnTotal, where multiple advisories either concur on the same impact conclusion or disagree (which becomes a problem that needs curation, either with a manual review or with improvers).
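As a rough illustration of the "concur or disagree" idea, the sketch below groups per-advisory claims by (vulnerability, package) and reports the pairs where sources reach different conclusions. The `AdvisoryClaim` type and `find_disagreements` function are hypothetical names invented for this example; this is not how VulnTotal or VulnerableCode implement it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AdvisoryClaim:
    # One advisory's conclusion about one package for one vulnerability.
    source: str
    vulnerability_id: str
    purl: str
    affected: bool


def find_disagreements(claims):
    """Return {(vulnerability_id, purl): {source: affected}} for conflicting claims."""
    verdicts = defaultdict(dict)
    for claim in claims:
        verdicts[(claim.vulnerability_id, claim.purl)][claim.source] = claim.affected
    # Keep only the pairs where sources do not all give the same verdict.
    return {
        key: by_source
        for key, by_source in verdicts.items()
        if len(set(by_source.values())) > 1
    }
```

Anything returned by such a check would be a candidate for manual review or for an improver to resolve.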

Severities may also need some rethinking, as they come from an Advisory and in many cases are specific to some packages. For instance, the severity/scores published by RedHat are only about the RPM packaging of a vulnerable package, not about any package or upstream in general.

See also:

@Hritik14 @TG1999 @keshav-space @DennisClark

pombredanne commented 9 months ago

Some random notes:

pombredanne commented 9 months ago

This is a major refactoring of the models with major implications for the UI and the API, since Advisories would become first-class objects.

We could possibly create some facades and adapters so that in the short term we can make model changes without needing to refactor the whole UI and API.

This would be more work but would avoid breaking things upfront.
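A facade of this kind could look roughly like the sketch below: data is stored per advisory, and an adapter function rebuilds the old merged, per-vulnerability view on demand so existing consumers keep working. The `AdvisoryRecord` type, the `legacy_vulnerability_view` function and the dictionary keys are all assumptions made for illustration; they do not mirror the real VulnerableCode API payload.

```python
from dataclasses import dataclass, field


@dataclass
class AdvisoryRecord:
    # Hypothetical per-advisory storage in the new structure.
    source: str
    vulnerability_id: str
    affected_purls: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)


def legacy_vulnerability_view(vulnerability_id, advisories):
    """Adapter: flatten all advisories for a vulnerability into one merged dict."""
    related = [a for a in advisories if a.vulnerability_id == vulnerability_id]
    return {
        "vulnerability_id": vulnerability_id,
        # Union across advisories, deduplicated and sorted for stable output.
        "affected_packages": sorted({p for a in related for p in a.affected_purls}),
        "references": sorted({r for a in related for r in a.references}),
    }
```

The old UI and API could read from the adapter while new code reads the per-advisory records directly.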

Here is a first possible approach with ghost relationships

  1. We create the Advisory-to-Vulnerability relationship and attach a Package relationship to it. This would not change the current models, and the API and UI stay the same. We populate these relationships with improvers and/or data migrations and progressively update all importers to use and populate these models. We still merge the data in the advisory record.

  2. Progressively, we update the UI and create a new version of the API that will use the new data structure. At some point, we stop updating the legacy Vulnerability fields.

Another approach is to create new models

We would create and use in parallel a full set of new duplicated models like Package2, Vulnerability2 and so on, and have a v2 of the UI and API.

At some point in the future we would deprecate and retire the old models, UI and API. This is more work and means some duplicated code, BUT also less instability.

pombredanne commented 9 months ago

After discussion this is preferred:

Here is a first possible approach with ghost relationships

  • We create the Advisory-to-Vulnerability relationship and attach a Package relationship to it. This would not change the current models, and the API and UI stay the same. We populate these relationships with improvers and/or data migrations and progressively update all importers to use and populate these models. We still merge the data in the advisory record.

  • Progressively, we update the UI and create a new version of the API that will use the new data structure. At some point, we stop updating the legacy Vulnerability fields.

We will therefore progressively build a new v2 API and v2 UI in small chunks and avoid wrecking everything.

pombredanne commented 1 week ago

See also:

We could add a field at the advisory level to track its license, but we are already tracking the license consistently for each importer. Just food for thought.