RFC: Design new Advisory -> Package -> Vulnerability models relationships

pombredanne commented 9 months ago

Context

In our current approach, we aggregate data from multiple advisories in a single vulnerability which is unique based on its aliases.

Problem

An Advisory may be strictly about a given Package ecosystem, and provide a score just for that ecosystem. Therefore, if we merge and combine everything in a single Vulnerability, we can end up with misleading or messy data. In some other cases, we historically mixed importing and improving, which led to performance problems and confusion.

For instance we have these issues:

1297
1193
889
838
361

Solution

The relationship should not be between a Package and a Vulnerability but rather a Package and an Advisory, and an Advisory to a Vulnerability. Similarly, scores, categories and references may be specific to an Advisory and not about all the packages subject to a Vulnerability.
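To make the proposed shape concrete, here is a minimal sketch using plain dataclasses rather than the actual Django models. All class, field and source names below are hypothetical and only illustrate the idea that packages, severities and references hang off an Advisory, which in turn points to a Vulnerability.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Package:
    type: str  # ecosystem, for example "rpm" or "pypi"
    name: str


@dataclass(frozen=True)
class Vulnerability:
    vulnerability_id: str
    aliases: tuple = ()


@dataclass
class Advisory:
    # Hypothetical fields: everything source-specific lives on the Advisory.
    source: str
    vulnerability: Vulnerability
    affected_packages: list = field(default_factory=list)
    severities: list = field(default_factory=list)
    references: list = field(default_factory=list)


def severities_for(package, vulnerability, advisories):
    """Return (source, score) pairs, only from advisories that cover this package."""
    return [
        (advisory.source, score)
        for advisory in advisories
        if advisory.vulnerability == vulnerability
        and package in advisory.affected_packages
        for score in advisory.severities
    ]
```

With this shape, a score published by a distro-specific advisory is only returned for the packages that advisory actually lists, instead of being attached to every package of the Vulnerability.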

In this design we would essentially adopt a structure similar to that of VulnTotal, where multiple advisories either concur on the same impact conclusion or disagree (which becomes a problem that needs curation, either through manual review or improvers).
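As a rough illustration of the "concur or disagree" idea, the sketch below groups per-advisory claims by vulnerability and package and flags the ones where sources reach different conclusions. The claim tuple layout and the function name are assumptions made for this example; this is not how VulnTotal is implemented.

```python
from collections import defaultdict


def find_disagreements(claims):
    """Return the (vulnerability_id, package) keys on which sources disagree.

    `claims` is an iterable of (source, vulnerability_id, package, is_affected)
    tuples, a hypothetical flattened view of what each advisory asserts.
    """
    verdicts = defaultdict(dict)
    for source, vulnerability_id, package, is_affected in claims:
        verdicts[(vulnerability_id, package)][source] = is_affected

    # Keep only the keys where at least two sources gave different answers.
    return {
        key: by_source
        for key, by_source in verdicts.items()
        if len(set(by_source.values())) > 1
    }
```

Entries returned by such a check would be the candidates for manual review or for a dedicated improver.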

Severity may also need some rethinking, as severities come from an Advisory and are in many cases specific to some packages. For instance, the severity/scores published by RedHat are only about the RPM packaging of a vulnerable package, not about any package or upstream in general.

See also:

1068
1316
1355 because the date of publishing may really be an advisory concept

@Hritik14 @TG1999 @keshav-space @DennisClark

pombredanne commented 9 months ago

Some random notes:

an Improver can fix data bugs. Such improvers sometimes need to run only once and are almost like data migrations, but we do not want data migrations for them, since they can run while the system is in use and available.
an Improver can improve data, and in this case it needs to run more than once
an Improver can also create new data based on fetching external data. In this case the improver is almost an advisory source itself. Should we treat an Improver as an advisory?
we need to run improvers in parallel
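A minimal sketch of how the run-once versus run-repeatedly distinction and parallel execution could fit together. The `Improver` base class, the `run_once` flag and `run_improvers` are hypothetical names for this example and do not reflect the existing improver API; a thread pool is assumed here because improvers that fetch external data are mostly I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor


class Improver:
    """Hypothetical base class: subclasses set `name` and implement `improve`."""

    name = "improver"
    run_once = False  # True for one-off data fixes, similar to a data migration

    def improve(self):
        raise NotImplementedError


def run_improvers(improvers, already_run, max_workers=4):
    """Run all due improvers in parallel and return {name: result}.

    `already_run` is a set of names of run-once improvers that have completed;
    it is updated in place so that later calls skip them.
    """
    due = [
        improver
        for improver in improvers
        if not (improver.run_once and improver.name in already_run)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda imp: (imp.name, imp.improve()), due))

    already_run.update(imp.name for imp in due if imp.run_once)
    return dict(results)
```

In a real deployment the `already_run` set would need to be persisted (for instance in the database), which this sketch leaves out.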

pombredanne commented 9 months ago

This is a major refactoring of the models with major implications for the UI and the API, since Advisories would become first-class objects.

We could possibly create some facade and adapters such that in the short term we can do model changes and do not need to refactor the whole UI and API.

This would be more work but would avoid breaking things upfront.

Here is a first possible approach with ghost relationships

We create the Advisory to Vulnerability relationships and a Package relationship in this relationship. This would not change the current models, and the API and UI stay the same. We populate these relationships with improvers and/or data migrations and progressively update all importers to use and populate these models. We still merge the data in the advisory record.
Progressively we update the UI and create a new version of the API that will use the new data structure. At some point in time, we stop updating the legacy Vulnerability fields.
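One way to picture the transition: while the legacy API and UI are still in use, a small adapter could rebuild the legacy merged view from per-advisory records. This is only a sketch under assumed names; the dict keys and the function `legacy_vulnerability_view` are hypothetical and do not match the current serializers.

```python
def legacy_vulnerability_view(vulnerability_id, advisories):
    """Merge per-advisory data into a single legacy-style vulnerability dict.

    `advisories` is a list of dicts with the hypothetical keys
    "vulnerability_id", "affected_packages" and "references".
    """
    merged = {
        "vulnerability_id": vulnerability_id,
        "affected_packages": [],
        "references": [],
    }
    for advisory in advisories:
        if advisory["vulnerability_id"] != vulnerability_id:
            continue
        for key in ("affected_packages", "references"):
            for item in advisory.get(key, []):
                # De-duplicate while keeping first-seen order.
                if item not in merged[key]:
                    merged[key].append(item)
    return merged
```

A v2 endpoint could then expose the advisories directly, while the existing endpoint keeps serving this merged shape until it is retired.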

Another approach is to create new models

We would create, and use in parallel, a full set of new duplicated models like Package2, Vulnerability2 and so on, and have a v2 of the UI and API.

At some point in the future we would deprecate and retire the old models, UI and API. This is more work and some duplicated code, BUT means less instability.

pombredanne commented 9 months ago

After discussion this is preferred:

Here is a first possible approach with ghost relationships

We create the Advisory to Vulnerability relationships and a Package relationship in this relationship. This would not change the current models, and the API and UI stay the same. We populate these relationships with improvers and/or data migrations and progressively update all importers to use and populate these models. We still merge the data in the advisory record.

Progressively we update the UI and create a new version of the API that will use the new data structure. At some point in time, we stop updating the legacy Vulnerability fields.

We will therefore progressively get a new v2 API and v2 UI in small chunks, and avoid wrecking everything.
