aboutcode-org / vulnerablecode

A free and open database of vulnerabilities and the packages they impact, and the tools to aggregate and correlate these vulnerabilities. Sponsored by NLnet (https://nlnet.nl/project/vulnerabilitydatabase/) for https://www.aboutcode.org/
Chat at https://gitter.im/aboutcode-org/vulnerablecode
Docs at https://vulnerablecode.readthedocs.org/
https://public.vulnerablecode.io
Apache License 2.0

Remove redundant package-urls from VCIO #1327

Open TG1999 opened 11 months ago

TG1999 commented 11 months ago

Currently, we store duplicate package-urls: records that are structurally different but produce the same purl string once their fields are put together. For example:

1st Scenario:

Purl A: type="pypi", namespace="", name="foo/bar"

Purl B: type="pypi", namespace="foo", name="bar"

They are structurally different, but both produce the identical purl "pkg:pypi/foo/bar".
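A minimal sketch of how the 1st scenario produces duplicates, assuming purls are rendered by joining the fields with "/" without percent-encoding the name, as the example above implies. `PackageRecord`, `naive_purl` and `find_duplicates` are hypothetical names for illustration, not VulnerableCode's actual models or helpers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageRecord:
    """Hypothetical stand-in for a stored package row (only the purl fields used here)."""
    type: str
    namespace: str
    name: str
    version: str = ""


def naive_purl(rec: PackageRecord) -> str:
    """Hypothetical renderer: join the fields with '/' and do NOT percent-encode
    the name, mirroring the behaviour described in the issue."""
    parts = [rec.type]
    if rec.namespace:
        parts.append(rec.namespace)
    parts.append(rec.name)
    purl = "pkg:" + "/".join(parts)
    if rec.version:
        purl += "@" + rec.version
    return purl


def find_duplicates(records):
    """Group records by rendered purl and keep only groups with more than one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[naive_purl(rec)].append(rec)
    return {purl: recs for purl, recs in groups.items() if len(recs) > 1}


purl_a = PackageRecord(type="pypi", namespace="", name="foo/bar")
purl_b = PackageRecord(type="pypi", namespace="foo", name="bar")

print(purl_a == purl_b)                          # False: the fields differ
print(naive_purl(purl_a) == naive_purl(purl_b))  # True: same purl string
print(find_duplicates([purl_a, purl_b]))
```

Comparing the rendered string rather than the individual fields is what exposes the collision; comparing fields alone would treat A and B as distinct packages.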

2nd Scenario:

This was discussed in the weekly meeting: https://github.com/nexB/vulnerablecode/wiki/WeeklyMeetings#meeting-on-tuesday-2023-10-24-at-1600-utc

A solution for the 2nd scenario was discussed in the weekly meeting, but what should be done for the 1st scenario?

johnmhoran commented 11 months ago

@TG1999 Re the 2nd scenario, I don't recall any discussion of how we'll decide which of the duplicates to remove. The difference is not limited to the structure of the qualifiers field: I seem to recall that some duplicates have different fixed-by packages, and that not all duplicates have identical sets of affected-by vulnerabilities. The same may be true for fixing vulnerabilities; I don't know yet. So we have some unaddressed gating items.
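One way to make this gating question concrete is to check, before deleting anything, whether the rows in a duplicate group actually agree on their relations. The sketch below assumes a hypothetical in-memory model: `StoredPackage`, its fields, `mismatched_relations`, and the "VCID-1" identifier are illustrative placeholders, not VulnerableCode's actual Django models or data.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class StoredPackage:
    """Hypothetical row for one stored duplicate; field names are illustrative only."""
    pk: int
    purl: str
    affected_by: FrozenSet[str] = frozenset()  # IDs of vulnerabilities affecting this package
    fixing: FrozenSet[str] = frozenset()       # IDs of vulnerabilities this package fixes
    fixed_by: FrozenSet[str] = frozenset()     # purls of packages that fix this one


def mismatched_relations(duplicates: List[StoredPackage]) -> List[str]:
    """Return the names of the relations on which the duplicates disagree.

    An empty list means every duplicate carries the same relations, so keeping
    only one of them loses no data; otherwise the group needs a merge or a review.
    """
    mismatches = []
    for relation in ("affected_by", "fixing", "fixed_by"):
        # frozensets are hashable, so distinct relation values can be collected in a set
        values = {getattr(pkg, relation) for pkg in duplicates}
        if len(values) > 1:
            mismatches.append(relation)
    return mismatches


a = StoredPackage(1, "pkg:pypi/foo/bar", affected_by=frozenset({"VCID-1"}),
                  fixed_by=frozenset({"pkg:pypi/foo/bar@2.0"}))
b = StoredPackage(2, "pkg:pypi/foo/bar", affected_by=frozenset({"VCID-1"}),
                  fixed_by=frozenset({"pkg:pypi/foo/bar@2.1"}))
print(mismatched_relations([a, b]))  # ['fixed_by']
```

Groups that come back empty could be deduplicated safely; groups with mismatches are exactly the unresolved cases where a rule for which duplicate to keep (or how to merge their relations) still has to be decided.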

TG1999 commented 11 months ago

As per my discussion with @pombredanne