CRAVEX: Vulnerability Lookup and base app

pombredanne commented 6 months ago

We should create a base Vulnerability management application in DejaCode with these features:

[x] CRAVEX: Create a scheduler for vulnerability lookups that will look up data in VCIO
[x] CRAVEX: Store vulnerability lookups in a set of database models.

Also these related VCIO issues:

DennisClark commented 6 months ago

@pombredanne I would like to assign this one to Ziad but cannot see him on the Assignees list. Any suggestions please?

DennisClark commented 4 months ago

See https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/models.py and design the appropriate mapping to DejaCode.

DennisClark commented 4 months ago

A "scheduler" is a fairly new concept/feature for DejaCode. We need to determine if there is a usable Django library to facilitate creating such a feature. As a working start, let's consider a new section of the DejaCode admin Dashboard, right under the "Imports" section, called "Scheduler" (or similar), that has an initial option for "Refresh Vulnerabilities" (or similar) where the admin user can define the frequency and scope of the vulnerability refresh process to be run on an automatic basis.

Assumption: the basic scope of the vulnerability lookup is to find vulnerabilities associated with Packages and Components currently defined in the relevant DejaCode dataspace. This could be further refined to include only those that are assigned to a Product in that dataspace.

The scheduler should also include a task to update Components defined in the relevant dataspace with CPE values as those become available.

DennisClark commented 4 months ago

The proposed vulnerability model in DejaCode should be designed to support queries such as:

- a filter-enabled list of all the Versions of a Component Name currently defined in the relevant dataspace, showing which ones have known vulnerabilities.
- a filter-enabled list of all the Versions of a Package currently defined in the relevant dataspace, showing which ones have known vulnerabilities.
- a filter-enabled list of all the Versions of a Product currently defined in the relevant dataspace, showing which ones have known vulnerabilities.
- for a Vulnerability (using the same ID as VulnerableCode), provide a filter-enabled list of all packages or components in the relevant dataspace that are associated with it.
- for a Vulnerability (using the same ID as VulnerableCode), provide a filter-enabled list of all products in the relevant dataspace that are impacted by it.
- and others to be identified, of course.
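
To make the shape of these queries concrete, here is a minimal in-memory sketch, not the DejaCode models. The class, field, and function names (`affected_by`, `versions_with_status`, `packages_affected_by`) are hypothetical and only illustrate a many-to-many link between packages and vulnerabilities.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vulnerability:
    # Assumption: the identifier is reused as-is from VulnerableCode.
    vulnerability_id: str


@dataclass
class Package:
    name: str
    version: str
    # Hypothetical many-to-many link: one package, many vulnerabilities.
    affected_by: set = field(default_factory=set)

    @property
    def is_vulnerable(self) -> bool:
        return bool(self.affected_by)


def versions_with_status(packages, name):
    """List every version of `name` with a flag for known vulnerabilities."""
    return sorted((p.version, p.is_vulnerable) for p in packages if p.name == name)


def packages_affected_by(packages, vulnerability):
    """Reverse lookup: packages associated with one vulnerability."""
    return [p for p in packages if vulnerability in p.affected_by]
```

The same two directions (forward from a package name, backward from a vulnerability) would apply to Components and Products.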

tdruez commented 2 months ago

@pombredanne @Tushar Goel Could you tell me the PURL types from the list that are not supported (no data available) by VCIO? Excluding those will reduce the number of "useless" requests to the API. ['gem', 'autotools', 'sourceforge', 'bitbucket', 'rpm', 'gitlab', 'cran', 'windows-program', 'docker', 'bower', 'nuget', 'generic', 'cargo', 'npm', 'deb', 'golang', 'maven', 'composer', 'pypi', 'hackage', 'unknown', 'rubygems', 'about', 'github']

Well, for example, we have ±300,000 sourceforge PURLs in the nexB Dataspace; doing lookups for those is a total waste of time and resources.

More context: For ±133,000 packages in the nexB Dataspace, a full run currently takes about 1 hour and 2,674 HTTP requests to the VCIO API.

The result is only 1,235 vulnerabilities fetched and created. Seems like there's a lot of wasted time and resources with our current approach.

pombredanne commented 2 months ago

@tdruez re: https://github.com/aboutcode-org/dejacode/issues/94#issuecomment-2298445423

I suggest these progressive steps:

1. use a hardcoded list of distinct existing PURL types in VCIO
2. expose this list of existing PURL types as an endpoint
3. expose a new special endpoint that would provide a highly-compressed data structure to download quickly from VCIO and that you can query to know if a PURL may exist in VCIO
   - this could be an automaton (ahocorasick or FST) leveraging the fact that many PURLs share a common prefix, or a bloom filter.
   - it would be best cached for a few hours and should come with client code to use it to filter a (long) list of PURLs and remove those that surely do not exist in VCIO
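
To illustrate the bloom filter option only (the automaton alternative is not shown), here is a minimal sketch using the standard library. The class, its parameters (`size_bits`, `num_hashes`), and `filter_candidates` are hypothetical; nothing here reflects an existing VCIO endpoint or client.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def filter_candidates(purls, known):
    """Keep only PURLs that may exist upstream; definite misses are dropped."""
    return [purl for purl in purls if purl in known]
```

A PURL that passes the filter may still be absent upstream, so the filter only trims the list of lookups; it never replaces them.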

This is tracked in this issue:

[ ] https://github.com/aboutcode-org/vulnerablecode/issues/1561

tdruez commented 2 months ago

@pombredanne Thanks, this sounds like it will require some work to make this happen.

In the short term, could VCIO expose a new "action" on the package endpoint to get this list of supported types? (It should be a very small and fast query.) On the DejaCode side, the process could start by fetching the available types to get a QuerySet limited to those, and drastically reduce the number of queries.
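
A rough sketch of that pre-filtering step, in plain Python rather than a QuerySet. The helper names are hypothetical, and the set of supported types passed in is whatever VCIO would report; no actual list is assumed here.

```python
def purl_type(purl):
    """Return the type segment of a PURL such as 'pkg:npm/lodash@4.17.21'."""
    scheme, _, remainder = purl.partition(":")
    if scheme != "pkg":
        return None
    # Leading slashes after the scheme are tolerated; types are lowercase.
    return remainder.lstrip("/").partition("/")[0].lower()


def keep_supported(purls, supported_types):
    """Drop PURLs whose type has no data upstream, before any HTTP request."""
    return [p for p in purls if purl_type(p) in supported_types]
```

With the sourceforge PURLs excluded this way, those packages would never reach the lookup loop at all.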

tdruez commented 2 months ago

https://github.com/aboutcode-org/dejacode/pull/148 merged, full implementation details available in the PR.

DennisClark commented 2 months ago

PR #148 provides the following CRAVEX-related functionality:

- It introduces a new Vulnerability model and all the code logic to fetch and create Vulnerability records and assign those to Package/Component through ManyToMany relationships.
- A new fetchvulnerabilities management command is available to fetch all the relevant data from VulnerableCode for a given Dataspace.
- A scheduler was added to run the vulnerability data update daily (we can discuss and adjust this to the most suitable value, depending on how often VCIO is updated for example).
- The latest vulnerability data refresh date is displayed in the Admin dashboard in a new "Data updates" section in the bottom right corner.
- The Package/Component views that display vulnerability information (icon or tab) now use the data from the Vulnerability model instead of calling the VulnerableCode API on each request. This results in much better performance, as rendering a DejaCode view no longer depends on VulnerableCode's response time. It also makes Vulnerability data available in the Reporting system.
- A filter is available next to the "Identifier" column header in the Package list view, and Product tabs.
- The vulnerability icon is displayed next to the Package/Component identifier in the Product views: "Inventory", "Hierarchy", "Dependencies" tabs.
- The vulnerability data is available in Reporting either through the is_vulnerable property on a Package/Component column template or through the full affected_by_vulnerabilities m2m field. This is available in both Query and ColumnTemplate. Query example: Package > affected_by_vulnerabilities > IS_NULL = False

Scheduler:

Add a scheduler service to run the vulnerability updates on a daily basis:
- https://github.com/aboutcode-org/dejacode/pull/148/commits/5978612140443057e5aa8df2a19b3fb3be019831
- https://github.com/rq/rq-scheduler?tab=readme-ov-file#cron-jobs
- https://github.com/rq/django-rq?tab=readme-ov-file#support-for-scheduled-jobs
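
For readers unfamiliar with rq-scheduler cron jobs, a daily registration could look roughly like the sketch below. This is not the code from the commit above: the job function, the 3 AM time, and the queue name are placeholders, and the `scheduler` argument is assumed to be an rq-scheduler `Scheduler` instance.

```python
# Standard cron syntax: minute hour day-of-month month day-of-week.
DAILY_AT_3AM = "0 3 * * *"  # placeholder time, not the value used in the PR


def update_vulnerabilities():
    """Placeholder for the job that refreshes vulnerability data."""


def schedule_daily_update(scheduler, queue_name="default"):
    """Register the refresh job as a cron job on the given scheduler."""
    return scheduler.cron(
        DAILY_AT_3AM,
        func=update_vulnerabilities,
        queue_name=queue_name,
    )
```

In a django-rq project the scheduler would typically come from `django_rq.get_scheduler("default")`, which needs a running Redis instance.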

TODO:

[x] Update the ProductTabInventoryView and all Product related views in general to the new system
[x] Display the vulnerability icon in Hierarchy/Dependency tabs
[x] https://github.com/nexB/dejacode/issues/2
[x] Add a scheduler to run the vulnerability updates on a daily basis using RQScheduler.
[x] Fix and add tests
[x] On Package/Component addition (Add package, Import, etc...) lookup and create the vulnerabilities
[x] Complete the management command
