aboutcode-org / dejacode

Automate open source license compliance and ensure software supply chain integrity
https://dejacode.readthedocs.io
GNU Affero General Public License v3.0

CRAVEX: Vulnerability Lookup and base app #94

Closed pombredanne closed 2 months ago

pombredanne commented 6 months ago

We should create a base vulnerability management application in DejaCode with these features:

Also these related VCIO issues:

DennisClark commented 6 months ago

@pombredanne I would like to assign this one to Ziad but cannot see him on the Assignees list. Any suggestions please?

DennisClark commented 4 months ago

See https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/models.py and design the appropriate mapping to DejaCode.

DennisClark commented 4 months ago

A "scheduler" is a fairly new concept/feature for DejaCode. We need to determine if there is a usable Django library to facilitate creating such a feature. As a working start, let's consider a new section of the DejaCode admin Dashboard, right under the "Imports" section, called "Scheduler" (or similar), that has an initial option for "Refresh Vulnerabilities" (or similar) where the admin user can define the frequency and scope of the vulnerability refresh process to be run on an automatic basis.

Assumption: the basic scope of the vulnerability lookup is to find vulnerabilities associated with Packages and Components currently defined in the relevant DejaCode dataspace. This could be further refined to include only those that are assigned to a Product in that dataspace.
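A minimal sketch of that scope rule, using plain Python objects rather than the DejaCode models. The `Package` class, its `products` field, and the `lookup_scope` helper are hypothetical names used for illustration only; the real implementation would presumably be a Django QuerySet filter.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    # Hypothetical stand-in for a DejaCode Package record.
    purl: str
    dataspace: str
    products: set[str] = field(default_factory=set)  # names of Products using it


def lookup_scope(packages, dataspace, only_in_products=False):
    """Return the packages to include in a vulnerability lookup."""
    # Basic scope: everything defined in the given dataspace.
    scoped = [p for p in packages if p.dataspace == dataspace]
    # Optional refinement: keep only packages assigned to at least one Product.
    if only_in_products:
        scoped = [p for p in scoped if p.products]
    return scoped


packages = [
    Package("pkg:pypi/django@4.2", "nexB", {"Product A"}),
    Package("pkg:npm/lodash@4.17.21", "nexB"),
    Package("pkg:pypi/requests@2.31.0", "other"),
]

print([p.purl for p in lookup_scope(packages, "nexB")])
print([p.purl for p in lookup_scope(packages, "nexB", only_in_products=True)])
```

The same function would apply to Components if they carry an equivalent Product assignment.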

The scheduler should also include a task to update Components defined in the relevant dataspace with CPE values as those become available.

DennisClark commented 4 months ago

The proposed vulnerability model in DejaCode should be designed to support queries such as:

tdruez commented 2 months ago

@pombredanne @Tushar Goel Could you tell me the PURL types from the list that are not supported (no data available) by VCIO? Excluding those will reduce the number of "useless" requests to the API. ['gem', 'autotools', 'sourceforge', 'bitbucket', 'rpm', 'gitlab', 'cran', 'windows-program', 'docker', 'bower', 'nuget', 'generic', 'cargo', 'npm', 'deb', 'golang', 'maven', 'composer', 'pypi', 'hackage', 'unknown', 'rubygems', 'about', 'github']

Well, for example, we have ±300,000 sourceforge PURLs in the nexB Dataspace; doing lookups for those is a total waste of time and resources.

More context: for ±133,000 packages in the nexB Dataspace, a full lookup currently takes about 1 hour and 2,674 HTTP requests to the VCIO API.

The result is only 1,235 vulnerabilities fetched and created. Seems like there's a lot of wasted time and resources with our current approach.
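To put those figures in proportion, here is a quick back-of-the-envelope calculation. The inputs are the approximate numbers quoted above; the derived ratios are only as accurate as those inputs.

```python
# Approximate figures quoted in this comment.
packages = 133_000
requests_made = 2_674
vulnerabilities = 1_235

# Average number of packages covered by each HTTP request.
packages_per_request = packages / requests_made

# Vulnerabilities found per 100 packages looked up.
vulns_per_100_packages = 100 * vulnerabilities / packages

print(f"{packages_per_request:.1f} packages per request")               # 49.7
print(f"{vulns_per_100_packages:.2f} vulnerabilities per 100 packages")  # 0.93
```

That is roughly 50 packages per request and fewer than one vulnerability per 100 packages looked up.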

pombredanne commented 2 months ago

@tdruez re: https://github.com/aboutcode-org/dejacode/issues/94#issuecomment-2298445423

I suggest these progressive steps:

This is tracked in this issue:

tdruez commented 2 months ago

@pombredanne Thanks, this sounds like it will require some work to make this happen.

In the short term, could VCIO expose a new "action" on the package endpoint to get this list of supported types? (It should be a very small and fast query.) On the DejaCode side, the process could start by fetching the available types, limit the QuerySet to those, and drastically reduce the number of queries.
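A rough sketch of the DejaCode-side filtering, in plain Python so it runs without Django. The `supported_types` set stands in for whatever the proposed VCIO action would return (that endpoint does not exist yet, and the values here are made up), and `purl_type`, `filter_supported`, and `batched` are hypothetical helper names.

```python
from itertools import islice


def purl_type(purl):
    """Return the type of a PURL such as 'pkg:pypi/django@4.2', or None if malformed."""
    if not purl.startswith("pkg:"):
        return None
    type_part = purl[len("pkg:"):].split("/", 1)[0]
    return type_part.lower() or None


def filter_supported(purls, supported_types):
    """Keep only the PURLs whose type is in supported_types."""
    return [p for p in purls if purl_type(p) in supported_types]


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Assumption: placeholder values, not the actual list of types VCIO has data for.
supported_types = {"pypi", "npm", "maven"}

purls = [
    "pkg:pypi/django@4.2",
    "pkg:sourceforge/foo@1.0",
    "pkg:npm/lodash@4.17.21",
    "pkg:maven/org.apache/commons@1.0",
    "not-a-purl",
]

to_lookup = filter_supported(purls, supported_types)
batches = list(batched(to_lookup, 2))
print(len(to_lookup), "PURLs to look up in", len(batches), "requests")
```

In DejaCode itself the same restriction would more likely be a filter on the package type field of the QuerySet, so unsupported packages are never loaded at all.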

tdruez commented 2 months ago

https://github.com/aboutcode-org/dejacode/pull/148 merged, full implementation details available in the PR.

DennisClark commented 2 months ago

PR #148 provides the following CRAVEX-related functionality:

Scheduler:


TODO: