@pombredanne I would like to assign this one to Ziad but cannot see him on the Assignees list. Any suggestions please?
See https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/models.py and design the appropriate mapping to DejaCode.
A "scheduler" is a fairly new concept/feature for DejaCode. We need to determine if there is a usable Django library to facilitate creating such a feature. As a working start, let's consider a new section of the DejaCode admin Dashboard, right under the "Imports" section, called "Scheduler" (or similar), that has an initial option for "Refresh Vulnerabilities" (or similar) where the admin user can define the frequency and scope of the vulnerability refresh process to be run on an automatic basis.
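The Django library question is still open here, so the sketch below covers only the settings side of such a "Refresh Vulnerabilities" entry: a frequency and a scope chosen by the admin, plus a check for whether the task is due. It is written in plain Python. Frequency, RefreshSchedule, next_run and is_due are hypothetical names. The 30-day "monthly" interval is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Frequency(Enum):
    """Hypothetical frequency choices for the "Refresh Vulnerabilities" task."""

    DAILY = timedelta(days=1)
    WEEKLY = timedelta(weeks=1)
    MONTHLY = timedelta(days=30)  # simplification: a fixed 30-day interval


@dataclass
class RefreshSchedule:
    """Hypothetical admin-defined settings for one scheduled task."""

    frequency: Frequency
    # Scope: only look up objects that are assigned to a Product.
    product_only: bool = False
    last_run: Optional[datetime] = None

    def next_run(self, now: datetime) -> datetime:
        """Return when the task should run next."""
        if self.last_run is None:
            # Never run yet: the task is due immediately.
            return now
        return self.last_run + self.frequency.value

    def is_due(self, now: datetime) -> bool:
        return now >= self.next_run(now)
```

A periodic runner would call is_due() and then launch the refresh. That runner is the piece a scheduling library would supply.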
Assumption: the basic scope of the vulnerability lookup is to find vulnerabilities associated with Packages and Components currently defined in the relevant DejaCode dataspace. This could be further refined to include only those that are assigned to a Product in that dataspace.
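The two scope options read as a simple filter. The sketch below uses a plain-Python Item with hypothetical fields as a stand-in for a Package or Component. It is not the real DejaCode model. In DejaCode itself, this selection would be a QuerySet filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Stand-in for a DejaCode Package or Component (hypothetical fields)."""

    purl: str
    dataspace: str
    product_names: frozenset = frozenset()  # Products this item is assigned to


def lookup_scope(items, dataspace, product_only=False):
    """Return the items whose vulnerabilities should be looked up.

    Basic scope: every Package/Component in the given dataspace.
    Refined scope (product_only=True): only items assigned to at least one Product.
    """
    selected = []
    for item in items:
        if item.dataspace != dataspace:
            continue
        if product_only and not item.product_names:
            continue
        selected.append(item)
    return selected
```

With the refined scope, items that no Product uses are skipped. That can shrink the lookup considerably in a large dataspace.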
The scheduler should also include a task to update Components defined in the relevant dataspace with CPE values as those become available.
The proposed vulnerability model in DejaCode should be designed to support queries such as:
@pombredanne @Tushar Goel Could you tell me the PURL types from the list that are not supported (no data available) by VCIO? Excluding those will reduce the number of "useless" requests to the API. ['gem', 'autotools', 'sourceforge', 'bitbucket', 'rpm', 'gitlab', 'cran', 'windows-program', 'docker', 'bower', 'nuget', 'generic', 'cargo', 'npm', 'deb', 'golang', 'maven', 'composer', 'pypi', 'hackage', 'unknown', 'rubygems', 'about', 'github']
Well, for example, we have ±300,000 sourceforge PURLs in the nexB Dataspace; doing lookups for those is a total waste of time and resources.
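The sketch below shows the denylist approach: drop PURLs whose type is known to have no VCIO data before any request is sent. sourceforge is the only type the thread names as useless. UNSUPPORTED_TYPES is therefore a hypothetical set that VCIO would still need to confirm. purl_type() is a minimal parser for this example.

```python
UNSUPPORTED_TYPES = frozenset({"sourceforge"})  # hypothetical; to be confirmed with VCIO


def purl_type(purl: str) -> str:
    """Return the lowercased type of a PURL string, e.g. "pkg:pypi/django@4.2" -> "pypi"."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"Not a PURL: {purl!r}")
    # The type is the first segment after the "pkg:" scheme; tolerate "pkg://".
    return purl[len("pkg:"):].lstrip("/").split("/", 1)[0].lower()


def purls_to_lookup(purls, unsupported=UNSUPPORTED_TYPES):
    """Drop PURLs whose type is known to have no data in VCIO."""
    return [purl for purl in purls if purl_type(purl) not in unsupported]
```

For full PURL parsing, the packageurl-python library provides PackageURL.from_string(), and its .type attribute gives the same value.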
More context: for ±133,000 packages in the nexB Dataspace, it currently takes about 1 hour and 2,674 HTTP requests to the VCIO API.
The result is only 1,235 vulnerabilities fetched and created. It seems like there's a lot of wasted time and resources with our current approach.
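A quick back-of-the-envelope reading of these figures follows. The package count is approximate, so these ratios are approximate as well.

$$\frac{133{,}000\ \text{packages}}{2{,}674\ \text{requests}} \approx 49.7\ \text{packages per request}$$

$$\frac{3{,}600\ \text{s}}{2{,}674\ \text{requests}} \approx 1.35\ \text{s per request}$$

Each request therefore covers roughly 50 packages and takes over a second. To shorten the run, the number of requests has to come down, for example by skipping PURL types that have no VCIO data.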
@tdruez re: https://github.com/aboutcode-org/dejacode/issues/94#issuecomment-2298445423
I suggest these progressive steps:
This is tracked in this issue:
@pombredanne Thanks, this sounds like it will require some work to make this happen.
In the short term, could VCIO expose a new "action" on the package endpoint to get this list of supported types? (It should be a very small and fast query.) On the DejaCode side, the process could start by fetching the available types to get a QuerySet limited to those, which would drastically reduce the number of queries.
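Here is a minimal plain-Python sketch of that flow. It makes two assumptions. First, fetch_supported_types is a hypothetical callable that stands in for the proposed VCIO action, which does not exist yet. Second, the batch size of 50 is arbitrary. Unlike the denylist, this allowlist adapts automatically when VCIO adds support for new types. In DejaCode, the filtering step would run on the QuerySet rather than on a list of strings.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(iterable: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def purl_type(purl: str) -> str:
    """Return the lowercased PURL type, e.g. "pkg:npm/lodash@4.17.20" -> "npm"."""
    return purl[len("pkg:"):].lstrip("/").split("/", 1)[0].lower()


def plan_lookups(
    purls: Iterable[str],
    fetch_supported_types: Callable[[], Iterable[str]],  # hypothetical VCIO action
    batch_size: int = 50,  # arbitrary illustrative batch size
) -> List[List[str]]:
    """Keep only PURLs whose type VCIO reports as supported, then batch them."""
    # One small request up front, instead of wasted lookups for unsupported types.
    supported = {t.lower() for t in fetch_supported_types()}
    candidates = (p for p in purls if purl_type(p) in supported)
    return list(batched(candidates, batch_size))
```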
https://github.com/aboutcode-org/dejacode/pull/148 merged, full implementation details available in the PR.
PR #148 provides the following CRAVEX-related functionality:
- Vulnerability model and all the code logic to fetch and create Vulnerability records and assign those to Package/Component through ManyToMany relationships.
- fetchvulnerabilities management command to fetch all the relevant data from VulnerableCode for a given Dataspace.
- … Vulnerability model in place of calling the VulnerableCode API on each request. This results in much better performance, as rendering the DejaCode views no longer depends on VulnerableCode performance. It also makes Vulnerability data available in the Reporting system.
- … is_vulnerable property on Package/Component column template, or going through the full affected_by_vulnerabilities m2m field. This is available in both Query and ColumnTemplate. Query example: Package > affected_by_vulnerabilities > IS_NULL = False
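The is_vulnerable property and the IS_NULL = False query express the same condition: at least one related vulnerability exists. The plain-Python stand-in below illustrates that equivalence. It is not the real Django model, and the vulnerability IDs are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Package:
    """Plain-Python stand-in for a DejaCode Package (not the real Django model)."""

    purl: str
    # Mirrors the affected_by_vulnerabilities ManyToMany field (vulnerability IDs).
    affected_by_vulnerabilities: Set[str] = field(default_factory=set)

    @property
    def is_vulnerable(self) -> bool:
        # Vulnerable as soon as at least one vulnerability is related.
        return bool(self.affected_by_vulnerabilities)


def vulnerable_only(packages: List[Package]) -> List[Package]:
    """Same selection as the query "affected_by_vulnerabilities IS_NULL = False"."""
    return [p for p in packages if p.is_vulnerable]
```

In the Django ORM, the equivalent would likely be Package.objects.filter(affected_by_vulnerabilities__isnull=False).distinct(). The .distinct() is needed because filtering across a ManyToMany join returns one row per related vulnerability.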
Scheduler:
TODO:
We should create a base Vulnerability application management in DejaCode with these features:
Also these related VCIO issues: