Open pombredanne opened 4 years ago
The GitHub importer does import maven vulnerabilities.
Any idea how comprehensive these are compared to sources dedicated to Maven / Java?
There are 248 vulnerable Maven packages available from the GitHub API. After resolving the version ranges, we have about 15478 Maven purls and their mappings to vulnerabilities.
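As a rough illustration of what "resolving the version ranges" means here, the sketch below expands one affected range into concrete purls. It is not the importer's actual code: the package `org.example/demo-lib`, its version list, and the helper names are hypothetical, and the version comparison is a simplified numeric one rather than Maven's real ordering rules.

```python
def parse_version(version):
    """Simplified numeric parsing; real Maven versions need a proper comparator."""
    return tuple(int(part) for part in version.split("."))


def resolve_range(namespace, name, all_versions, lower, upper):
    """Return Maven purls for every known version v with lower <= v < upper."""
    low, high = parse_version(lower), parse_version(upper)
    return [
        f"pkg:maven/{namespace}/{name}@{version}"
        for version in all_versions
        if low <= parse_version(version) < high
    ]


# Hypothetical data: versions published for a made-up package.
known_versions = ["2.0.0", "2.0.1", "2.1.0", "2.2.0"]

# Hypothetical advisory range: affected from 2.0.0 (inclusive) up to 2.1.0 (exclusive).
affected_purls = resolve_range("org.example", "demo-lib", known_versions, "2.0.0", "2.1.0")

for purl in affected_purls:
    print(purl)
```

One advisory range can fan out into many purls this way, which is presumably why a few hundred packages turn into thousands of purls.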
Compared to that, Sonatype OSS Index seems to have 9152 pages of Maven security data, each page with 50 entries, resulting in 9152 * 50 = 457600 entries.
@sschuberth the OSS Index seems to collect all Maven packages irrespective of whether the package is vulnerable or not, hence the large number. VulnerableCode atm collects only Maven packages that are/were found to be vulnerable.
E.g. there are 145154 CVEs published so far, which is nowhere close to OSS Index's 457600 Maven packages. Those entries are mostly package descriptions.
FYI @pombredanne is working on a project called packagedb (soon to be public FOSS) which essentially collects all known packages from almost all ecosystems. The combined packagedb + vulnerablecode dataset would be even more comprehensive than OSS Index because packagedb captures way more details.
Here's the package model being used there https://github.com/nexB/scancode.io/blob/c256b7d921ef2042f570ae9c753e428694987a9e/scanpipe/packagedb_models.py#L31
> VulnerableCode atm collects only maven packages who are/were found to be vulnerable.
Which makes absolute sense 👍
But I'm still concerned that VulnerableCode might currently contain far fewer known Maven package vulnerabilities than actually exist. However, I cannot prove this with numbers. Do you know a good source that states how many known vulnerabilities there currently are in Maven packages, so we can estimate how good VulnerableCode's data currently is in this regard?
@sbs2001 @sschuberth @pombredanne: OSS Index definitely has a lot more than GHSA. Take, for example, the following CVEs which don't exist in GHSA, but which OSS Index correctly associates with specific libraries:
CVE-2020-10687: io.undertow/undertow-core
CVE-2020-13932: org.apache.activemq/artemis-jms-client
CVE-2020-10727: org.apache.activemq/artemis-jms-client
Any idea how the association with specific jar files is made? I can't find this in the Red Hat advisory for undertow or in the Apache ActiveMQ advisories for the others.
Thanks a lot for this information @farimani. Now indeed my key question also is: how does OSS Index manage to map these CVEs to Maven packages? At a quick glance, nothing in the CVEs looks like (complete) Maven coordinates.
Generally, maybe looking at the source code of DependencyCheck or Dependency Track can give a clue. Or @stevespringett himself might give a hint on how to perform this mapping properly?
There are three primary means of identifying software: purl, CPE, and SWID. The NVD supports CPE with the intent of also supporting SWID. Other sources support other identifiers. OSS Index is not limited to Maven; its ecosystem support is quite broad. OSS Index also has vulnerabilities that are not in the NVD. It’s also not the only source to support purl. There are others, but I’m not aware of any other free sources.
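For readers unfamiliar with purl, here is a minimal sketch of how a Maven purl breaks down into its parts. It is a hand-rolled parser for illustration only: `SimplePurl` and `parse_simple_purl` are hypothetical names, qualifiers and subpaths are simply discarded, percent-decoding is skipped, and the version in the example is made up.

```python
from typing import NamedTuple, Optional


class SimplePurl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_simple_purl(purl: str) -> SimplePurl:
    """Parse 'pkg:type/namespace/name@version', ignoring qualifiers and subpath."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a purl: {purl!r}")
    # Drop the subpath ('#...') and qualifiers ('?...'); this sketch does not use them.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    if "@" in rest:
        path, version = rest.rsplit("@", 1)
    else:
        path, version = rest, None
    parts = path.strip("/").split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"purl needs at least a type and a name: {purl!r}")
    namespace = "/".join(parts[1:-1]) or None
    return SimplePurl(parts[0].lower(), namespace, parts[-1], version or None)


# The version below is hypothetical and only there to show the '@version' part.
parsed = parse_simple_purl("pkg:maven/io.undertow/undertow-core@1.2.3")
print(parsed)
```

For Maven, the namespace carries the groupId and the name carries the artifactId, which is what lets a purl point at one specific jar rather than a whole product the way a CPE typically does.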
There’s a ticket open which asks Sonatype to consider open sourcing the purl to CPE mapping they have. No special sauce is included with Dependency Track. It simply makes a query to OSS Index to get results.
@sschuberth dependency check also uses OSS index for these cases. The CPE based lookup does not yield a match.
@farimani I can find all the mentioned CVEs at https://access.redhat.com/hydra/rest/securitydata/cve/
Activemq has advisories at https://activemq.apache.org/security-advisories.data/CVE-2020-13932-announcement.txt .
I don't think they do CPE->PURL mappings. They might be parsing and inferring data from the above-mentioned sources, maybe with some manual curation in some cases.
FYI I tried to make CPE->PURL mappings by using vulnerablecode's data: https://github.com/sbs2001/purl2cpe
@sbs2001 thanks for the links, and purl2cpe is awesome. I'll look at it in more detail.
Regarding the CVEs and advisories, note that ActiveMQ Artemis has some 20+ jar files providing different functionalities. CVE-2020-13932/10727 specifically impact the client libraries (artemis-jms-client*.jar) and not the other ones.
There is no information in the CPEs or the advisories that points specifically to the client libraries, so if an image contains some of the package's libraries but not the client libs, you'd have false positives. OSS Index has some other information that allows a more precise mapping.
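The false-positive problem can be sketched in a few lines. Everything below is illustrative: the advisory record shape, the image inventory, and both matching functions are hypothetical, and the only facts taken from this thread are that CVE-2020-13932 is associated with org.apache.activemq/artemis-jms-client.

```python
# Hypothetical advisory record: the CVE from this thread, scoped to the client artifact.
ADVISORY = {
    "cve": "CVE-2020-13932",
    "group": "org.apache.activemq",
    "artifacts": {"artemis-jms-client"},
}

# Hypothetical image inventory: Artemis jars are present, but not the JMS client.
inventory = [
    ("org.apache.activemq", "artemis-server"),
    ("org.apache.activemq", "artemis-commons"),
]


def match_by_group(components, advisory):
    """Coarse, product-level match: any jar from the project is flagged."""
    return [c for c in components if c[0] == advisory["group"]]


def match_by_artifact(components, advisory):
    """Precise match: only the artifacts the advisory actually names are flagged."""
    return [
        c for c in components
        if c[0] == advisory["group"] and c[1] in advisory["artifacts"]
    ]


coarse_hits = match_by_group(inventory, ADVISORY)
precise_hits = match_by_artifact(inventory, ADVISORY)

print("coarse:", coarse_hits)    # both jars flagged, i.e. false positives
print("precise:", precise_hits)  # nothing flagged
```

The coarse match is roughly what a product-level identifier gives you; the precise match needs artifact-level data, which is the part that is missing from the CPEs and advisories.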
There are a lot of projects like that. Spring Framework is a really good example, as there are a lot of optional jars that can be used (Spring MVC, Spring Security, etc.) where the CPE doesn’t mention them but the CVE description does. OSS Index handles this, but there’s likely some ML in the background helping to associate CVEs with specific jars and then, based on the jars, constructing the purl mapping. Other examples include MySQL, which could refer to the server or a JDBC driver, and typically there’s no indication in the CPE.
@stevespringett I doubt that there's any ML here. More likely it's because Sonatype, by virtue of providing a great service to Java developers and hosting Central, Maven, etc., is in a position to have more detailed info about the vulnerability submission/fix process. I could be wrong, but one can check whether OSS Index has more detailed data for other ecosystems, for example .NET. If so, then there's probably some cool automation going on.
There is no dedicated source of data for Maven/Java package vulnerabilities. We should ensure that we can surface that data from our other sources with proper package URLs.