aboutcode-org / vulnerablecode

A free and open vulnerabilities database and the packages they impact. And the tools to aggregate and correlate these vulnerabilities. Sponsored by NLnet https://nlnet.nl/project/vulnerabilitydatabase/ for https://www.aboutcode.org/ Chat at https://gitter.im/aboutcode-org/vulnerablecode Docs at https://vulnerablecode.readthedocs.org/
https://public.vulnerablecode.io
Apache License 2.0

Collect vulnerabilities for maven packages #279

Open pombredanne opened 4 years ago

pombredanne commented 4 years ago

There is no dedicated source of data for Maven/Java package vulnerabilities. We should ensure that we can surface that data from our other sources with proper package URLs.

sbs2001 commented 4 years ago

The GitHub importer does import maven vulnerabilities.

sschuberth commented 4 years ago

> The GitHub importer does import maven vulnerabilities.

Any idea how comprehensive these are compared to sources dedicated to Maven / Java?

sbs2001 commented 4 years ago

There are 248 vulnerable maven packages available from the GitHub API. After resolving the version ranges, we have about 15478 maven purls and their mappings to vulnerabilities.
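To illustrate what "resolving the version ranges" can mean, here is a minimal sketch, not VulnerableCode's actual importer code. It assumes purely numeric dotted versions and a half-open range; the function names, the `org.example/demo-lib` coordinates, and the version list are all hypothetical. Real Maven versions (qualifiers such as `-RC1` or `.Final`) need a proper Maven version comparator.

```python
# Hypothetical sketch: expand one vulnerable version range into Maven purls.
# Assumes numeric dotted versions only; not the real importer logic.

def parse_version(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lower, upper):
    """True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def vulnerable_purls(group_id, artifact_id, known_versions, lower, upper):
    """Return one purl per known version that falls inside the range."""
    return [
        f"pkg:maven/{group_id}/{artifact_id}@{version}"
        for version in known_versions
        if in_range(version, lower, upper)
    ]


# Made-up package and versions, for illustration only.
purls = vulnerable_purls(
    "org.example",
    "demo-lib",
    ["1.0.0", "1.2.0", "1.10.0", "2.0.0"],
    lower="1.1.0",
    upper="2.0.0",
)
print(purls)
```

A single advisory range expands to one purl per affected release, which is how a few hundred packages can turn into thousands of purls.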

sschuberth commented 4 years ago

Compared to that, Sonatype OSS Index seems to have 9152 pages of Maven security data, each page with 50 entries, resulting in 9152 * 50 = 457600 data sets.

sbs2001 commented 4 years ago

@sschuberth the OSS index seems to collect all maven packages irrespective of whether the package is vulnerable or not, hence the large number. VulnerableCode atm collects only maven packages that are/were found to be vulnerable.

sbs2001 commented 4 years ago

E.g. there are 145154 CVEs published till now, which is nowhere close to OSS index's 457600 maven packages. Those are mostly package descriptions.

FYI @pombredanne is working on a project called packagedb (soon to be public FOSS) which essentially collects all known packages from almost all ecosystems. The combined packagedb + vulnerablecode dataset would be even more comprehensive than OSS Index because packagedb captures way more details.

Here's the package model being used there https://github.com/nexB/scancode.io/blob/c256b7d921ef2042f570ae9c753e428694987a9e/scanpipe/packagedb_models.py#L31

sschuberth commented 4 years ago

> VulnerableCode atm collects only maven packages who are/were found to be vulnerable.

Which makes absolute sense 👍

But I'm still concerned that VulnerableCode might currently contain far fewer known Maven package vulnerabilities than there are. However, I cannot prove this with numbers. Do you know a good source that states how many known vulnerabilities there currently are in Maven packages, so we can estimate how good VulnerableCode's data currently is in this regard?

kilinitt commented 3 years ago

@sbs2001 @sschuberth @pombredanne: OSS index definitely has a lot more than GHSA. Take, for example, the following CVEs which don't exist in GHSA, but OSS index correctly associates with specific libraries:

CVE-2020-10687: io.undertow/undertow-core
CVE-2020-13932: org.apache.activemq/artemis-jms-client
CVE-2020-10727: org.apache.activemq/artemis-jms-client

Any idea how the association with specific jar files is made? I can't find this under the redhat advisory for undertow and apache/activemq advisory for the others.
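For reference, the three CVE-to-library associations listed in this comment can be written as package URLs. This is a minimal sketch that only formats the coordinates given above using the `pkg:maven/<groupId>/<artifactId>` layout; the helper name `maven_purl` is hypothetical, and the sketch does not handle characters that would need percent-encoding.

```python
# Coordinates exactly as listed in the comment above.
CVE_TO_COORDS = {
    "CVE-2020-10687": ("io.undertow", "undertow-core"),
    "CVE-2020-13932": ("org.apache.activemq", "artemis-jms-client"),
    "CVE-2020-10727": ("org.apache.activemq", "artemis-jms-client"),
}


def maven_purl(group_id, artifact_id, version=None):
    """Format Maven coordinates as a purl (hypothetical helper, no encoding)."""
    purl = f"pkg:maven/{group_id}/{artifact_id}"
    return f"{purl}@{version}" if version else purl


CVE_TO_PURL = {
    cve: maven_purl(group_id, artifact_id)
    for cve, (group_id, artifact_id) in CVE_TO_COORDS.items()
}

for cve, purl in CVE_TO_PURL.items():
    print(cve, purl)
```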

sschuberth commented 3 years ago

Thanks a lot for this information @farimani. Now indeed my key question also is: How does OSS Index manage to map these CVEs to Maven packages? From a quick glance, nothing in the CVEs looks like (complete) Maven coordinates.

Generally, maybe looking at the source code of DependencyCheck or Dependency Track can give a clue. Or @stevespringett himself might give a hint on how to perform this mapping properly?

stevespringett commented 3 years ago

There are three primary means of identifying software: purl, CPE, and SWID. The NVD supports CPE with the intent of also supporting SWID. Other sources support other identifiers. OSS Index is not limited to Maven; its ecosystem support is quite broad. OSS Index also has vulnerabilities that are not in the NVD. It’s also not the only source to support purl. There are others, but I’m not aware of any other free sources.

There’s a ticket open which asks Sonatype to consider open sourcing the purl to CPE mapping they have. No special sauce is included with Dependency Track. It simply makes a query to OSS Index to get results.

See https://github.com/OSSIndex/vulns/issues/53

kilinitt commented 3 years ago

@sschuberth dependency check also uses OSS index for these cases. The CPE based lookup does not yield a match.

sbs2001 commented 3 years ago

@farimani I can find all the mentioned CVEs at the https://access.redhat.com/hydra/rest/securitydata/cve/ endpoint. E.g. https://access.redhat.com/hydra/rest/securitydata/cve/CVE-2020-13932 .
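A rough sketch of querying that endpoint from Python, using only the standard library. The base URL is the one given above; appending `.json` to get a JSON response and the helper names `cve_url` and `fetch_cve` are assumptions, and nothing is assumed here about the fields in the response.

```python
import json
from urllib.request import urlopen

# Base URL as given in the comment above.
BASE_URL = "https://access.redhat.com/hydra/rest/securitydata/cve/"


def cve_url(cve_id):
    """Build the per-CVE URL. The '.json' suffix is an assumption."""
    return f"{BASE_URL}{cve_id}.json"


def fetch_cve(cve_id, timeout=10):
    """Fetch and decode one CVE record (performs a network request)."""
    with urlopen(cve_url(cve_id), timeout=timeout) as response:
        return json.load(response)


print(cve_url("CVE-2020-13932"))
```

Calling `fetch_cve("CVE-2020-13932")` returns the decoded record as a Python object; inspect its keys before relying on any particular field.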

Activemq has advisories at https://activemq.apache.org/security-advisories.data/CVE-2020-13932-announcement.txt .

I don't think they do CPE->PURL mappings. They might be parsing and inferring data from the above-mentioned sources, maybe with some manual curation in some cases.

FYI I tried to make CPE->PURL mappings by using vulnerablecode's data, https://github.com/sbs2001/purl2cpe

kilinitt commented 3 years ago

@sbs2001 thanks for the links and the purl2cpe is awesome. I'll look at it in more detail.

Regarding the CVEs and advisories, note that ActiveMQ Artemis has some 20+ jar files providing different functionalities. CVE-2020-13932/10727 specifically impact the client libraries (artemis-jms-client*.jar) and not the other ones.

There is no information in the CPEs or the advisories that points specifically to the client libraries, and so if an image contains any of the package's libraries and not the client libs, you'd have false positives. OSS index has some other info that does a more precise mapping.
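The false-positive problem can be sketched as follows. This is an illustration only, not how OSS Index or any scanner actually works: the coarse matcher flags everything from the project's group, while the precise matcher uses the artifact-level knowledge from this comment (only `artemis-jms-client*` is affected). The inventory contents and function names are assumptions made up for the example.

```python
# Illustration only: project-level vs artifact-level matching.
ADVISORY_GROUP = "org.apache.activemq"
AFFECTED_PREFIX = "artemis-jms-client"  # from this thread: client libs only


def coarse_match(inventory):
    """Flag every jar from the project's group (CPE-like granularity)."""
    return [(g, a) for g, a in inventory if g == ADVISORY_GROUP]


def precise_match(inventory):
    """Flag only jars whose artifact id matches the affected client libs."""
    return [
        (g, a)
        for g, a in inventory
        if g == ADVISORY_GROUP and a.startswith(AFFECTED_PREFIX)
    ]


# Hypothetical image that ships Artemis jars but no client library.
inventory = [
    ("org.apache.activemq", "artemis-server"),
    ("org.apache.activemq", "artemis-commons"),
    ("io.undertow", "undertow-core"),
]

false_positives = [
    jar for jar in coarse_match(inventory) if jar not in precise_match(inventory)
]
print(false_positives)
```

With this inventory the coarse matcher reports two jars and the precise matcher reports none, so both coarse hits are false positives.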

stevespringett commented 3 years ago

There’s a lot of projects like that. Spring Framework is a really good example as there’s a lot of optional jars that can be used (Spring MVC, Spring Security, etc) where the CPE doesn’t mention them but the CVE description does. OSS Index handles this but there’s likely some ML in the background helping to associate CVEs to specific jars, then based on the jars, constructing the purl mapping. Other examples include MySQL which could refer to the server or a JDBC driver and typically there’s no indication in the CPE.

kilinitt commented 3 years ago

@stevespringett I doubt that there's any ML here. More likely it's because Sonatype, by virtue of providing a great service to Java developers and hosting Central, Maven, etc., is in a position to have more detailed info about the vulnerability submission/fix process. I could be wrong, but one can check to see if OSS index has more detailed data for other ecosystems, for example, .NET. If so, then there's probably some cool automation going on.