anchore / grype

A vulnerability scanner for container images and filesystems
Apache License 2.0

Java cataloger reports various packages by the wrong name #192

Open pdevine-cb opened 3 years ago

pdevine-cb commented 3 years ago

What happened: When analyzing an image with various JAR files, the package names often do not match the names expected by other utilities such as grype. Some examples:

This causes mismatches with various vulnerability feeds, which leads to false negatives in other tools such as grype.

What you expected to happen: I would expect the detected names to reflect the names expected by other tools. Alternatively, those other tools, such as grype, could do a better job of matching names; however, I think that might be difficult to accomplish.

How to reproduce it (as minimally and precisely as possible): $ syft jenkins/jenkins:2.249.2-lts-jdk11

Anything else we need to know?: These CVEs were missed in jenkins/jenkins as a result of the names not matching:

(u'jquery', u'1.2.1', u'CVE-2020-11023', u'Medium'),
(u'jquery', u'1.2.1', u'CVE-2020-11022', u'Medium'),
(u'jquery', u'1.2.1', u'CVE-2019-11358', u'Medium'),
(u'jquery', u'1.2.1', u'CVE-2015-9251', u'Medium'),
(u'jquery', u'1.2.1', u'CVE-2012-6708', u'Medium'),
(u'jquery', u'1.2.1', u'CVE-2011-4969', u'Medium'),
(u'script_security', u'1.73', u'CVE-2020-2279', u'Critical'),
(u'spring_framework', u'2.5.6', u'CVE-2011-2730', u'High'),
(u'spring_framework', u'2.5.6', u'CVE-2010-1622', u'Medium'),
(u'xstream', u'1.4', u'CVE-2017-7957', u'High'),
(u'xstream', u'1.4', u'CVE-2016-3674', u'High'),
(u'xstream', u'1.4', u'CVE-2013-7285', u'Critical'),
(u'commons_beanutils', u'1.9.3', u'CVE-2019-10086', u'High'),
(u'groovy', u'1.26', u'CVE-2019-1003006', u'High'),
(u'groovy', u'1.26', u'CVE-2019-1003033', u'High'),
(u'crypto', u'1.5', u'CVE-2011-0766', u'High')
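
To illustrate why a name mismatch turns into a false negative, here is a minimal Python sketch of an exact-name lookup. It is only an illustration under stated assumptions: the `FEED` dictionary and `lookup` helper are hypothetical and not part of syft or grype, the feed entries are copied from the jquery rows above, and the SBOM name `jquery-detached` is the artifactId discussed elsewhere in this thread.

```python
# Hypothetical in-memory "feed", keyed by (package name, version).
# The CVE IDs are the jquery entries listed above.
FEED = {
    ("jquery", "1.2.1"): [
        "CVE-2020-11023",
        "CVE-2020-11022",
        "CVE-2019-11358",
        "CVE-2015-9251",
        "CVE-2012-6708",
        "CVE-2011-4969",
    ],
}


def lookup(name, version):
    """Return the CVEs recorded for an exact (name, version) pair."""
    return FEED.get((name, version), [])


# The feed knows the package as "jquery" ...
print(lookup("jquery", "1.2.1"))
# ... but the SBOM reports "jquery-detached", so an exact lookup finds nothing.
print(lookup("jquery-detached", "1.2.1"))
```

With exact matching, the second lookup returns an empty list even though the version matches, which is the false negative described here.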

Environment:

wagoodman commented 3 years ago

@pdevine-cb I took a closer look at the problem, and it looks like it isn't the SBOM discovery process [in syft] that's causing the problem, but rather the vulnerability matching process [in grype], which can be enhanced to catch more of these cases.

The SBOM information reported by syft for the examples you listed above is derived from Java pom.xml files. For the package name, the field used is artifactId. Here is the pom.xml for jquery-detached:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.jenkins-ci.ui</groupId>
        <artifactId>js-module-base</artifactId>
        <version>1.2.1</version>
        <relativePath>../js-module-base</relativePath>
    </parent>
    <groupId>org.jenkins-ci.ui</groupId>
    <artifactId>jquery-detached</artifactId>
    <version>1.2.1</version>
    <packaging>hpi</packaging>
...

I took a closer look at the remaining available fields for jquery-detached in the pom.xml, and there doesn't seem to be a good replacement field or process to generalize jquery-detached to jquery. This appears to be the case for the other packages you referenced: artifactId in the pom.xml seems to be the right field.
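
As a rough illustration of where the name comes from, here is a minimal Python sketch that reads the project-level groupId, artifactId, and version out of a pom.xml. This is not syft's actual cataloger code: `package_from_pom` is a hypothetical helper, and `POM_XML` is a trimmed copy of the jquery-detached pom.xml shown above.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Maven POM 4.0.0 documents.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Trimmed copy of the jquery-detached pom.xml from this thread.
POM_XML = """<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.jenkins-ci.ui</groupId>
        <artifactId>js-module-base</artifactId>
        <version>1.2.1</version>
    </parent>
    <groupId>org.jenkins-ci.ui</groupId>
    <artifactId>jquery-detached</artifactId>
    <version>1.2.1</version>
    <packaging>hpi</packaging>
</project>
"""


def package_from_pom(pom_text):
    """Return (groupId, artifactId, version) of the project itself.

    The paths only match direct children of <project>, so the values
    nested under <parent> are not picked up by mistake.
    """
    root = ET.fromstring(pom_text.encode("utf-8"))
    return (
        root.findtext("m:groupId", namespaces=POM_NS),
        root.findtext("m:artifactId", namespaces=POM_NS),
        root.findtext("m:version", namespaces=POM_NS),
    )


print(package_from_pom(POM_XML))
```

Nothing in these fields says "jquery" on its own, which is why the generalization would have to happen on the matching side.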

The same holds for anchore-engine: it also reports this artifact as jquery-detached (from anchore-cli image content jenkins/jenkins:2.249.2-lts-jdk11 java):

...
        {
            "implementation-version": "N/A",
            "location": "/usr/share/jenkins/jenkins.war:WEB-INF/lib/jquery-detached-1.2.1-core-assets.jar",
            "maven-version": "1.2.1",
            "origin": "org.jenkins-ci.ui",
            "package": "jquery-detached",
            "specification-version": "N/A",
            "type": "JAVA-JAR"
        },
...

Regarding the other examples you listed (spring-*, script-security, groovy-all, etc...) there is similar evidence; the SBOM information appears to be accurate, both relative to the data found within the image as well as what anchore-engine reports for java content.

So! This all points to an issue upstream in grype where we have an opportunity to enhance the existing matching process (I'll transfer the issue to the grype repo). I think we can explore improving matching in at least a couple of ways:

  1. Manually swap out common characters during comparison, like dashes for underscores (relatively easy, but probably will have diminishing returns).
  2. Add in a fuzzy search capability into the package matching process (relatively harder, but will more easily cover more cases than with approach 1).
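
To make the two options concrete, here is a minimal Python sketch. It is not grype's matcher: `normalize`, `match_name`, and the `FEED_NAMES` list are hypothetical, the feed names are taken from the CVE list in the original report, and the standard-library `difflib` stands in for whatever fuzzy search would actually be used.

```python
import difflib

# Package names as they appear in the CVE list from the original report.
FEED_NAMES = [
    "jquery",
    "script_security",
    "spring_framework",
    "xstream",
    "commons_beanutils",
    "groovy",
    "crypto",
]


def normalize(name):
    """Approach 1: lowercase and treat '-', '.' and '_' as the same separator."""
    return name.lower().replace("-", "_").replace(".", "_")


def match_name(sbom_name, feed_names, cutoff=0.6):
    """Return the feed name matching sbom_name, or None if nothing is close enough."""
    by_norm = {normalize(n): n for n in feed_names}
    key = normalize(sbom_name)
    # Approach 1: exact match after swapping separators.
    if key in by_norm:
        return by_norm[key]
    # Approach 2: fuzzy match; cutoff is a similarity threshold in [0, 1].
    close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
    return by_norm[close[0]] if close else None


print(match_name("script-security", FEED_NAMES))
print(match_name("groovy-all", FEED_NAMES))
print(match_name("jquery-detached", FEED_NAMES))
print(match_name("jquery-detached", FEED_NAMES, cutoff=0.5))
```

In this sketch, script-security is caught by the separator swap alone, groovy-all needs the fuzzy step, and jquery-detached only resolves to jquery once the cutoff is lowered to 0.5. A looser cutoff also raises the risk of matching unrelated packages, so any threshold would need to be checked against real matching results.
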

wagde-orca commented 3 years ago

Hi guys, any update on this? Thanks

wagoodman commented 3 years ago

@wagde-orca no movement code-wise. We recently moved CPE generation from grype to syft, and we're starting to refine the CPEs generated (reduce the number of * fields, order more specific first, etc.). I want to carve out some time to prototype the proposed changes and get a sense of how they will affect the matching results before committing to picking this up; I just haven't had the time yet. I'll shout out when there is a branch to look at. Stay tuned!
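
One possible reading of "order more specific first" is sketched below in Python: count the wildcard (*) fields in each CPE 2.3 string and sort so that the candidates with the fewest wildcards come first. This is an assumption for illustration, not the logic in syft. The helper names and the candidate CPEs (including the vendor and target_sw values) are made up, and the naive split assumes no field contains an escaped colon.

```python
def wildcard_count(cpe):
    """Count '*' fields in a CPE 2.3 formatted string.

    Naive parsing: splits on ':' and skips the leading 'cpe' and '2.3'
    parts; it does not handle escaped ':' characters inside a field.
    """
    fields = cpe.split(":")[2:]
    return sum(1 for field in fields if field == "*")


def order_by_specificity(cpes):
    """Return the CPEs with the fewest wildcard fields first."""
    return sorted(cpes, key=wildcard_count)


# Hypothetical candidates for one package; vendor and target_sw are invented.
candidates = [
    "cpe:2.3:a:*:jquery-detached:1.2.1:*:*:*:*:*:*:*",
    "cpe:2.3:a:jenkins-ci:jquery-detached:1.2.1:*:*:*:*:*:*:*",
    "cpe:2.3:a:jenkins-ci:jquery-detached:1.2.1:*:*:*:*:java:*:*",
]

for cpe in order_by_specificity(candidates):
    print(wildcard_count(cpe), cpe)
```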

wagoodman commented 1 year ago

This issue is pretty old, but I wanted to share the current state (grype 0.56.0):