Java cataloger reports various packages by the wrong name

What happened: When analyzing an image with various jar files, the package names often do not match the names expected by other utilities such as grype. Some examples:

- jquery-detached-1.2.1-core-assets.jar becomes jquery-detached instead of jquery
- script-security.hpi becomes script-security instead of script_security
- spring-*.jar becomes spring-* instead of having a component named spring_framework
- commons-beanutils-1.9.3.jar becomes commons-beanutils instead of commons_beanutils
- groovy-all-2.4.12.jar becomes groovy-all instead of groovy
- crypto-util-1.5.jar becomes crypto-util instead of crypto

This causes mismatches against various vulnerability feeds, leading to false negatives in other tools such as grype.

What you expected to happen: I would expect the detected names to reflect the names expected by other tools. Alternatively, those other tools (such as grype) could do a better job of matching names, though I think that might be difficult to accomplish.

How to reproduce it (as minimally and precisely as possible): $ syft jenkins/jenkins:2.249.2-lts-jdk11

Anything else we need to know?: These CVEs were missed in jenkins/jenkins because the names did not match:

- (u'jquery', u'1.2.1', u'CVE-2020-11023', u'Medium')
- (u'jquery', u'1.2.1', u'CVE-2020-11022', u'Medium')
- (u'jquery', u'1.2.1', u'CVE-2019-11358', u'Medium')
- (u'jquery', u'1.2.1', u'CVE-2015-9251', u'Medium')
- (u'jquery', u'1.2.1', u'CVE-2012-6708', u'Medium')
- (u'jquery', u'1.2.1', u'CVE-2011-4969', u'Medium')
- (u'script_security', u'1.73', u'CVE-2020-2279', u'Critical')
- (u'spring_framework', u'2.5.6', u'CVE-2011-2730', u'High')
- (u'spring_framework', u'2.5.6', u'CVE-2010-1622', u'Medium')
- (u'xstream', u'1.4', u'CVE-2017-7957', u'High')
- (u'xstream', u'1.4', u'CVE-2016-3674', u'High')
- (u'xstream', u'1.4', u'CVE-2013-7285', u'Critical')
- (u'commons_beanutils', u'1.9.3', u'CVE-2019-10086', u'High')
- (u'groovy', u'1.26', u'CVE-2019-1003006', u'High')
- (u'groovy', u'1.26', u'CVE-2019-1003033', u'High')
- (u'crypto', u'1.5', u'CVE-2011-0766', u'High')

Environment:

Output of syft version:

Application:   syft
Version:       v0.1.0-SNAPSHOT-1fc4629
BuildDate:     2020-10-09T23:49:13Z
GitCommit:     1fc46291a6e885cee57ddb2f00ec6c74c51a63a3
GitTreeState:  dirty
Platform:      linux/amd64
GoVersion:     go1.13.8
Compiler:      gc

OS (e.g: cat /etc/os-release or similar):

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

@pdevine-cb I took a closer look at the problem, and it looks like it isn't the SBOM discovery process [in syft] that's causing the problem, but rather the vulnerability matching process [in grype], which can be enhanced to catch more of these cases.

The SBOM information that syft reports for the examples you listed above is derived from Java pom.xml files. For the package name, the field used is artifactId. Here is the pom.xml for jquery-detached:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.jenkins-ci.ui</groupId>
        <artifactId>js-module-base</artifactId>
        <version>1.2.1</version>
        <relativePath>../js-module-base</relativePath>
    </parent>
    <groupId>org.jenkins-ci.ui</groupId>
    <artifactId>jquery-detached</artifactId>
    <version>1.2.1</version>
    <packaging>hpi</packaging>
...

I took a closer look at the remaining fields available for jquery-detached in the pom.xml, and there doesn't seem to be a good replacement field or process to generalize jquery-detached to jquery. This appears to be the case for the other packages you referenced as well: artifactId in the pom.xml seems to be the right field to use.

This also appears to be the case for anchore-engine: engine reports this artifact as jquery-detached as well (from anchore-cli image content jenkins/jenkins:2.249.2-lts-jdk11 java):

...
        {
            "implementation-version": "N/A",
            "location": "/usr/share/jenkins/jenkins.war:WEB-INF/lib/jquery-detached-1.2.1-core-assets.jar",
            "maven-version": "1.2.1",
            "origin": "org.jenkins-ci.ui",
            "package": "jquery-detached",
            "specification-version": "N/A",
            "type": "JAVA-JAR"
        },
...

Regarding the other examples you listed (spring-*, script-security, groovy-all, etc.), there is similar evidence: the SBOM information appears to be accurate, both relative to the data found within the image and to what anchore-engine reports for Java content.

So! This all points to an issue in grype, where we have an opportunity to enhance the existing matching process (I'll transfer the issue to the grype repo). I think we can explore improving matching in at least a couple of ways:

1. Manually swap out common characters during comparison, like dashes for underscores (relatively easy, but will probably have diminishing returns).
2. Add a fuzzy search capability to the package matching process (relatively harder, but it would cover more cases more easily than approach 1).
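The two approaches above can be sketched in Go as follows, assuming package names and vulnerability-record names are compared as plain strings. Levenshtein edit distance stands in for "fuzzy search" here as one possible technique; normalizeName, namesMatch, and the maxDist threshold are hypothetical names, not part of grype.

```go
package main

import (
	"fmt"
	"strings"
)

// separatorReplacer maps common separators to underscores (approach 1).
// The chosen character set is an assumption for illustration.
var separatorReplacer = strings.NewReplacer("-", "_", ".", "_")

// normalizeName lowercases a package name and unifies its separators.
func normalizeName(name string) string {
	return separatorReplacer.Replace(strings.ToLower(name))
}

// levenshtein returns the edit distance between a and b using two rolling rows.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution
			curr[j] = minInt(prev[j]+1, minInt(curr[j-1]+1, prev[j-1]+cost))
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// namesMatch compares normalized names exactly first (approach 1), then falls
// back to an edit-distance threshold (a stand-in for approach 2).
func namesMatch(pkgName, vulnName string, maxDist int) bool {
	a, b := normalizeName(pkgName), normalizeName(vulnName)
	if a == b {
		return true
	}
	return levenshtein(a, b) <= maxDist
}

func main() {
	fmt.Println(namesMatch("script-security", "script_security", 2))     // true
	fmt.Println(namesMatch("commons-beanutils", "commons_beanutils", 2)) // true
	fmt.Println(namesMatch("jquery-detached", "jquery", 2))              // false
}
```

With maxDist set to 2, normalization alone resolves script-security and commons-beanutils, but jquery-detached vs. jquery (edit distance 9 after normalization) still does not match. This illustrates both the diminishing returns of approach 1 and why a plain edit-distance threshold would not cover every case either.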

Hi guys, any update on this? Thanks

@wagde-orca no movement code-wise. We recently moved CPE generation from grype to syft, and we're starting to refine the generated CPEs (reduce the number of * fields, order more specific ones first, etc.). I want to carve out some time to prototype the proposed changes and get a sense of how they will affect the matching results before committing to picking this up; I just haven't had the time yet. I'll shout out when there is a branch to look at. Stay tuned!

This issue is pretty old, but I wanted to give an update on its current state (grype 0.56.0):

None of the package names have changed (the issue description is still accurate).
All of the false negatives (FNs) listed are still FNs, EXCEPT for the following, which now show up in grype scans:
- (u'script_security', u'1.73', u'https://github.com/advisories/GHSA-ccr8-4xr7-cgj3', u'Critical'),
- (u'spring_framework', u'2.5.6', u'https://github.com/advisories/GHSA-wv88-pf73-x22p', u'High'),
- (u'commons_beanutils', u'1.9.3', u'https://github.com/advisories/GHSA-6phf-73q6-gh87', u'High'),

anchore / grype

Java cataloger reports various packages by the wrong name #192