anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Invalid PURLs when cataloging instrumentation classes/jars #2596

Open cdupuis opened 5 months ago

cdupuis commented 5 months ago

What happened:

Syft has started to hard-code some groupIds for Maven artefacts, which leads to misleading PURLs when similarly named artefacts are embedded as instrumentation JARs.

Ultimately this yields many false positives, e.g. with Grype.

What you expected to happen:

Preferably the original groupId would be preserved.

Steps to reproduce the issue:

Use the following Dockerfile to build an image:


FROM debian as newrelic

RUN apt-get update && apt-get install curl -y

RUN curl https://download.newrelic.com/newrelic/java-agent/newrelic-agent/8.8.0/newrelic-agent-8.8.0.jar -o /newrelic-agent-8.8.0.jar

FROM scratch

COPY --from=newrelic /newrelic-agent-8.8.0.jar /newrelic-agent-8.8.0.jar

Then run syft <your image from above Dockerfile> -o json=sbom.json and look at the entry for, e.g., the embedded spring-3.0.0-1.0.jar:

{
            "id": "355b62d06b0f65b9",
            "name": "spring",
            "version": "3.0.0-1.0",
            "type": "java-archive",
            "foundBy": "java-archive-cataloger",
            "locations": [
                {
                    "path": "/newrelic-agent-8.8.0.jar",
                    "layerID": "sha256:5fbb9490be31398af2f460dc46e26acf04f4be8aafbd2470d245efe06c1b59b5",
                    "accessPath": "/newrelic-agent-8.8.0.jar:instrumentation/spring-3.0.0-1.0.jar",
                    "annotations": {
                        "evidence": "primary"
                    }
                }
            ],
            "licenses": [],
            "language": "java",
            "cpes": [
                "cpe:2.3:a:springframework:spring:3.0.0-1.0:*:*:*:*:*:*:*",
                "cpe:2.3:a:new-relic:spring:3.0.0-1.0:*:*:*:*:*:*:*",
                "cpe:2.3:a:new_relic:spring:3.0.0-1.0:*:*:*:*:*:*:*",
                "cpe:2.3:a:spring:spring:3.0.0-1.0:*:*:*:*:*:*:*"
            ],
            "purl": "pkg:maven/org.springframework/spring@3.0.0-1.0",
            "metadataType": "java-archive",
            "metadata": {
                "virtualPath": "/newrelic-agent-8.8.0.jar:instrumentation/spring-3.0.0-1.0.jar",
                "manifest": {
                    "main": {
                        "Class-Required-Annotations": "org.springframework.stereotype.Controller",
                        "Illegal-Classes": "org/springframework/context/event/EventListener,org/springframework/web/bind/annotation/RestController",
                        "Implementation-Title": "com.newrelic.instrumentation.spring-3.0.0",
                        "Implementation-Title-Alias": "spring_annotations",
                        "Implementation-Vendor": "New Relic",
                        "Implementation-Version": "1.0",
                        "Manifest-Version": "1.0",
                        "Method-Required-Annotations": "org.springframework.web.bind.annotation.RequestMapping",
                        "Reference-Classes": "org/springframework/web/bind/annotation/RequestMapping,org/springframework/web/bind/annotation/RequestMethod",
                        "Weave-Classes": "",
                        "Weave-Methods": ""
                    }
                },
                "digest": [
                    {
                        "algorithm": "sha1",
                        "value": "4a05ae952c2fefd7c5b44cc797fa0075368bad7c"
                    }
                ]
            }
        }

When running Grype on this image, you get a lot of false positives, e.g.:

{
   "vulnerability": {
    "id": "GHSA-vpr3-f594-mg5g",
    "dataSource": "https://github.com/advisories/GHSA-vpr3-f594-mg5g",
    "namespace": "github:language:java",
    "severity": "Medium",
    "urls": [
     "https://github.com/advisories/GHSA-vpr3-f594-mg5g"
    ],
    "description": "Improper Control of Generation of Code ('Code Injection') in Spring Framework",
    "cvss": [],
    "fix": {
     "versions": [
      "3.0.3"
     ],
     "state": "fixed"
    },
    "advisories": []
   },
   "relatedVulnerabilities": [
    {
     "id": "CVE-2010-1622",
     "dataSource": "https://nvd.nist.gov/vuln/detail/CVE-2010-1622",
     "namespace": "nvd:cpe",
     "severity": "Medium",
     "urls": [
      "http://geronimo.apache.org/2010/07/21/apache-geronimo-v216-released.html",
      "http://geronimo.apache.org/21x-security-report.html",
      "http://geronimo.apache.org/22x-security-report.html",
      "http://secunia.com/advisories/41016",
      "http://secunia.com/advisories/41025",
      "http://secunia.com/advisories/43087",
      "http://www.exploit-db.com/exploits/13918",
      "http://www.oracle.com/technetwork/topics/security/cpuoct2015-2367953.html",
      "http://www.redhat.com/support/errata/RHSA-2011-0175.html",
      "http://www.securityfocus.com/archive/1/511877",
      "http://www.securityfocus.com/bid/40954",
      "http://www.securitytracker.com/id/1033898",
      "http://www.springsource.com/security/cve-2010-1622",
      "http://www.vupen.com/english/advisories/2011/0237"
     ],
     "description": "SpringSource Spring Framework 2.5.x before 2.5.6.SEC02, 2.5.7 before 2.5.7.SR01, and 3.0.x before 3.0.3 allows remote attackers to execute arbitrary code via an HTTP request containing class.classLoader.URLs[0]=jar: followed by a URL of a crafted .jar file.",
     "cvss": [
      {
       "source": "nvd@nist.gov",
       "type": "Primary",
       "version": "2.0",
       "vector": "AV:N/AC:M/Au:S/C:P/I:P/A:P",
       "metrics": {
        "baseScore": 6,
        "exploitabilityScore": 6.8,
        "impactScore": 6.4
       },
       "vendorMetadata": {}
      }
     ]
    }
   ],
   "matchDetails": [
    {
     "type": "exact-direct-match",
     "matcher": "java-matcher",
     "searchedBy": {
      "language": "java",
      "namespace": "github:language:java",
      "package": {
       "name": "spring",
       "version": "3.0.0-1.0"
      }
     },
     "found": {
      "versionConstraint": ">=3.0.0,<=3.0.2 (unknown)",
      "vulnerabilityID": "GHSA-vpr3-f594-mg5g"
     }
    }
   ],
   "artifact": {
    "id": "355b62d06b0f65b9",
    "name": "spring",
    "version": "3.0.0-1.0",
    "type": "java-archive",
    "locations": [
     {
      "path": "/newrelic-agent-8.8.0.jar",
      "layerID": "sha256:5fbb9490be31398af2f460dc46e26acf04f4be8aafbd2470d245efe06c1b59b5"
     }
    ],
    "language": "java",
    "licenses": [],
    "cpes": [
     "cpe:2.3:a:springframework:spring:3.0.0-1.0:*:*:*:*:*:*:*",
     "cpe:2.3:a:new-relic:spring:3.0.0-1.0:*:*:*:*:*:*:*",
     "cpe:2.3:a:new_relic:spring:3.0.0-1.0:*:*:*:*:*:*:*",
     "cpe:2.3:a:spring:spring:3.0.0-1.0:*:*:*:*:*:*:*"
    ],
    "purl": "pkg:maven/org.springframework/spring@3.0.0-1.0",
    "upstreams": [],
    "metadataType": "JavaMetadata",
    "metadata": {
     "virtualPath": "/newrelic-agent-8.8.0.jar:instrumentation/spring-3.0.0-1.0.jar",
     "pomArtifactID": "",
     "pomGroupID": "",
     "manifestName": "",
     "archiveDigests": [
      {
       "algorithm": "sha1",
       "value": "4a05ae952c2fefd7c5b44cc797fa0075368bad7c"
      }
     ]
    }
   }
  }

Anything else we need to know?:

Environment:

wagoodman commented 5 months ago

It seems like we might need to use the manifest before the hard-coded lookup here: https://github.com/anchore/syft/blob/397cf210de0676dfed030caf8100a01167802753/syft/pkg/cataloger/java/package_url.go#L44-L50

I'm not certain that would solve this problem, though: we'd first need to check whether the manifest would actually have given us a better groupID.

willmurphyscode commented 5 months ago

I did a little investigation here. Right now Syft uses the following methods to try to infer the group ID of a JAR:

  1. Check the pom properties if present
  2. Check the pom project if present
  3. Check a hard-coded map of artifact IDs to group IDs (the cause of this issue)
  4. Check the manifest

I made a local build of Syft that switches 3 and 4, so that the manifest is checked before the hard-coded map, and it fixes this particular issue:

$ go run cmd/syft/main.go -q /tmp/syft2596/newrelic-agent-8.8.0.jar -o json | jq '.artifacts[] | { name: .name, purl: .purl }' -c | grep spring
{"name":"spring","purl":"pkg:maven/com.newrelic.instrumentation.spring-3.0.0/spring@3.0.0-1.0"}
{"name":"spring","purl":"pkg:maven/com.newrelic.instrumentation.spring-4.0.0/spring@4.0.0-1.0"}
{"name":"spring","purl":"pkg:maven/com.newrelic.instrumentation.spring-4.2.0/spring@4.2.0-1.0"}
{"name":"spring","purl":"pkg:maven/com.newrelic.instrumentation.spring-4.3.0/spring@4.3.0-1.0"}

which definitely seems better (though the group ID probably shouldn't include the -3.0.0 suffix). However, this breaks a lot of our tests: the hard-coded map of artifact ID to group ID was added because there are many common JARs for which we can't find a group ID by any of the other methods.
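
The lookup order listed above, and the effect of swapping steps 3 and 4, can be sketched roughly as follows. This is an illustration only, not Syft's code: `jarInfo`, `knownGroupIDs`, and `groupID` are hypothetical names, and the single map entry is just the one implied by this issue.

```go
package main

import "fmt"

// jarInfo is a hypothetical, simplified view of what a cataloger knows about a JAR.
type jarInfo struct {
	ArtifactID         string
	PomPropertiesGroup string // groupId from pom.properties, "" if absent
	PomProjectGroup    string // groupId from the pom project, "" if absent
	ManifestGroup      string // group ID inferred from the manifest, "" if none
}

// knownGroupIDs stands in for the hard-coded artifact ID -> group ID map.
var knownGroupIDs = map[string]string{
	"spring": "org.springframework",
}

// groupID returns the first non-empty candidate. When manifestFirst is true,
// the manifest is consulted before the hard-coded map (steps 3 and 4 swapped).
func groupID(j jarInfo, manifestFirst bool) string {
	candidates := []string{j.PomPropertiesGroup, j.PomProjectGroup}
	if manifestFirst {
		candidates = append(candidates, j.ManifestGroup, knownGroupIDs[j.ArtifactID])
	} else {
		candidates = append(candidates, knownGroupIDs[j.ArtifactID], j.ManifestGroup)
	}
	for _, c := range candidates {
		if c != "" {
			return c
		}
	}
	return ""
}

func main() {
	// The embedded New Relic instrumentation JAR has no pom data, only a manifest.
	jar := jarInfo{
		ArtifactID:    "spring",
		ManifestGroup: "com.newrelic.instrumentation.spring-3.0.0",
	}
	for _, manifestFirst := range []bool{false, true} {
		fmt.Printf("manifestFirst=%v -> pkg:maven/%s/%s@%s\n",
			manifestFirst, groupID(jar, manifestFirst), jar.ArtifactID, "3.0.0-1.0")
	}
}
```

With the original order the map wins whenever the artifact ID is known, which is what produces the Spring group ID here. With the swapped order, any JAR whose manifest yields a value never reaches the map at all, which is consistent with the test breakage described above.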

We might need to do something more complex than the fallback logic described above. For example, when we try to get the group ID from the manifest, we check a number of fields:

https://github.com/anchore/syft/blob/6107e5e2ad6e60f3f74ff4d3b2ca2ccffbed26ed/syft/pkg/cataloger/internal/cpegenerate/java.go#L25-L38

It may be that the hard-coded map should take precedence over the less common of those fields, but not over the more common ones.

@cdupuis - do you know where in these JARs the build process puts the "original group ID" you'd like us to preserve? Maybe a particular manifest key that's always set? That might help us come up with an implementation that fixes this without breaking the code paths that rely on the map.
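
One way to picture that tiered precedence is the sketch below. It is hypothetical throughout: which manifest keys would count as "strong" or "weak" is an assumption made for illustration, not Syft's actual classification, and the raw manifest value is returned without the validation a real implementation would need.

```go
package main

import "fmt"

// Illustrative split only: the assignment of keys to tiers is an assumption.
var (
	strongKeys = []string{"Implementation-Title"}
	weakKeys   = []string{"Implementation-Vendor"}
)

// knownGroupIDs stands in for the hard-coded artifact ID -> group ID map.
var knownGroupIDs = map[string]string{"spring": "org.springframework"}

// firstValue returns the first non-empty manifest value among keys.
func firstValue(manifest map[string]string, keys []string) string {
	for _, k := range keys {
		if v := manifest[k]; v != "" {
			return v
		}
	}
	return ""
}

// groupID tries strong manifest keys, then the hard-coded map, then weak keys.
func groupID(artifactID string, manifest map[string]string) string {
	if v := firstValue(manifest, strongKeys); v != "" {
		return v
	}
	if v := knownGroupIDs[artifactID]; v != "" {
		return v
	}
	return firstValue(manifest, weakKeys)
}

func main() {
	instrumentation := map[string]string{
		"Implementation-Title":  "com.newrelic.instrumentation.spring-3.0.0",
		"Implementation-Vendor": "New Relic",
	}
	vendorOnly := map[string]string{"Implementation-Vendor": "Some Vendor"}

	fmt.Println(groupID("spring", instrumentation)) // strong key beats the map
	fmt.Println(groupID("spring", vendorOnly))      // map beats the weak key
}
```

The design question this leaves open is the one raised above: where each manifest field belongs relative to the map.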

willmurphyscode commented 2 months ago

#2796 should improve the situation somewhat.