Open cdupuis opened 9 months ago
It seems like we might need to use the manifest before the hard-coded lookup here https://github.com/anchore/syft/blob/397cf210de0676dfed030caf8100a01167802753/syft/pkg/cataloger/java/package_url.go#L44-L50
I'm not certain if that would solve this problem, since we'd need to check if we would have found a better groupID from the manifest first.
I did a little investigation here. Right now Syft uses the following methods to try to infer the group ID of a JAR:
I made a local build of Syft that switches 3 and 4, so that the manifest is checked before the hard-coded map, and it fixes this particular issue:
$ go run cmd/syft/main.go -q /tmp/syft2596/newrelic-agent-8.8.0.jar -o json | jq '.artifacts[] | { name: .name, purl: .purl }' -c | grep spring
{"name":"spring","purl":"pkg:maven/com.newrelic.instrumentation.spring-3.0.0/spring@3.0.0-1.0"}
{"name":"spring","purl":"pkg:maven/com.newrelic.instrumentation.spring-4.0.0/spring@4.0.0-1.0"}
{"name":"spring","purl":"pkg:maven/com.newrelic.instrumentation.spring-4.2.0/spring@4.2.0-1.0"}
{"name":"spring","purl":"pkg:maven/com.newrelic.instrumentation.spring-4.3.0/spring@4.3.0-1.0"}
which definitely seems better (but maybe shouldn't have the -3.0.0 in the group ID?). However, this breaks a lot of our tests: the hard-coded map of artifact ID to group ID was added because there are a lot of common JARs for which we can't find a group ID by any of the other methods.
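To make the ordering question concrete, here is a minimal sketch of a "first non-empty answer wins" chain. It is not Syft's actual code: the type `groupIDSource`, the functions `fromKnownMap`, `fromManifest` and `firstGroupID`, the manifest key `Example-Group-Key`, and the single map entry are all hypothetical names and data chosen for illustration.

```go
package main

import "fmt"

// groupIDSource is a hypothetical lookup step: it returns a group ID,
// or "" when it has no answer for this JAR.
type groupIDSource func(artifactID string, manifest map[string]string) string

// fromKnownMap stands in for the hard-coded artifact ID -> group ID map.
// The single entry is illustrative, not copied from Syft.
func fromKnownMap(artifactID string, _ map[string]string) string {
	known := map[string]string{"spring": "org.springframework"}
	return known[artifactID]
}

// fromManifest stands in for the manifest lookup.
// "Example-Group-Key" is a placeholder, not a real manifest attribute.
func fromManifest(_ string, manifest map[string]string) string {
	return manifest["Example-Group-Key"]
}

// firstGroupID returns the first non-empty answer, so the order of
// sources is the precedence order.
func firstGroupID(artifactID string, manifest map[string]string, sources ...groupIDSource) string {
	for _, src := range sources {
		if g := src(artifactID, manifest); g != "" {
			return g
		}
	}
	return ""
}

func main() {
	manifest := map[string]string{
		"Example-Group-Key": "com.newrelic.instrumentation.spring-3.0.0",
	}
	// Map first: the generic entry shadows the manifest value.
	fmt.Println(firstGroupID("spring", manifest, fromKnownMap, fromManifest))
	// Manifest first: the embedded JAR keeps its own group ID.
	fmt.Println(firstGroupID("spring", manifest, fromManifest, fromKnownMap))
}
```

With an empty manifest both orderings fall through to the map, so swapping the order only changes the result for JARs whose manifest yields any value at all, including a poor one. That is presumably where the test breakage comes from.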
We might need to do something more complex than the fallback logic described above. For example, when we try to get the group ID from the manifest, we check a number of fields:
One option is for the hard-coded map to take precedence over the less common of those fields, but not over the more common ones.
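A rough sketch of that tiered idea, under stated assumptions: the manifest fields are split into a "strong" and a "weak" tier, and the map sits between them. The attribute names are real JAR manifest headers, but which tier each belongs to is a guess made purely for illustration. `strongKeys`, `weakKeys`, `firstValue` and `resolveGroupID` are hypothetical names, not Syft identifiers.

```go
package main

import "fmt"

// Hypothetical tiers of manifest keys. The assignment of keys to tiers
// is an assumption for illustration only.
var (
	strongKeys = []string{"Implementation-Vendor-Id", "Bundle-SymbolicName"}
	weakKeys   = []string{"Automatic-Module-Name", "Implementation-Title"}
)

// firstValue returns the value of the first listed key that is present
// and non-empty in the manifest, or "" if none is.
func firstValue(manifest map[string]string, keys []string) string {
	for _, k := range keys {
		if v := manifest[k]; v != "" {
			return v
		}
	}
	return ""
}

// resolveGroupID tries the strong manifest keys, then the hard-coded map,
// then the weak manifest keys.
func resolveGroupID(artifactID string, manifest, known map[string]string) string {
	if g := firstValue(manifest, strongKeys); g != "" {
		return g
	}
	if g := known[artifactID]; g != "" {
		return g
	}
	return firstValue(manifest, weakKeys)
}

func main() {
	// Illustrative map entry, not copied from Syft.
	known := map[string]string{"spring": "org.springframework"}

	strong := map[string]string{"Implementation-Vendor-Id": "com.example.embedded"}
	weak := map[string]string{"Automatic-Module-Name": "com.example.module"}

	fmt.Println(resolveGroupID("spring", strong, known)) // strong manifest key wins
	fmt.Println(resolveGroupID("spring", weak, known))   // map beats the weak key
	fmt.Println(resolveGroupID("other", weak, known))    // weak key is the last resort
}
```

Whether this fixes the New Relic case depends on which key actually carries the original group ID in those JARs, which is exactly the open question below.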
@cdupuis - do you know where in these JARs the build process is putting the "original group ID" you wished we preserved? Maybe a particular manifest key that's always set? That might help us come up with an implementation that fixes this without breaking the code paths that rely on the map.
Hey, @cdupuis, have you been able to test a more recent version of Syft for this? I believe the New Relic handling should be improved.
What happened:
Syft has started to hard-code some groupIds for Maven artefacts, which leads to misleading PURLs when related artefacts are used as embedded instrumentation JARs.
Ultimately this yields many false positives, e.g. with Grype.
What you expected to happen:
Preferably the original groupId would be preserved.
Steps to reproduce the issue:
Use the following Dockerfile to build an image:
then run syft <your image from above Dockerfile> -o sbom.json and see the following for e.g. the embedded spring.jar:
When running Grype on this image, you get a lot of false positives, e.g.:
Anything else we need to know?:
Environment:
Output of syft version:
OS (e.g. cat /etc/os-release or similar):