anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0
6.12k stars, 563 forks

Ensure accurate java main artifact name retrieval for multi-JARs and refine fallback approach #3054

Closed by dor-hayun 2 months ago

dor-hayun commented 2 months ago

Improving Accuracy of Package Name Retrieval in Java Archives

This PR improves how the main package artifact name and its corresponding SHA1 value are retrieved from Java archives. It targets cases where an incorrect package name was reported, especially for JAR files that contain multiple internal JARs. The improvements include:

spiffcs commented 2 months ago

Thanks for the contribution @dor-hayun!

Before we can accept this, I think we need to check some of the downstream implications and see whether it changes any of our vulnerability-testing fixtures for the better.

Also, let me see about adding a test or two that demonstrate the kind of behavior this prevents. Do you have any current cases you're running into that would serve as a good example?

dor-hayun commented 2 months ago

Hi @spiffcs, please try running syft on the following public image: public.ecr.aws/docker/library/gradle@sha256:70da12adf27e83bcc125af9d2bc6f9432590e89c96609625aa688135b27e75fb

Then check what happens for the 'jansi' package. It is the main artifact, yet 10 pom.properties files are found inside the JAR. My fix ensures that the main package name is retrieved correctly when the main POM file is present.
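To illustrate the problem described above, here is a minimal Go sketch of one way to choose the "main" pom.properties when a JAR contains several. It is not syft's implementation and not necessarily the approach taken in this PR. `PomProperties` and `SelectMainPom` are hypothetical names, and the heuristic (match the candidate's artifactId against the JAR file name, and report "no match" so the caller can fall back to other evidence) is an assumption made for illustration.

```go
package main

import (
	"path"
	"strings"
)

// PomProperties is a hypothetical, simplified stand-in for the values read
// from one META-INF/maven/<groupId>/<artifactId>/pom.properties entry.
type PomProperties struct {
	Path       string
	GroupID    string
	ArtifactID string
	Version    string
}

// SelectMainPom returns the pom.properties entry that most likely describes
// the archive itself rather than a dependency bundled inside it.
// Assumption: the JAR file name is derived from the main artifact's
// artifactId (and usually its version), e.g. "jansi-2.4.1.jar".
func SelectMainPom(jarPath string, poms []PomProperties) (PomProperties, bool) {
	// With exactly one candidate there is nothing to disambiguate.
	if len(poms) == 1 {
		return poms[0], true
	}

	// "lib/jansi-2.4.1.jar" -> "jansi-2.4.1"
	base := strings.TrimSuffix(path.Base(jarPath), ".jar")

	var best PomProperties
	found := false
	for _, p := range poms {
		if p.ArtifactID == "" {
			continue
		}
		// Strongest signal: the file name is exactly
		// "<artifactId>-<version>" or "<artifactId>".
		if base == p.ArtifactID+"-"+p.Version || base == p.ArtifactID {
			return p, true
		}
		// Weaker signal: the file name starts with "<artifactId>-".
		// Prefer the longest such artifactId, so that for
		// "jansi-native-1.0.jar" a "jansi-native" entry beats "jansi".
		if strings.HasPrefix(base, p.ArtifactID+"-") && len(p.ArtifactID) > len(best.ArtifactID) {
			best, found = p, true
		}
	}

	// found is false when no candidate matches the file name; the caller
	// should then fall back to other evidence (e.g. the manifest) instead
	// of picking an arbitrary embedded pom.properties.
	return best, found
}
```

Returning a boolean rather than always picking the first entry keeps the fallback decision explicit. Simply taking whichever pom.properties is encountered first is exactly the kind of choice that can report an embedded library's name instead of the archive's own.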

kzantow commented 2 months ago

Hey, @dor-hayun. It looks like I made a change that conflicts with this one a bit. Would you want to rebase this PR? Or I could push an update, if you don't mind.

dor-hayun commented 2 months ago

@kzantow I'm rebasing it.