anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Nondeterministic SBOM generation #2967

Open luhring opened 1 week ago

luhring commented 1 week ago

What happened:

I'm seeing nondeterministic behavior when using Syft as a library (in wolfictl) to generate SBOMs. I noticed this via new golden-file-style tests we've introduced to ensure we get the same output for the same input. For a couple of the test targets (each of which is an APK file), a test fails on the run immediately following that test's golden file update.

I'm not 100% sure this is Syft's fault yet, since there's wrapping code in wolfictl involved, too. But wanted to flag the issue here at least so we can discuss!

Here are some example diffs from two consecutive runs of the SBOM generation code under this test:

For jenkins-2.461-r0.apk:

 "language": "java",
 "cpes": [
   {
-    "cpe": "cpe:2.3:a:jenkins:mailer:472.vf7c289a_4b_420:*:*:*:*:jenkins:*:*",
-    "source": "nvd-cpe-dictionary"
-  },
-  {
-    "cpe": "cpe:2.3:a:jenkins:mailer:472.vf7c289a_4b_420:*:*:*:*:*:*:*",
+    "cpe": "cpe:2.3:a:jenkins:mailer:472.vf7c289a_4b_420:*:*:*:*:*:*:*",
+    "source": "nvd-cpe-dictionary"
+  },
+  {
+    "cpe": "cpe:2.3:a:jenkins:mailer:472.vf7c289a_4b_420:*:*:*:*:jenkins:*:*",
     "source": "nvd-cpe-dictionary"
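
The jenkins diff above contains the same two CPE entries in swapped order. The following is a minimal sketch, assuming only the `cpe` and `source` fields visible in that diff, of how a test could check whether two runs differ in order alone. The names `cpeEntry`, `sortedCPEs`, and `sameCPESet` are hypothetical helpers for illustration and are not part of Syft or wolfictl.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// cpeEntry mirrors the two fields visible in the diff (an assumption:
// real entries may carry more fields).
type cpeEntry struct {
	CPE    string `json:"cpe"`
	Source string `json:"source"`
}

// sortedCPEs decodes a JSON array of CPE entries and sorts it by
// (cpe, source) so that ordering no longer affects comparison.
func sortedCPEs(raw string) ([]cpeEntry, error) {
	var entries []cpeEntry
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		return nil, err
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].CPE != entries[j].CPE {
			return entries[i].CPE < entries[j].CPE
		}
		return entries[i].Source < entries[j].Source
	})
	return entries, nil
}

// sameCPESet reports whether two JSON arrays hold the same entries,
// regardless of the order in which they appear.
func sameCPESet(a, b string) (bool, error) {
	x, err := sortedCPEs(a)
	if err != nil {
		return false, err
	}
	y, err := sortedCPEs(b)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(x, y), nil
}

func main() {
	// The two entries from the diff, in the order seen on each run.
	runA := `[
	  {"cpe": "cpe:2.3:a:jenkins:mailer:472.vf7c289a_4b_420:*:*:*:*:jenkins:*:*", "source": "nvd-cpe-dictionary"},
	  {"cpe": "cpe:2.3:a:jenkins:mailer:472.vf7c289a_4b_420:*:*:*:*:*:*:*", "source": "nvd-cpe-dictionary"}
	]`
	runB := `[
	  {"cpe": "cpe:2.3:a:jenkins:mailer:472.vf7c289a_4b_420:*:*:*:*:*:*:*", "source": "nvd-cpe-dictionary"},
	  {"cpe": "cpe:2.3:a:jenkins:mailer:472.vf7c289a_4b_420:*:*:*:*:jenkins:*:*", "source": "nvd-cpe-dictionary"}
	]`

	same, err := sameCPESet(runA, runB)
	if err != nil {
		panic(err)
	}
	fmt.Println("same entries ignoring order:", same)
}
```

A comparison like this only helps classify a diff as ordering-only; it does not make the generated output itself deterministic.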

For jruby-9.4-9.4.7.0-r0.apk:

 {
-"id": "b18c20e1cb65977c",
+"id": "1ac1bd89c6841000",
 "name": "jruby-base",
 "version": "9.4.7.0",
 "type": "java-archive",
 ...
     }
   }
 ],
-"licenses": [],
+"licenses": [
+  {
+    "value": "Apache-2.0",
+    "spdxExpression": "Apache-2.0",
+    "type": "concluded",
+    "urls": [],
+    "locations": [
+      {
+        "path": "usr/share/jruby/lib/jruby.jar",
+        "accessPath": "usr/share/jruby/lib/jruby.jar",
+        "annotations": {
+          "evidence": "primary"
+        }
+      }
+    ]
+  }
+],

What you expected to happen:

Same exact output given same input!

Steps to reproduce the issue:

Check out https://github.com/wolfi-dev/wolfictl and run the test linked above. Note that you may have to run the test multiple times in order to get a complete sense of the results that code can produce. Also note that the first run of the test is doing a fetch of several APKs, so it will take considerably more time than subsequent test runs.
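
Because a single pass may not surface every variant, a repeated-run check can make the spread of outputs visible. The following is a minimal sketch, assuming the SBOM generation can be wrapped in a function that returns the serialized output as bytes. The `distinctOutputs` helper and the stand-in `flaky` generator are hypothetical and do not call Syft or wolfictl.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// distinctOutputs calls generate n times and counts how often each
// distinct output occurs, keyed by the hex SHA-256 digest of the bytes.
func distinctOutputs(n int, generate func() ([]byte, error)) (map[string]int, error) {
	seen := make(map[string]int)
	for i := 0; i < n; i++ {
		out, err := generate()
		if err != nil {
			return nil, fmt.Errorf("run %d: %w", i, err)
		}
		sum := sha256.Sum256(out)
		seen[hex.EncodeToString(sum[:])]++
	}
	return seen, nil
}

func main() {
	// Stand-in generator that alternates between two outputs. A real check
	// would call the SBOM generation code under test instead.
	calls := 0
	flaky := func() ([]byte, error) {
		calls++
		if calls%2 == 0 {
			return []byte(`{"licenses":[]}`), nil
		}
		return []byte(`{"licenses":["Apache-2.0"]}`), nil
	}

	seen, err := distinctOutputs(10, flaky)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d distinct outputs across 10 runs\n", len(seen))
	for digest, count := range seen {
		fmt.Printf("%s seen %d times\n", digest[:12], count)
	}
}
```

More than one digest in the result means the generator produced different bytes for the same input across runs.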

Anything else we need to know?:

So far the only test cases exhibiting this behavior are Java-based packages... 🤔

cc: @wagoodman, this is the thing we talked about briefly last week.

Environment:

$ go list -m all | grep syft
github.com/anchore/syft v1.7.0