anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

SYFT command does not scan all JARs #1520

Open KratochvilLukas opened 1 year ago

KratochvilLukas commented 1 year ago

Hi.

We found that some dependencies are missing from the result of a syft scan. We have more than 400 jars that need to be scanned. For most of them the command below works as expected, but 4 jars are missing from the output.

syft /tmp/jars --name "Example" --output cyclonedx-json --file /tmp/sbom-test-missing-graal.json

Missing jars:

graal-sdk-22.3.0.jar
regex-22.3.0.jar
js-scriptengine-22.3.0.jar
js-22.3.0.jar

Why is this happening? I think it's a bug, but maybe it's expected for some reason?

Thank you.

How to reproduce

1- Download and unzip jars.zip
2- Run syft /tmp/jars --name "Example" --output cyclonedx-json --file /tmp/sbom-test-missing-graal-dependencies.json
3- The command creates an empty result:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "serialNumber": "urn:uuid:1e157f98-8bfe-4c96-92e1-4f22f19c1e24",
  "version": 1,
  "metadata": {
    "timestamp": "2023-01-26T13:35:16+01:00",
    "tools": [
      {
        "vendor": "anchore",
        "name": "syft",
        "version": "0.68.1"
      }
    ],
    "component": {
      "bom-ref": "f2707bac2aaceaf8",
      "type": "file",
      "name": "Example"
    }
  },
  "components": []
}

What you expected to happen: All jars will be scanned and included in cyclonedx-json result.

Environment: image

tgerla commented 1 year ago

Hi @KratochvilLukas, thanks for the issue, sorry it has taken us some time to reply. I believe I have reproduced the problem and hopefully I will have some information for you soon.

tgerla commented 1 year ago

Hi @KratochvilLukas, I took a closer look at your jar files and I think I found the reason: Syft is only able to detect Java packages if there is a pom.xml or a MANIFEST.MF inside the jar. The 4 jars you shared appear to contain only class files, with no package metadata, so we aren't able to detect the packages.

Hope this makes sense--happy to chat further.
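
For anyone who wants to check their own jars against the explanation above, here is a minimal sketch in Python. It only approximates the rule as described in this comment (a jar is detectable if it contains a MANIFEST.MF or a pom.xml); it is not Syft's actual cataloging logic, and the function names are made up for illustration.

```python
import zipfile
from pathlib import Path


def has_java_metadata(jar_path):
    """Return True if the archive contains META-INF/MANIFEST.MF or any pom.xml.

    Assumption: this mirrors the rule described in the thread, not Syft's real code.
    """
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            base_name = name.rsplit("/", 1)[-1]
            if name == "META-INF/MANIFEST.MF" or base_name == "pom.xml":
                return True
    return False


def jars_without_metadata(directory):
    """List the .jar files under `directory` that carry no package metadata."""
    return sorted(
        str(path)
        for path in Path(directory).rglob("*.jar")
        if zipfile.is_zipfile(path) and not has_java_metadata(path)
    )
```

Calling jars_without_metadata("/tmp/jars") would return the paths of the jars that a metadata-based scanner is likely to skip.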

KratochvilLukas commented 1 year ago

Hi @tgerla, thank you for your response.

I suspected it would be something like that, but from our perspective these are dependencies like any other.

I prepared a demo of our solution: graal-demo.zip

Scenario:

1- Run gradlew build copyJars
2- A jars folder is created inside the build folder
3- Scan all the jars using the command mentioned above.

I understand that these jars are not in a "standard" format, but it would be nice to get at least what we do know about each jar (we have a name and a version).

Ideally I'm expecting something like this: bom.zip (it's generated by cyclone-dx gradle plugin)

Do you have any recommendation for how I can achieve this result using only a syft scan?

Additional information:

https://mvnrepository.com/artifact/org.graalvm.sdk/graal-sdk/22.3.0
https://mvnrepository.com/artifact/org.graalvm.js/js-scriptengine/22.3.0
https://mvnrepository.com/artifact/org.graalvm.js/js/22.3.0
https://mvnrepository.com/artifact/org.graalvm.regex/regex/22.3.0

spiffcs commented 1 year ago

@KratochvilLukas I've added the needs-investigation label to this issue, since it will probably become part of our broader effort to enhance Syft with external data sources for cases where the content of the original source is not adequate to identify the package.

KratochvilLukas commented 1 year ago

@spiffcs Thank you.

If you have any additional information, please let me know in this thread.

We'll try to figure out some workaround for the missing dependencies for now.
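
One possible shape for such a workaround, sketched in Python: derive the name and version from the jar file name and emit a minimal CycloneDX-style component that could be merged into the Syft output by hand. The "<name>-<version>.jar" naming convention and the helper name are assumptions for illustration only; this is not something Syft does.

```python
import re
from pathlib import Path

# Assumed convention: "<name>-<version>.jar", where the version starts with a digit.
JAR_NAME_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def component_from_filename(jar_path):
    """Build a minimal CycloneDX-style component dict from a jar file name.

    Returns None when the file name does not follow the assumed convention.
    """
    match = JAR_NAME_PATTERN.match(Path(jar_path).name)
    if match is None:
        return None
    return {
        "type": "library",
        "name": match.group("name"),
        "version": match.group("version"),
    }
```

For the four missing jars this yields entries such as name "graal-sdk" with version "22.3.0". The group ID is not recoverable from the file name, so the result is weaker than what the Gradle plugin produces.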

dermot-hardy commented 1 year ago

@tgerla: ...I found the reason: Syft is only able to detect Java packages if there is a pom.xml or a MANIFEST.MF inside the jar.

Since it appears that the META-INF/MANIFEST.MF is optional, is the solution here simply to assume that all ZIP-format files that have a .jar extension are Java archives? And to just output an entry with minimal information, like the file location and the SHA-1 digest?
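
A rough sketch of what that fallback could look like, written in Python rather than in Syft's own code base: every ZIP-format file with a .jar extension gets an entry holding only its location and SHA-1 digest. The entry layout and function names are hypothetical.

```python
import hashlib
import zipfile
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def minimal_jar_entries(directory):
    """Return one {location, sha1} entry per ZIP-format .jar file under `directory`."""
    entries = []
    for path in sorted(Path(directory).rglob("*.jar")):
        if not zipfile.is_zipfile(path):
            continue  # has a .jar extension but is not a ZIP archive
        entries.append({"location": str(path), "sha1": sha1_of_file(path)})
    return entries
```

Such an entry carries no name or version, but the digest at least gives downstream tools something stable to match against.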

tetzla commented 2 months ago

Hello,

I have encountered the same problem. A fix would be highly appreciated, especially since the presence of a MANIFEST.MF isn't mandatory.

Other SBOM generators (cdxgen, Sonatype SBOM) are able to recognize JARs without MANIFEST.MF. I suppose they traverse the Maven dependency tree or they perform a lookup in artifact repositories (e.g. Maven Central).
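
To illustrate the repository-lookup idea (I have not checked how those tools actually do it), here is a small Python sketch that asks the Maven Central search API for the coordinates of a jar by its SHA-1 digest. The endpoint and the 1:"<sha1>" query form are taken from the public Maven Central search API as I understand it; treat the response field names (g, a, v) as an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://search.maven.org/solrsearch/select"


def build_sha1_query_url(sha1_hex):
    """Build a Maven Central search URL that looks up an artifact by SHA-1 digest."""
    query = urllib.parse.urlencode({"q": f'1:"{sha1_hex}"', "rows": 20, "wt": "json"})
    return f"{SEARCH_URL}?{query}"


def lookup_coordinates(sha1_hex, timeout=10):
    """Return (group, artifact, version) tuples for a digest; requires network access.

    Assumption: matching documents expose the fields "g", "a" and "v".
    """
    with urllib.request.urlopen(build_sha1_query_url(sha1_hex), timeout=timeout) as response:
        payload = json.load(response)
    docs = payload.get("response", {}).get("docs", [])
    return [(doc.get("g"), doc.get("a"), doc.get("v")) for doc in docs]
```

An empty list would mean the digest is unknown to the index, for example for a privately built or repackaged jar.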