anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Syft incorrectly identifying jruby jar files #2877

Open joshbressers opened 1 month ago

joshbressers commented 1 month ago

For the purposes of this report, I used this container image: docker.elastic.co/logstash/logstash:8.13.4

We are seeing Java findings show up that are part of jruby packages. These findings are badly broken: they're not really jar files; they are part of the jruby package.

For example:

bress@anchore ➜ syft convert /tmp/logstash.json | grep nokogiri
[0000]  WARN convert is an experimental feature, run `syft convert -h` for help
nokogiri                                                                                   java-archive
nokogiri                                        1.16.4                                     gem

That turns up the nokogiri gem, and a nokogiri java-archive which isn't real.
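
To surface every such overlap in an SBOM, not just nokogiri, one option is to look for names reported under both types. This is a minimal sketch, assuming the syft JSON output keeps its packages in a top-level `artifacts` list whose entries carry `name` and `type` fields; the helper name `names_with_both_types` and the `example-gem` entry are hypothetical and only there for illustration.

```python
from collections import defaultdict


def names_with_both_types(sbom, first="gem", second="java-archive"):
    """Return sorted package names that are reported under both package types."""
    types_by_name = defaultdict(set)
    # Assumption: packages live under the top-level "artifacts" key.
    for artifact in sbom.get("artifacts", []):
        types_by_name[artifact.get("name")].add(artifact.get("type"))
    return sorted(
        name
        for name, types in types_by_name.items()
        if first in types and second in types
    )


# Inline sample mirroring the two nokogiri rows above, plus one
# placeholder gem ("example-gem" is made up) that has no java-archive twin.
sample = {
    "artifacts": [
        {"name": "nokogiri", "version": "", "type": "java-archive"},
        {"name": "nokogiri", "version": "1.16.4", "type": "gem"},
        {"name": "example-gem", "version": "0.0.0", "type": "gem"},
    ]
}

print(names_with_both_types(sample))
```

For a real scan, `sample` would be replaced by `json.load(...)` on the saved SBOM file, for instance the /tmp/logstash.json used here.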

The nokogiri gemspec can be found here:

bress@anchore ➜  docker run --rm  docker.elastic.co/logstash/logstash:8.13.4 cat /usr/share/logstash/vendor/bundle/jruby/3.1.0/specifications/nokogiri-1.16.4-java.gemspec | head
# -*- encoding: utf-8 -*-
# stub: nokogiri 1.16.4 java liblib/nokogiri/jruby

Gem::Specification.new do |s|
  s.name = "nokogiri".freeze
  s.version = "1.16.4"
  s.platform = "java".freeze

  s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
  s.metadata = { "bug_tracker_uri" => "https://github.com/sparklemotion/nokogiri/issues", "changelog_uri" => "https://nokogiri.org/CHANGELOG.html", "documentation_uri" => "https://nokogiri.org/rdoc/index.html", "homepage_uri" => "https://nokogiri.org", "rubygems_mfa_required" => "true", "source_code_uri" => "https://github.com/sparklemotion/nokogiri" } if s.respond_to? :metadata=

The nokogiri.jar is located at /usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/nokogiri-1.16.4-java/lib/nokogiri/nokogiri.jar

Here are the details that turn up for that jar file:

    {
      "id": "f64e536bd221a6de",
      "name": "nokogiri",
      "version": "",
      "type": "java-archive",
      "foundBy": "java-archive-cataloger",
      "locations": [
        {
          "path": "/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/nokogiri-1.16.4-java/lib/nokogiri/nokogiri.jar",
          "layerID": "sha256:dd49a5e5f509de1b29ebdeaa3e11ea6251e6c741157c2bc0c3a9228827bc80c6",
          "accessPath": "/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/nokogiri-1.16.4-java/lib/nokogiri/nokogiri.jar",
          "annotations": {
            "evidence": "primary"
          }
        }
      ],
      "licenses": [],
      "language": "java",
      "cpes": [
        {
          "cpe": "cpe:2.3:a:nokogiri:nokogiri:*:*:*:*:*:*:*:*",
          "source": "syft-generated"
        }
      ],
      "purl": "pkg:maven/nokogiri/nokogiri",
      "metadataType": "java-archive",
      "metadata": {
        "virtualPath": "/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/nokogiri-1.16.4-java/lib/nokogiri/nokogiri.jar",
        "manifest": {
          "main": [
            {
              "key": "Manifest-Version",
              "value": "1.0"
            },
            {
              "key": "Created-By",
              "value": "11.0.20.1 (Ubuntu)"
            }
          ]
        },
        "digest": [
          {
            "algorithm": "sha1",
            "value": "bcc6ebb7ac131f150bdfa54cc66b8fd406695767"
          }
        ]
      }
    },

I wouldn't expect such a jar to show up in the results.
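
As a stopgap on the consumer side, the path alone is enough to tell that a jar like this lives inside a java-platform gem. The following is a rough sketch of that heuristic, not a description of how syft's catalogers work: the `GEM_JAR` pattern and the `owning_gem` helper are hypothetical, and the pattern assumes the `gems/<name>-<version>-java/` directory layout seen in this image.

```python
import re

# Assumption: java-platform gems are unpacked to .../gems/<name>-<version>-java/
# and the version starts with a digit, as in the logstash image above.
GEM_JAR = re.compile(
    r"/gems/(?P<name>[^/]+?)-(?P<version>\d[^/-]*)-java/.*\.jar$"
)


def owning_gem(jar_path):
    """Return (gem_name, gem_version) if the jar sits inside a java-platform gem, else None."""
    match = GEM_JAR.search(jar_path)
    if match is None:
        return None
    return match.group("name"), match.group("version")


path = (
    "/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/"
    "nokogiri-1.16.4-java/lib/nokogiri/nokogiri.jar"
)
print(owning_gem(path))
```

A java-archive entry whose path maps to a gem this way could then be dropped or merged into the gem entry when post-processing the SBOM.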

luhring commented 1 month ago

Hey @joshbressers, I'm curious to learn more... I'm seeing an issue with scans of jruby as well, and I'm curious if my issue is related to this one. In my case I'm seeing inconsistent results for CPEs and license data.

You mention a couple of times that the JAR files "aren't real"; could you say more? What do you mean by that exactly? While these files function as gems, I had been thinking that they are simultaneously valid Java archives (in Syft terms).