anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Unable to extract licenses for some NPM packages #2611

Open atl-mk opened 9 months ago

atl-mk commented 9 months ago

What happened: I ran Syft with SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true, and it logged warnings that it failed to fetch the licenses for some packages

What you expected to happen: To successfully fetch all licenses

Steps to reproduce the issue:

  1. Make a project with the dependencies below
  2. Run Syft with SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true

Anything else we need to know?:

All of these packages appear to have normal package.json files on GitHub, which makes the failure strange

Environment:

tgerla commented 9 months ago

Hi @atl-mk, thanks for the report! I tried quickly to reproduce on the same version of Syft:

mkdir syft-2611 && cd syft-2611
yarn && yarn add array-slice
SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true syft . -o json

I don't see the warning you are seeing, and I see the MIT license in the JSON output:

      "licenses": [
        {
          "value": "MIT",
          "spdxExpression": "MIT",
          "type": "declared",
          "urls": [],
...

Can you share more detailed reproduction steps, maybe the full project you are scanning? Can you also try upgrading to the latest available Syft?

Thanks!

atl-mk commented 9 months ago

@tgerla

I get exactly the same result, even in a different project

Here's a simple package.json file

{
  "name": "test",
  "private": true,
  "dependencies": {
    "ansi-wrap": "0.1.0",
    "array-slice": "0.1.0",
    "glob-base": "0.3.0",
    "is-primitive": "2.0.0",
    "is-whitespace": "0.3.0",
    "kind-of": "1.1.0",
    "preserve": "0.2.0"
  }
}

Simply installing with yarn 1.22.19 results in the same output. I also upgraded to the latest version of Syft (0.103.1 was the latest version when I reported the bug), but the bug is still present

$ SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true SYFT_LOG_LEVEL=info syft . -o syft-json=sbom.cyclonedx.json
[0000]  INFO syft version: 0.103.1
[0000]  WARN no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)
[0000]  WARN unable to extract licenses from javascript yarn.lock for package ansi-wrap:0.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0001]  WARN unable to extract licenses from javascript yarn.lock for package example:8.8.8: unable to parse license from npm registry: json: cannot unmarshal string into Go value of type struct { License string "json:\"license\"" }
[0001]  WARN unable to extract licenses from javascript yarn.lock for package glob-base:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package is-primitive:2.0.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package is-whitespace:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package kind-of:1.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package preserve:0.2.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
$ SYFT_JAVASCRIPT_SEARCH_REMOTE_LICENSES=true SYFT_LOG_LEVEL=info syft . -o syft-json=sbom.cyclonedx.json
[0000]  INFO syft version: 0.105.0
[0000]  WARN no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)
[0000]  WARN unable to extract licenses from javascript yarn.lock for package ansi-wrap:0.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0001]  WARN unable to extract licenses from javascript yarn.lock for package example:8.8.8: unable to parse license from npm registry: json: cannot unmarshal string into Go value of type struct { License string "json:\"license\"" }
[0001]  WARN unable to extract licenses from javascript yarn.lock for package glob-base:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package is-primitive:2.0.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package is-whitespace:0.3.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package kind-of:1.1.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
[0002]  WARN unable to extract licenses from javascript yarn.lock for package preserve:0.2.0: unable to parse license from npm registry: json: cannot unmarshal object into Go struct field .license of type string
$ yarn -v
1.22.19

The yarn.lock file is simple too

# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.
# yarn lockfile v1

ansi-wrap@0.1.0:
  version "0.1.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/ansi-wrap/-/ansi-wrap-0.1.0.tgz#a82250ddb0015e9a27ca82e82ea603bbfa45efaf"
  integrity sha512-ZyznvL8k/FZeQHr2T6LzcJ/+vBApDnMNZvfVFy3At0knswWd6rJ3/0Hhmpu8oqa6C92npmozs890sX9Dl6q+Qw==

array-slice@0.1.0:
  version "0.1.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/array-slice/-/array-slice-0.1.0.tgz#12adfc0238fc6a29e6ab5a4b7789c6ce7b723dc6"
  integrity sha512-hC286ytySez3XJWkjsBjugydgPZJXiHvwZNegJUIs+Xs5Ovslm7UfAlijFjYq7rJP4aUGdCF9FfWy7lPd1m4/A==

glob-base@0.3.0:
  version "0.3.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/glob-base/-/glob-base-0.3.0.tgz#dbb164f6221b1c0b1ccf82aea328b497df0ea3c4"
  integrity sha512-ab1S1g1EbO7YzauaJLkgLp7DZVAqj9M/dvKlTt8DkXA2tiOIcSMrlVI2J1RZyB5iJVccEscjGn+kpOG9788MHA==
  dependencies:
    glob-parent "^2.0.0"
    is-glob "^2.0.0"

glob-parent@^2.0.0:
  version "2.0.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/glob-parent/-/glob-parent-2.0.0.tgz#81383d72db054fcccf5336daa902f182f6edbb28"
  integrity sha512-JDYOvfxio/t42HKdxkAYaCiBN7oYiuxykOxKxdaUW5Qn0zaYN3gRQWolrwdnf0shM9/EP0ebuuTmyoXNr1cC5w==
  dependencies:
    is-glob "^2.0.0"

is-extglob@^1.0.0:
  version "1.0.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/is-extglob/-/is-extglob-1.0.0.tgz#ac468177c4943405a092fc8f29760c6ffc6206c0"
  integrity sha512-7Q+VbVafe6x2T+Tu6NcOf6sRklazEPmBoB3IWk3WdGZM2iGUwU/Oe3Wtq5lSEkDTTlpp8yx+5t4pzO/i9Ty1ww==

is-glob@^2.0.0:
  version "2.0.1"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/is-glob/-/is-glob-2.0.1.tgz#d096f926a3ded5600f3fdfd91198cb0888c2d863"
  integrity sha512-a1dBeB19NXsf/E0+FHqkagizel/LQw2DjSQpvQrj3zT+jYPpaUCryPnrQajXKFLCMuf4I6FhRpaGtw4lPrG6Eg==
  dependencies:
    is-extglob "^1.0.0"

is-primitive@2.0.0:
  version "2.0.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/is-primitive/-/is-primitive-2.0.0.tgz#207bab91638499c07b2adf240a41a87210034575"
  integrity sha512-N3w1tFaRfk3UrPfqeRyD+GYDASU3W5VinKhlORy8EWVf/sIdDL9GAcew85XmktCfH+ngG7SRXEVDoO18WMdB/Q==

is-whitespace@0.3.0:
  version "0.3.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/is-whitespace/-/is-whitespace-0.3.0.tgz#1639ecb1be036aec69a54cbb401cfbed7114ab7f"
  integrity sha512-RydPhl4S6JwAyj0JJjshWJEFG6hNye3pZFBRZaTUfZFwGHxzppNaNOVgQuS/E/SlhrApuMXrpnK1EEIXfdo3Dg==

kind-of@1.1.0:
  version "1.1.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/kind-of/-/kind-of-1.1.0.tgz#140a3d2d41a36d2efcfa9377b62c24f8495a5c44"
  integrity sha512-aUH6ElPnMGon2/YkxRIigV32MOpTVcoXQ1Oo8aYn40s+sJ3j+0gFZsT8HKDcxNy7Fi9zuquWtGaGAahOdv5p/g==

preserve@0.2.0:
  version "0.2.0"
  resolved "https://packages.atlassian.com/api/npm/npm-remote/preserve/-/preserve-0.2.0.tgz#815ed1f6ebc65926f865b310c0713bcb3315ce4b"
  integrity sha512-s/46sYeylUfHNjI+sA/78FAHlmIuKqI9wNnzEOGehAlUUYeObv5C2mOinXBjyUyWmJ2SfcS2/ydApH4hTF4WXQ==

willmurphyscode commented 8 months ago

Hi @atl-mk, thanks for the detailed info! I've been able to reproduce the issue and have an idea for the fix, and will add this to our backlog. Details below:

It looks like the NPM registry doesn't always return a license shaped the way we expect. In Syft's code, we assume that the license field on the returned object will be a string, but it looks like sometimes it can be an object:

❯ curl -s https://registry.npmjs.org/tiny-tarball/1.0.0 | jq .license
"ISC"

❯ curl -s https://registry.npmjs.org/ansi-wrap/0.1.0 | jq .license
{
  "type": "MIT",
  "url": "https://github.com/jonschlinkert/ansi-wrap/blob/master/LICENSE"
}

So for ansi-wrap, we get an object back, and for tiny-tarball, we get a single string.

But Syft assumes it will always be a single string (see https://github.com/anchore/syft/blob/98de2e2f6205b1660f98915cbed22695821fa9c8/syft/pkg/cataloger/javascript/package.go#L186-L188), so this functionality is broken for packages that have an object in their license field.
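
The mismatch can probably be reproduced outside Syft with a few lines of Go. This is only a sketch: `strictLicense` and `decodeStrict` are hypothetical names that mirror the string-only assumption described above, not Syft's actual code, and the JSON bodies are trimmed-down versions of the two registry responses.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// strictLicense is a hypothetical stand-in for a struct that assumes
// the registry always returns the license as a plain string.
type strictLicense struct {
	License string `json:"license"`
}

// decodeStrict decodes body into strictLicense and reports whether a
// failure was caused by a JSON type mismatch (object where a string
// was expected, for example).
func decodeStrict(body []byte) (license string, typeMismatch bool, err error) {
	var v strictLicense
	err = json.Unmarshal(body, &v)
	var typeErr *json.UnmarshalTypeError
	return v.License, errors.As(err, &typeErr), err
}

func main() {
	bodies := []string{
		// Shape returned for tiny-tarball: a plain string.
		`{"license": "ISC"}`,
		// Shape returned for ansi-wrap: an object.
		`{"license": {"type": "MIT", "url": "https://github.com/jonschlinkert/ansi-wrap/blob/master/LICENSE"}}`,
	}
	for _, body := range bodies {
		license, mismatch, err := decodeStrict([]byte(body))
		fmt.Printf("license=%q typeMismatch=%v err=%v\n", license, mismatch, err)
	}
}
```

The second body should fail with a `*json.UnmarshalTypeError`, which is the same family of error as the "cannot unmarshal object into Go struct field" warnings in the log above.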

Dev notes: The next step is to change our deserialization to work with either an object or a string being returned in the license field.
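
One possible way to do that, sketched under assumptions: a custom `UnmarshalJSON` that first tries a string and then falls back to an object with a `type` key. The names `licenseField`, `registryPackage`, and `parseLicense` are hypothetical and chosen for illustration; the eventual fix in Syft may look quite different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// licenseField is a hypothetical type that accepts either
// "MIT" or {"type": "MIT", "url": "..."} in the license field.
type licenseField string

// UnmarshalJSON tries the string shape first, then the object shape.
func (l *licenseField) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		*l = licenseField(s)
		return nil
	}
	var obj struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		return fmt.Errorf("unsupported license shape: %w", err)
	}
	*l = licenseField(obj.Type)
	return nil
}

// registryPackage models only the part of the registry response
// that this sketch cares about.
type registryPackage struct {
	License licenseField `json:"license"`
}

// parseLicense extracts the license identifier from a registry response body.
func parseLicense(body []byte) (string, error) {
	var p registryPackage
	if err := json.Unmarshal(body, &p); err != nil {
		return "", err
	}
	return string(p.License), nil
}

func main() {
	for _, body := range []string{
		`{"license": "ISC"}`,
		`{"license": {"type": "MIT", "url": "https://github.com/jonschlinkert/ansi-wrap/blob/master/LICENSE"}}`,
	} {
		license, err := parseLicense([]byte(body))
		fmt.Println(license, err)
	}
}
```

Keeping the fallback inside the field type means callers can keep reading the license as a single string, at the cost of discarding the `url` from the object shape.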

atl-mk commented 8 months ago

Thanks @willmurphyscode

Note that it can also be an array of objects; see https://docs.npmjs.com/cli/v10/configuring-npm/package-json#license for an example. Although this shape is deprecated by NPM, many packages still use it.
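
To cover the array shape as well, a decoder could inspect the value dynamically instead of committing to one struct. The following is a rough sketch, not Syft's implementation: `licenseTypes` and `parseLicenseValue` are hypothetical helpers, and the sketch assumes each object carries its identifier under a `type` key, as in the registry output shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// licenseTypes walks a decoded JSON value and collects license
// identifiers from a string, an object with a "type" key, or an
// array containing either of those.
func licenseTypes(v any) ([]string, error) {
	switch t := v.(type) {
	case nil:
		// Missing or null license field: nothing to report.
		return nil, nil
	case string:
		return []string{t}, nil
	case map[string]any:
		if s, ok := t["type"].(string); ok {
			return []string{s}, nil
		}
		return nil, fmt.Errorf("license object has no string type key: %v", t)
	case []any:
		var out []string
		for _, item := range t {
			got, err := licenseTypes(item)
			if err != nil {
				return nil, err
			}
			out = append(out, got...)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported license shape %T", v)
	}
}

// parseLicenseValue decodes the raw JSON of a license field and
// returns every license identifier found in it.
func parseLicenseValue(raw []byte) ([]string, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return licenseTypes(v)
}

func main() {
	for _, raw := range []string{
		`"ISC"`,
		`{"type": "MIT"}`,
		`[{"type": "MIT"}, {"type": "Apache-2.0"}]`,
	} {
		licenses, err := parseLicenseValue([]byte(raw))
		fmt.Println(licenses, err)
	}
}
```

Returning a slice rather than a single string is a deliberate choice here, since the array shape can list more than one license.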