anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Issue scanning Poetry Project with Syft 1.6 and cataloger=python-package-cataloger #2954

Closed mymichu closed 2 weeks ago

mymichu commented 2 weeks ago

What happened: We updated Syft from version 1.5.0 to 1.6.0 and discovered that it has issues scanning certain poetry.lock files. We scan a Poetry project with Syft 1.6.0 by running the following command:

syft --output cyclonedx-json=reports/sbom-scan-licenses.cdx.json --source-name=blub scan .

The console output then shows the following warning:

[0001]  WARN cataloger failed cataloger=python-package-cataloger error=unable to parse poetry.lock: (0, 0): 
version = ">=1.0,<3"
markers = "platform_system == \"Windows\""
version = ">=1.6,<3"
]([]*toml.Tree) to trees location=/poetry.lock

and the resulting JSON contains no components:

{
  "$schema": "http://cyclonedx.org/schema/bom-1.5.schema.json",
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "serialNumber": "urn:uuid:dac4442f-fb9e-4168-b204-567579c0314e",
  "version": 1,
  "metadata": {
    "timestamp": "2024-06-13T15:31:24+02:00",
    "tools": {
      "components": [
        {
          "type": "application",
          "author": "anchore",
          "name": "syft",
          "version": "1.6.0"
        }
      ]
    },
    "component": {
      "bom-ref": "af63bd4c8601b7f1",
      "type": "file",
      "name": "blub"
    }
  }
}

With Syft 1.5.0, the lock file could be scanned without any issues. If we remove the following part from the lock file, Syft 1.6.0 works. We cannot do that, however, because other teams maintain the lock files.

[package.dependencies]
msal = ">=0.4.1,<2.0.0"
packaging = "*"
portalocker = [
    {version = ">=1.0,<3", markers = "platform_system != \"Windows\""},
    {version = ">=1.6,<3", markers = "platform_system == \"Windows\""},
]

What you expected to happen:

I would expect Syft 1.6.0 to behave the same as Syft 1.5.0.

Steps to reproduce the issue:

Create a Poetry lock file and add the following snippet:

[package.dependencies]
msal = ">=0.4.1,<2.0.0"
packaging = "*"
portalocker = [
    {version = ">=1.0,<3", markers = "platform_system != \"Windows\""},
    {version = ">=1.6,<3", markers = "platform_system == \"Windows\""},
]

and execute the following command within the poetry project:

syft --output cyclonedx-json=reports/sbom-scan-licenses.cdx.json --source-name=blub scan .

Anything else we need to know?:

Currently nothing

Environment:

spiffcs commented 2 weeks ago

Thanks for the detailed write-up @mymichu - let me take a look today and see if we can get a patch released for this bug.

spiffcs commented 2 weeks ago

Looks like our poetryPackage needs a field that can track when an entry in package.dependencies is not x = "version"

and is instead

x  = [
 version, markers
]

Let me see what a fix looks like for this and get that pushed.

Are there any other fields we're missing that can be two different types that we need to consider? I'm having trouble tracking down an exact specification for the lockfile: https://github.com/anchore/syft/blob/ca0cc52d47b103642c4b8516cf7a09a7a2671656/syft/pkg/cataloger/python/parse_poetry_lock.go#L29-L38

joshuatz commented 2 weeks ago

@spiffcs Just FYI - not trying to take over the thread, but I think #2947 (which I filed) overlaps with this

spiffcs commented 2 weeks ago

The fix for this might take more than a day - I'm currently evaluating a new toml parser for syft. The current one doesn't give us the hooks we're looking for to customize this unmarshal function the way we want.

https://github.com/anchore/syft/blob/273e31e8061c94742638b25ed74762dd5e74e783/syft/pkg/cataloger/python/parse_poetry_lock.go#L96-L102

Here is where the error originates from when we call tree.Unmarshal.

I initially tried changing the dependency type to any and writing a custom unmarshal, given that it can be one of two things:

dependency = "version_constraint" ---> map[string]string
dependency = [
    {},
    {},
] ------> map[string][]ComplexVersion

I tried a couple of solutions to hook a custom Unmarshal function into the poetryPackages type, but found myself fighting too much with the library we use. It also looks like support for this was dropped in its latest version: https://github.com/pelletier/go-toml?tab=readme-ov-file#support-for-tomlunmarshaler-has-been-dropped

Currently I'm trying to get things working again with: https://github.com/BurntSushi/toml

This gives us a lot more flexibility: we can define a custom UnmarshalTOML function for our poetry lock file type to get the initial decode done correctly: https://godocs.io/github.com/BurntSushi/toml#example-package-UnmarshalTOML
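For illustration, here is a rough sketch of what such a hook could look like. This is not the actual syft fix. The type names (constraint, poetryDependency) and the helper fromTable are made up for this example, and I'm assuming the decoder passes the already-decoded generic value (a string, a table, or a list of tables) to UnmarshalTOML. The sketch uses only the standard library so it can run on its own; the method just has the shape that BurntSushi/toml's Unmarshaler interface expects.

```go
package main

import "fmt"

// constraint is a hypothetical type: one version requirement plus the
// optional environment markers that scope it.
type constraint struct {
	Version string
	Markers string
}

// poetryDependency is a hypothetical type for a single entry under
// [package.dependencies]; it is not syft's real struct.
type poetryDependency struct {
	Constraints []constraint
}

// fromTable reads the "version" and "markers" keys of an inline table,
// leaving a field empty when the key is missing or not a string.
func fromTable(t map[string]any) constraint {
	version, _ := t["version"].(string)
	markers, _ := t["markers"].(string)
	return constraint{Version: version, Markers: markers}
}

// UnmarshalTOML normalizes every accepted shape into a constraint list.
// Assumption: data is the generic decoded value for this key.
func (d *poetryDependency) UnmarshalTOML(data any) error {
	d.Constraints = nil
	switch v := data.(type) {
	case string: // x = ">=1.0,<3"
		d.Constraints = []constraint{{Version: v}}
	case map[string]any: // x = {version = "...", markers = "..."}
		d.Constraints = []constraint{fromTable(v)}
	case []map[string]any: // list already typed as tables
		for _, t := range v {
			d.Constraints = append(d.Constraints, fromTable(t))
		}
	case []any: // x = [ {...}, {...} ]
		for _, item := range v {
			t, ok := item.(map[string]any)
			if !ok {
				return fmt.Errorf("unexpected dependency entry type %T", item)
			}
			d.Constraints = append(d.Constraints, fromTable(t))
		}
	default:
		return fmt.Errorf("unexpected dependency type %T", data)
	}
	return nil
}

func main() {
	var simple, multi poetryDependency
	if err := simple.UnmarshalTOML(">=0.4.1,<2.0.0"); err != nil {
		fmt.Println("error:", err)
	}
	err := multi.UnmarshalTOML([]any{
		map[string]any{"version": ">=1.0,<3", "markers": `platform_system != "Windows"`},
		map[string]any{"version": ">=1.6,<3", "markers": `platform_system == "Windows"`},
	})
	if err != nil {
		fmt.Println("error:", err)
	}
	fmt.Printf("%+v\n%+v\n", simple, multi)
}
```

Normalizing both shapes into one slice means downstream code never has to care which form the lock file used; a plain string simply becomes a single constraint with no markers.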

mymichu commented 2 weeks ago

@spiffcs Great work. Thank you for your quick response, support, and help. We appreciate this.