anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

syft stuck at 'Cataloged contents' #3068

Open jc776 opened 1 month ago

jc776 commented 1 month ago

What happened: I'm using syft to scan all directories on an EC2 instance to produce an SBOM. It hangs at this point, using 100% CPU, and never completes:

$ syft dir:/ --exclude ./opt/codedeploy-agent/deployment-root --exclude ./opt/wildfly-18.0.1.Final/standalone/tmp/vfs --exclude='**/*.spdx.json' -o spdx-json=sbom-full.spdx.json 
 ✔ Indexed file system                                                                         /
 ✔ Cataloged contents              8a5edab282632443219e051e4ade2d1d5bbc671c781051bf1437897cbdfea
   ├── ✔ Packages                        [5,227 packages]
   ├── ✔ File digests                    [47,411 files]
   ├── ✔ File metadata                   [47,462 locations]
   └── ✔ Executables                     [2,444 executables]

The output file remains empty:

-rw-r--r--  1 [removed] wildfly       0 Jul 24 16:48 sbom-full.spdx.json

A similar command completed OK and wrote the SBOM file with the earlier syft version 1.3.0. A similar command also got stuck without completing with syft versions 1.4.1 and 1.8.0, but I don't have saved output from those runs.

Running the same scan via grype does complete and does the vulnerability scan step correctly, but isn't set to output an sbom file.

$ grype dir:/ --exclude ./opt/codedeploy-agent/deployment-root --exclude ./opt/wildfly-18.0.1.Final/standalone/tmp/vfs --exclude='**/*.spdx.json' -o json=grype.json
 ✔ Indexed file system                                                                         /
 ✔ Vulnerability DB                [updated]
 ✔ Cataloged contents              8a5edab282632443219e051e4ade2d1d5bbc671c781051bf1437897cbdfea
   ├── ✔ Packages                        [5,227 packages]
   ├── ✔ File digests                    [47,411 files]
   ├── ✔ File metadata                   [47,462 locations]
   └── ✔ Executables                     [2,444 executables]
 ✔ Scanned for vulnerabilities     [x vulnerability matches]
   ├── by severity: x critical, x high, x medium, x low, 0 negligible (x unknown)
   └── by status:   x fixed, x not-fixed, x ignored
A newer version of grype is available for download: 0.79.3 (installed version is 0.79.2)

What you expected to happen: Command completes and writes sbom-full.spdx.json.

Steps to reproduce the issue: Not sure - using this command:

syft dir:/ --exclude ./opt/codedeploy-agent/deployment-root --exclude ./opt/wildfly-18.0.1.Final/standalone/tmp/vfs --exclude='**/*.spdx.json' -o spdx-json=sbom-full.spdx.json 

Anything else we need to know?:

Environment:

popey commented 1 month ago

Thanks for the issue @jc776

Just a couple of points.

Are you able to run syft in verbose mode? This will capture the output while it's running. Either syft -vv or syft -vvv will help.
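
One way to capture a verbose run for attaching to the issue, sketched under a few assumptions: syft writes its log lines to stderr, GNU coreutils `timeout` is installed, and `run_capped` and the log file name are hypothetical names chosen for illustration, not part of syft.

```shell
# run_capped LIMIT LOGFILE COMMAND [ARGS...]
# Runs COMMAND with a time limit (in seconds) and saves its stderr to LOGFILE.
# Returns COMMAND's exit status, or 124 if the limit was hit
# (124 is GNU timeout's convention for "timed out").
run_capped() {
  limit="$1"
  logfile="$2"
  shift 2
  timeout "$limit" "$@" 2> "$logfile"
}

# Example (not run here): cap the scan at 10 minutes and keep the debug log.
# run_capped 600 syft-debug.log syft -vv dir:/ -o spdx-json=sbom-full.spdx.json
```

Capping the run means a hung scan still ends on its own and leaves a complete log file behind, rather than needing a manual Ctrl-C.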

You mentioned that older versions worked. To confirm this, if required, you should be able to grab any old binary version of syft to compare the run/output.

$ curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b . v1.3.0
[info] checking github for release tag='v1.3.0'
[info] fetching release script for tag='v1.3.0'
[info] using release tag='v1.3.0' version='1.3.0' os='linux' arch='amd64'
[info] installed ./syft
$ ./syft --version
syft 1.3.0

jc776 commented 1 month ago

Running syft -vv dir:/ --exclude ./opt/codedeploy-agent/deployment-root --exclude ./opt/wildfly-18.0.1.Final/standalone/tmp/vfs --exclude='**/*.spdx.json' -o spdx-json=sbom-full.spdx.json got to this step, then kept repeating one message in bursts with 30-60 second gaps:

[0111] DEBUG file digests cataloger processed 47411 files
[0111] DEBUG file metadata cataloger processed 47462 files
[0114] DEBUG executable cataloger processed 2444 files
[0116] DEBUG invalid h1digest: : invalid h#digest:
[0116] DEBUG invalid h1digest: : invalid h#digest:
[0116] DEBUG invalid h1digest: : invalid h#digest:
[0116] DEBUG invalid h1digest: : invalid h#digest:
[0116] DEBUG invalid h1digest: : invalid h#digest:
[0116] DEBUG invalid h1digest: : invalid h#digest:
[0117] DEBUG invalid h1digest: : invalid h#digest:
[0117] DEBUG invalid h1digest: : invalid h#digest:
[0117] DEBUG invalid h1digest: : invalid h#digest:
[0117] DEBUG invalid h1digest: : invalid h#digest:

[0173] DEBUG invalid h1digest: : invalid h#digest:
[0173] DEBUG invalid h1digest: : invalid h#digest:
[0174] DEBUG invalid h1digest: : invalid h#digest:
[0174] DEBUG invalid h1digest: : invalid h#digest:
[0174] DEBUG invalid h1digest: : invalid h#digest:
[0174] DEBUG invalid h1digest: : invalid h#digest:
[0191] DEBUG invalid h1digest: : invalid h#digest:
[0191] DEBUG invalid h1digest: : invalid h#digest:
[0192] DEBUG invalid h1digest: : invalid h#digest:
[0244] DEBUG invalid h1digest: : invalid h#digest:
[0244] DEBUG invalid h1digest: : invalid h#digest:
[0244] DEBUG invalid h1digest: : invalid h#digest:
[0245] DEBUG invalid h1digest: : invalid h#digest:
[0245] DEBUG invalid h1digest: : invalid h#digest:
[0246] DEBUG invalid h1digest: : invalid h#digest:
[0246] DEBUG invalid h1digest: : invalid h#digest:
[0246] DEBUG invalid h1digest: : invalid h#digest:
[0246] DEBUG invalid h1digest: : invalid h#digest:
[0246] DEBUG invalid h1digest: : invalid h#digest:
[0246] DEBUG invalid h1digest: : invalid h#digest:
[0246] DEBUG invalid h1digest: : invalid h#digest:
[0249] DEBUG invalid h1digest: : invalid h#digest:
[0250] DEBUG invalid h1digest: : invalid h#digest:
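
For a long log like this, a summary can be easier to share than the raw lines. The sketch below assumes the log was saved to a file and that the bracketed number at the start of each line is elapsed seconds (which matches the 30-60 second gaps described above). `count_messages` and `largest_gap` are hypothetical helper names, not part of syft.

```shell
# count_messages LOGFILE...
# Strips the leading [NNNN] stamp and counts how often each message appears,
# most frequent first. Lines without a stamp are ignored.
count_messages() {
  sed -n 's/^\[[0-9]*\] //p' "$@" | sort | uniq -c | sort -rn
}

# largest_gap LOGFILE...
# Prints the largest jump between the stamps of consecutive stamped lines.
largest_gap() {
  awk '
    match($0, /^\[[0-9]+\]/) {
      t = substr($0, 2, RLENGTH - 2) + 0   # number inside the brackets
      if (seen && t - prev > maxgap) maxgap = t - prev
      prev = t
      seen = 1
    }
    END { print maxgap + 0 }
  ' "$@"
}

# Example (not run here):
# count_messages syft-debug.log | head
# largest_gap syft-debug.log
```

The message count shows whether a single log line dominates the run, and the largest gap shows how long syft went silent between bursts.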

Running syft -vvv dir:/ --exclude ./opt/codedeploy-agent/deployment-root --exclude ./opt/wildfly-18.0.1.Final/standalone/tmp/vfs --exclude='**/*.spdx.json' -o spdx-json=sbom-full.spdx.json did not give any more output between those messages. It stopped printing messages after 5 1/2 minutes but stayed at 100% CPU until I stopped it manually at 10 minutes.

[0326] DEBUG invalid h1digest: : invalid h#digest:
[0326] DEBUG invalid h1digest: : invalid h#digest:
[0326] DEBUG invalid h1digest: : invalid h#digest:
[0326] DEBUG invalid h1digest: : invalid h#digest:
[0327] DEBUG invalid h1digest: : invalid h#digest:
[0327] DEBUG invalid h1digest: : invalid h#digest:
[0327] DEBUG invalid h1digest: : invalid h#digest:
[0327] DEBUG invalid h1digest: : invalid h#digest:
[0328] DEBUG invalid h1digest: : invalid h#digest:
[0334] DEBUG invalid h1digest: : invalid h#digest:
[0334] DEBUG invalid h1digest: : invalid h#digest:
[0334] DEBUG invalid h1digest: : invalid h#digest:
[0334] DEBUG invalid h1digest: : invalid h#digest:
[0334] DEBUG invalid h1digest: : invalid h#digest:
[0335] DEBUG invalid h1digest: : invalid h#digest:
[0335] DEBUG invalid h1digest: : invalid h#digest:
[0335] DEBUG invalid h1digest: : invalid h#digest:
[0336] DEBUG invalid h1digest: : invalid h#digest:
[0336] DEBUG invalid h1digest: : invalid h#digest:
[0336] DEBUG invalid h1digest: : invalid h#digest:

^C[0583] TRACE signal interrupt, stop requested
[0583] TRACE signal interrupt component=eventloop

Process in top:

%CPU %MEM     TIME+ COMMAND
122.0  6.5  10:19.28 syft

popey commented 1 month ago

Thanks. I'm not surprised by the CPU usage: it looks like we're arriving here, where some (a lot) of the digests are being calculated, or at least attempted, and failing.

Does it produce sane output if you omit generating an sbom-full.spdx.json (i.e. omit -o completely) and just capture the standard output of syft? Just as a test?

jc776 commented 1 month ago

Syft 1.9 does complete successfully when producing standard output:

$ syft dir:/ --exclude ./opt/codedeploy-agent/deployment-root --exclude ./opt/wildfly-18.0.1.Final/standalone/tmp/vfs --exclude='**/*.spdx.json' | tee syft.log

NAME                                                           VERSION                                    TYPE                                 
                                                               0.2.1                                      gem                                   
./extra/aws-sdk-go                                             (devel)                                    go-module            (+5 duplicates)  
./extra/lockfile                                               (devel)                                    go-module            (+3 duplicates)  
8021q                                                          1.8                                        linux-kernel-module  (+2 duplicates)  
8139cp                                                         1.3                                        linux-kernel-module  (+2 duplicates)  
8139too                                                        0.9.28                                     linux-kernel-module  (+2 duplicates)  
8250_exar                                                                                                 linux-kernel-module  (+2 duplicates)  
842                                                                                                       linux-kernel-module  (+2 duplicates)  
842_compress                                                                                              linux-kernel-module  (+2 duplicates)  
842_decompress                                                                                            linux-kernel-module  (+2 duplicates)  
Babel                                                          0.9.6                                      python                                
BusLogic                                                                                                  linux-kernel-module  (+2 duplicates)  
Commons Daemon Service Manager                                 1.0.15.0                                   dotnet                                
Commons Daemon Service Runner                                  1.0.15.0                                   dotnet               (+1 duplicate)   
FastInfoset                                                    1.2.13                                     java-archive                          
GeoIP                                                          1.5.0-11.amzn2.0.2                         rpm                                   
JavaEWAH                                                       1.1.6                                      java-archive         (+1 duplicate)   
Jinja2                                                         2.7.2                                      python                                
Log4jHotPatch                                                                                             java-archive         (+1 duplicate)   
...

Syft 1.3 does complete successfully when producing sbom-full.spdx.json:

$ ./syft dir:/ --exclude ./opt/codedeploy-agent/deployment-root --exclude ./opt/wildfly-18.0.1.Final/standalone/tmp/vfs --exclude='**/*.spdx.json' -o spdx-json=sbom-full.spdx.json 

 ✔ Indexed file system                                                                                                              /
 ✔ Cataloged contents                                                8a5edab282632443219e051e4ade2d1d5bbc671c781051bf1437897cbdfea0f1
   ├── ✔ Packages                        [5,521 packages]
   ├── ✔ File digests                    [47,418 files]
   ├── ✔ File metadata                   [47,475 locations]
   └── ✔ Executables                     [2,445 executables]

A newer version of syft is available for download: 1.10.0 (installed version is 1.3.0)

{"spdxVersion":"SPDX-2.3","dataLicense":"CC0-1.0","SPDXID":"SPDXRef-DOCUMENT","name":"/","documentNamespace":"https://anchore.com/syft/dir/-847b627d-4d7e-4e5d-b539-4b9910dacd13","creationInfo":{"licenseListVersion":"3.23","creators":["Organization: Anchore, Inc","Tool: syft-1.3.0"],"created":"2024-08-02T08:58:16Z"},"packages":[{"name":"","SPDXID":"SPDXRef-Package-gem-5c037fe26ac58834","versionInfo":"0.2.1","supplier":"Person: watsonian","originator":"Person: watsonian","downloadLocation":"NOASSERTION","filesAnalyzed":false,"sourceInfo":"acquired package info from installed gem metadata file: opt/codedeploy-agent/vendor/specifications/simple_pid.gemspec" ...