rriemann opened this issue 1 month ago
Hi @rriemann - thanks for reporting this issue, and for the reproducible steps.
I tried this on my (chunky) laptop and syft peaks at 8.7 GB of RAM usage, so I'm not surprised it gets OOM-killed. Log below.
Memory profiling...
$ export SYFT_DEV_PROFILE=mem
$ export TMPDIR=$(pwd)/syft_tmp
$ ~/bin/syft scan --select-catalogers -javascript -vv --parallelism 6 -o cyclonedx-json=gl-sbom-report.cdx.json registry.gitlab.com/eu-os/workspace-images/eu-os-base-demo/eu-os-demo:acba8f13-41 > syft_log.txt 2>&1
$ go tool pprof ~/bin/syft ./syft_tmp/profile2797470040/mem.pprof
File: syft
Type: inuse_space
Time: 2025-04-17 12:53:12 BST
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web
(pprof) top
Showing nodes accounting for 8700.52MB, 85.68% of 10155.10MB total
Dropped 583 nodes (cum <= 50.78MB)
Showing top 10 nodes out of 86
flat flat% sum% cum cum%
4278.56MB 42.13% 42.13% 6323.47MB 62.27% github.com/anchore/stereoscope/pkg/tree.(*Tree).Copy
2044.89MB 20.14% 62.27% 2044.89MB 20.14% github.com/anchore/stereoscope/pkg/filetree/filenode.(*FileNode).Copy
1059.07MB 10.43% 72.70% 1059.07MB 10.43% github.com/anchore/stereoscope/pkg/tree/node.IDSet.Add (inline)
335.34MB 3.30% 76.00% 335.34MB 3.30% github.com/anchore/stereoscope/pkg/tree.(*Tree).addNode
197.12MB 1.94% 77.94% 197.12MB 1.94% github.com/anchore/stereoscope/pkg/tree/node.NewIDSet (inline)
192.95MB 1.90% 79.84% 1449.14MB 14.27% github.com/anchore/stereoscope/pkg/filetree.(*searchContext).buildLinkResolutionIndex
164MB 1.61% 81.46% 970.95MB 9.56% github.com/anchore/stereoscope/pkg/file.NewTarIndex.func1
153.59MB 1.51% 82.97% 154.64MB 1.52% github.com/anchore/stereoscope/pkg/file.NewMetadata
141.79MB 1.40% 84.36% 262.85MB 2.59% github.com/anchore/stereoscope/pkg/filetree.(*index).Add
133.20MB 1.31% 85.68% 806.95MB 7.95% github.com/anchore/stereoscope/pkg/image.(*Layer).readStandardImageLayer.layerTarIndexer.func1
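If anyone wants to dig further into where those allocations come from, pprof can annotate the hot functions. A rough sketch, using the same binary and profile paths as above (the graph views need graphviz installed):
$ go tool pprof -http=:8080 ~/bin/syft ./syft_tmp/profile2797470040/mem.pprof
# serves an interactive UI in the browser, including a flame graph view
$ go tool pprof ~/bin/syft ./syft_tmp/profile2797470040/mem.pprof
(pprof) peek Copy
# shows callers/callees of the biggest allocators, tree.(*Tree).Copy and filenode.(*FileNode).Copy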
I do not really know the internals of syft or how to read this diagram, but my wild guess is that the ostree/composefs parts of the image trigger some (recursive) duplicated scanning of the file tree.
I do not believe this is the same issue as https://github.com/anchore/syft/issues/3651; that one was due to a bug that caused squashfs reading to continue past EOF, which I believe is fixed in syft v1.22.0 (though I had some difficulty tracking down exactly which commits to verify this against). I was able to run syft successfully against this image locally, on my MacBook Pro with 32 GB of RAM, though I saw memory usage climb past 20 GB, possibly up to 30 GB including swap space.
A few more pertinent details: this is an 8.1 GB image we're scanning, with 74 layers:
$ docker save registry.gitlab.com/eu-os/workspace-images/eu-os-base-demo/eu-os-demo:acba8f13-41 > ~/Downloads/eu-os-base-demo.tar
$ ls -alFh ~/Downloads/eu-os-base-demo.tar
-rw-r--r-- 1 kzantow staff 8.1G Apr 16 10:18 /Users/kzantow/Downloads/eu-os-base-demo.tar
...
[0243] DEBUG layer metadata: index=74 digest=sha256:02234c80d8b92cd4c408d9994b5b950f100b08431c6330a968a22bc1295857e4 mediaType=application/vnd.docker.image.rootfs.diff.tar.gzip
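(As a sanity check on the layer count, assuming the image is loaded in the local Docker daemon, something like this should print 74:)
$ docker inspect --format '{{len .RootFS.Layers}}' registry.gitlab.com/eu-os/workspace-images/eu-os-base-demo/eu-os-demo:acba8f13-41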
I also created a memory profile, which matches what @popey posted above: it looks like inuse space grows to over 8GB, causing this specific image to fail when only 8GB is available. The memory is being predominantly used by the in-memory representation of the filesystem of all the layers.
When you say "ostree/composefs parts in the image contain some (recursive) duplicated", are you referring to this occurring via symlinks? I believe these are normalized to absolute paths and shouldn't account for a large share of the memory use. I'm actively investigating ways to improve memory usage and performance in this part of the app, but I don't have many concrete changes yet. See: https://github.com/anchore/stereoscope/issues/233 and https://github.com/anchore/syft/issues/1446
We briefly discussed this on the live stream, and it feels like a case for the perennial topic we've raised in the past: dialling down memory usage by swapping the in-memory database to disk.
As someone not aware of the internals of the software, I am surprised that scanning an 8GB image could require 20GB of memory.
Dropping a line here; also experiencing OOM when scanning a custom 6.8 GB image.
syft scan --scope all-layers --output cyclonedx-xml=sbom.xml podman:"${TARGET}":"${TARGET_VERSION_TAG}"
✔ Loaded image 891377244928.dkr.ecr.eu-west-1.amazonaws.com/prd/toolbox/jenkins-agents-wl-gc-iac/m590:0.17.3
A newer version of syft is available for download: 1.22.0 (installed version is 1.21.0)
[0045] ERROR could not determine source: an error occurred attempting to resolve '891377244928.dkr.ecr.eu-west-1.amazonaws.com/prd/toolbox/jenkins-agents-wl-gc-iac/m590:0.17.3': podman: unable to save image to tar: write /tmp/stereoscope-3378755864/podman-daemon-image-2776216565/image.tar: no space left on device
Would it be possible to use a custom TMP location?
@davidjeddy Hello! Yes, you can set the OS TMPDIR environment variable to define where syft puts its temporary files.
$ mkdir -p /home/alan/Temp/tempsyft
$ TMPDIR=/home/alan/Temp/tempsyft syft nextcloud:latest
$ du -hs /home/alan/Temp/tempsyft
2.5G /home/alan/Temp/tempsyft
However, this issue is more about running out of RAM, while your issue is running out of disk space.
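If the runner has a roomier disk mounted somewhere else, the same TMPDIR trick works there too. A sketch (the mount point and image name are placeholders, not from this thread):
$ sudo mkdir -p /mnt/syft-tmp && sudo chown "$USER" /mnt/syft-tmp
$ TMPDIR=/mnt/syft-tmp syft scan podman:my-image:latest -o cyclonedx-json=sbom.json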
On GitHub's free runners, there appears to be a large secondary disk mounted to /mnt.
My current strategy is to eliminate the existing swapfile at /mnt/swapfile and replace it with a 70 GB swapfile at the same location, roughly as sketched below. The disk usually seems to have between 75 GB and 84 GB available.
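Something like this (a sketch; the 70 GB size assumes the disk actually has that much free, so check first):
$ df -h /mnt
$ sudo swapoff /mnt/swapfile
$ sudo rm /mnt/swapfile
$ sudo fallocate -l 70G /mnt/swapfile
$ sudo chmod 600 /mnt/swapfile
$ sudo mkswap /mnt/swapfile
$ sudo swapon /mnt/swapfile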
This solves the OOM for the most part. However, scans take an extremely long time and produce several duplicate entries; I do not know if this is due to the different scanners. My resolution here is to only scan RPMs, which significantly reduces memory usage.
My last space-saving technique is to not use stereoscope; I have syft scan an oci-archive instead. This doesn't concern memory, but helps with the limited disk space on runners.
Thanks for sharing your workaround. Could you please give the commands for scanning an oci-archive and for an RPM-only scan?
My just recipe is here: https://github.com/m2Giles/m2os/blob/463a56fda28bd9cd28d9fb1951d2b49719679eb5/Justfile#L743
For scanning an oci-archive, my input is the oci-archive directly. The GitHub workflow that does the swapfile changes and passes the oci-archive to the recipe is here: https://github.com/m2Giles/m2os/blob/463a56fda28bd9cd28d9fb1951d2b49719679eb5/.github/workflows/gen-sbom.yml
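In short, the two pieces look something like this (a sketch; image.tar stands in for your own archive, and I believe --select-catalogers rpm restricts syft to the rpm-tagged catalogers):
$ syft scan oci-archive:image.tar -o cyclonedx-json=sbom.json
$ syft scan oci-archive:image.tar --select-catalogers rpm -o cyclonedx-json=sbom.json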
> This doesn't concern memory, but helps with the limited disk space on runners.
This might be interesting - the author strips out most of the software on the runner, to make even more disk space available. Obviously that may mean something you actually need in your workflow is gone, but it certainly gets rid of a lot, so putting back just the bits you need (or not deleting them) might be an option if you want optimal disk space.
https://wimpysworld.com/posts/nothing-but-nix-github-actions/
> My current strategy is to eliminate the existing swapfile at /mnt/swapfile and replace it with a 70 GB swapfile at the same location.
Another option which might be worth considering as a workaround is https://blacksmith.sh who claim to be a drop-in replacement for the standard runners, but "twice as fast, half the cost". I haven't tried it, but I have certainly seen others rave about it.
@popey Saw the comment on needing beefier runners. We have Depot GitHub Actions runners that are about 3-10x faster and also half the cost. They do some neat things inside the runner, like ramdisks for faster disk access and larger disk sizes, and they also integrate directly with our container build product + registry if you ever need to use those. Some docs on the different runner types if you want to try them out: https://depot.dev/docs/github-actions/runner-types.
What happened:
I ran this command with syft 1.22.0 on a CoreOS ARM machine with 8 cores and 16 GB of RAM.
TMPDIR=$(pwd)/syft_tmp ~/.local/bin/syft scan --select-catalogers -javascript -vv --parallelism 6 -o cyclonedx-json=gl-sbom-report.cdx.json podman:registry.gitlab.com/eu-os/workspace-images/eu-os-base-demo/eu-os-demo:acba8f13-41
Log output
Exit code: 137
dmesg output:
What you expected to happen:
The program finishes with a json output.
Steps to reproduce the issue:
Run the command above.
Anything else we need to know?:
It also did not work on my desktop computer; I got the same out-of-memory error there. That machine is x86_64 with 8 GB of RAM. On gitlab.com, the scan ran into their runners' timeout.
This seems to be related to #3651. As you can see in the command, I already disabled the JavaScript catalogers, but it did not help.
Environment:
Output of syft version: 1.22.0
OS (e.g. cat /etc/os-release or similar): NAME="Fedora Linux" VERSION="41.20250315.3.0 (CoreOS)" RELEASE_TYPE=stable ID=fedora VERSION_ID=41 VERSION_CODENAME="" PLATFORM_ID="platform:f41" PRETTY_NAME="Fedora CoreOS 41.20250315.3.0" ANSI_COLOR="0;38;2;60;110;180" LOGO=fedora-logo-icon CPE_NAME="cpe:/o:fedoraproject:fedora:41" HOME_URL="https://getfedora.org/coreos/" DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/" SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/" BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/" REDHAT_BUGZILLA_PRODUCT="Fedora" REDHAT_BUGZILLA_PRODUCT_VERSION=41 REDHAT_SUPPORT_PRODUCT="Fedora" REDHAT_SUPPORT_PRODUCT_VERSION=41 SUPPORT_END=2025-12-15 VARIANT="CoreOS" VARIANT_ID=coreos OSTREE_VERSION='41.20250315.3.0'