anchore / stereoscope

go library for processing container images and simulating a squash filesystem
Apache License 2.0

Very High Memory Usage Using Syft #233

Open dor-hayun opened 2 months ago

dor-hayun commented 2 months ago

What happened: High memory consumption when scanning a 5 GB image

What you expected to happen: controlled memory consumption

Steps to reproduce the issue: scan a very large image and check the memory consumption

Anything else we need to know?:

[attached image]

Environment:

abhiseksanyal commented 2 months ago

I am seeing the same issue: syft used almost 20 GB of RSS memory.

I tested syft 1.5.0 and 1.4.1 on an Ubuntu 22.04 EC2 instance with 32 GB of RAM and 8 vCPUs, and both versions showed the issue.

The test was run against a 5.5+ GB image on GCR that has Maven components on a RHEL 7.9 image.

[screenshot of memory usage]

syft ran for quite some time and then exited without any error

wagoodman commented 1 month ago

An initial look shows that, depending on the image being scanned, the CSV reader used within the mimetype detector lib is what's eating much of the total allocated space

[Screenshot 2024-07-10 at 9:36:42 AM]

It seems we're using an older version of mimetype that does not yet incorporate https://github.com/gabriel-vasile/mimetype/pull/355 . When I bump the dependency to pick up this fix, the memory allocated within the mimetype.DetectReader() call drops from 1.1GB to 740MB, which is an improvement, but I was expecting much lower consumption.

I'll see what else I can do here, but since much of the consumption is from the CSV and TSV detectors alone, I'm considering dropping those detectors entirely (which would require a fork in the short term).

wagoodman commented 1 month ago

I've got a prototype csv/tsv detector that is pretty bare-bones, but it drops the total memory allocation from 740MB to 330MB. I'll see if I can get that PR tested and merged upstream.
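
The prototype itself is not shown in this thread. Purely as an illustration of what a bare-bones detector could look like, the sketch below skips encoding/csv entirely and counts delimiter bytes per line over a header that the caller has already bounded. `looksDelimited`, its `maxLines` parameter, and the two-line threshold are hypothetical and are not taken from the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksDelimited reports whether header looks like delimiter-separated
// text: at least two complete lines, each containing the same non-zero
// number of delimiter bytes. It ignores quoting on purpose, so quoted
// fields that contain the delimiter can be misclassified.
func looksDelimited(header []byte, delim byte, maxLines int) bool {
	want := -1 // delimiter count of the first line, once known
	lines := 0
	for len(header) > 0 && lines < maxLines {
		i := bytes.IndexByte(header, '\n')
		if i < 0 {
			break // trailing partial line: the header may be truncated
		}
		line := bytes.TrimSuffix(header[:i], []byte("\r"))
		header = header[i+1:]

		// Count delimiters in place; no per-line allocation is needed.
		n := 0
		for _, b := range line {
			if b == delim {
				n++
			}
		}
		if n == 0 {
			return false
		}
		if want == -1 {
			want = n
		} else if n != want {
			return false
		}
		lines++
	}
	return lines >= 2
}

func main() {
	fmt.Println(looksDelimited([]byte("a,b,c\n1,2,3\n"), ',', 10))    // true
	fmt.Println(looksDelimited([]byte("a\tb\n1\t2\n"), '\t', 10))     // true
	fmt.Println(looksDelimited([]byte("hello world\nfoo\n"), ',', 10)) // false
}
```

The trade-off in a design like this is accuracy for memory: it works on subslices of the input and keeps only two counters, but it cannot handle quoted fields the way a full CSV parser does.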

wagoodman commented 1 month ago

@abhiseksanyal the screenshot is showing 9GB being used -- are you describing two different invocations?

"syft ran for quite some time and then exited without any error"

Did syft display an SBOM result, or did it exit with neither an error nor an SBOM result?

wagoodman commented 1 month ago

The PR that attempts to reduce total memory allocation has been stalled for a while: https://github.com/anchore/mimetype/pull/2

dor-hayun commented 1 month ago

@wagoodman thank you very much. Will it be part of the next release of Syft?

dor-hayun commented 3 weeks ago

Hi @wagoodman, any update here?