Closed dholth closed 1 year ago
Our conda-build update, which correctly passes level 16 from a config file instead of the (too high?) default of 22, should also reduce memory usage. I'm having trouble figuring out how much memory each level needs from the zstd manual or man pages.
A quick memory test:

```python
import io
import sys

import zstandard

compressor = zstandard.ZstdCompressor(level=int(sys.argv[1]))
writer = compressor.stream_writer(io.BytesIO())
writer.write(b"hello")
writer.close()
```
macOS's `/usr/bin/time -l` shows us the memory needs:

```
% /usr/bin/time -l python memtest.py 12
36225024  maximum resident set size
% /usr/bin/time -l python memtest.py 16
44367872  maximum resident set size
% /usr/bin/time -l python memtest.py 19
95436800  maximum resident set size
% /usr/bin/time -l python memtest.py 22
691208192  maximum resident set size
```
So we go from 36 MB (-12) and 44 MB (-16) all the way up to 691 MB (-22). The conda-build update should help: we were experiencing a bug where conda-forge wanted -16 but got the default of -22.
I think it is possible to pass the input size into `ZstdCompressor` if we use the API more smartly. This would help when packages are smaller than e.g. the 691 MB buffer seen at -22.
I would be open to changing our default to -19, which is the highest level the command-line tool lets you use without passing --ultra...
conda-package-handling 2.0's `.conda` creation code is in the style of conda-package-streaming, but is independent: https://github.com/conda/conda-package-handling/blob/main/src/conda_package_handling/conda_fmt.py#L104
(It would be tricky but very nice to figure out a good "give me successive objects that can stream data into the .conda zip archive" API, all in memory, to go into conda-package-streaming)
I highly recommend going for -19 at max, not just because of the resources during compression but also for the decompression on the users' side.
For decompression you'd need some sub-10M base memory plus the window size (which is capped by the input size, i.e., it's most relevant for big packages), e.g.:

```
2**23 = 8M
2**25 = 32M
2**26 = 64M
2**27 = 128M
```
For compression the higher levels additionally use bigger search tables which is why the memory increases more. See https://github.com/facebook/zstd/blob/v1.5.2/lib/compress/clevels.h#L44-L50 for the actual (logarithmic) values used for each level and https://manpages.ubuntu.com/manpages/focal/man1/zstd.1.html for a short description of those.
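The decompression side of those figures can be sketched as simple arithmetic (the ~10M base is the assumed figure from the comment above, not a measured constant):

```python
# Rough decompression-memory estimate: base overhead plus window size.
# The 10 MiB base is an assumption taken from the discussion, not measured.
BASE = 10 * 2**20

for window_log in (23, 25, 26, 27):
    window = 2**window_log
    total_mib = (BASE + window) / 2**20
    print(f"window 2**{window_log}: ~{total_mib:.0f} MiB to decompress")
```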
You get slightly higher resource usage for -19 compared to -18 during compression (and the same low max of x+8M for decompression). I'd go for one of those: try -19 first and see if we encounter cases that demand lowering it to -18.
We're also probably going for -19 as the default on the conda-forge side, see https://github.com/conda-forge/conda-forge-ci-setup-feedstock/pull/217 . (Yay, I could just copy/paste the comment from here :grin:.)
Hopefully our parallel decompression will be okay...
> Hopefully our parallel decompression will be okay...
If https://github.com/conda/conda/blob/22.11.0/conda/core/package_cache_data.py#L72 sets the max parallelism, then you're most definitely fine :). Plus, you'd need to extract a bunch of >=128M tarballs in parallel to hit peak memory usage. And only the current builds from conda-forge from the last two weeks are affected (AFAICT Anaconda's are transmuted with a custom lower level, i.e., not affected). No need to worry there.
> I think it is possible to pass the input size into `ZstdCompressor` if we use the API more smartly. This would help when packages are smaller than e.g. the 691 MB buffer seen at -22.
I suspect that cph 2.x is much more likely to encounter OOM situations than 1.x, not unlikely due to the fact (if I understand the comment above correctly) that no size hints are provided. (We saw the OOM issue quite frequently in the first few hours after 2.x was published on conda-forge.)
I did a simple test: use `cph x` to extract a package, then recreate it, using macOS's peak-memory-showing `/usr/bin/time`. This can be run on 1.x and 2.x since the CLI is the same. `create` (`c`) should be closer to what conda-build does than `transmute`. The memory usage looks consistent with cph 2.0 keeping two zstd compressors in memory at the same time (for the pkg- and info- components), though libarchive as used in 1.9 is definitely using a different version of the zstandard library.
```
/usr/bin/time -l cph c python-3.11.0-h93c2e33_0_cpython --out-folder /tmp/ python-3.11.0-h93c2e33_0_cpython.conda
```
On Linux I get about twice as much memory usage:

```
(base) conda list conda-package-handling
# packages in environment at /opt/conda:
#
# Name                    Version  Build            Channel
conda-package-handling    1.9.0    py310h5764c6d_1  conda-forge
(base) /usr/bin/time --format=%M cph c python-3.10.8-h4a9ceb5_0_cpython.conda --out-folder /tmp/ python-3.10.8-h4a9ceb5_0_cpython.conda
706156
(base) rm /tmp/python-3.10.8-h4a9ceb5_0_cpython.conda
(base) mamba install conda-package-handling\>=2 -y >/dev/null 2>&1
(base) conda list conda-package-handling
# packages in environment at /opt/conda:
#
# Name                    Version  Build         Channel
conda-package-handling    2.0.1    pyh38be061_0  conda-forge
(base) /usr/bin/time --format=%M cph c python-3.10.8-h4a9ceb5_0_cpython.conda --out-folder /tmp/ python-3.10.8-h4a9ceb5_0_cpython.conda
1394704
```
The size hints are easy to pass (see https://python-zstandard.readthedocs.io/en/latest/compressor.html?highlight=stream_writer#zstandard.ZstdCompressor.stream_writer), but I'm not currently prepared to reintroduce temporary files in order to know the sizes up front. It would be a good experiment to measure their impact. It might be easy to add them on extract in conda-package-streaming.
I wonder about the practicality of using python-zstandard though... From a first glance I could not see an exposed method on the compressor instance to free memory, and `del compressor` also didn't seem to free anything.
For a CLI invocation of `cph` or a single package build, gh-169 would be enough. But what about recipes with multiple outputs? What about the `conda-build` test suite? And most importantly: does the decompressor behave the same? Then we'd have a problem with package extraction in `conda` itself, no matter if parallel or serial...

I'd suggest that we pull the `conda-forge::conda-package-handling=2.0.1` packages until those questions are resolved.
https://github.com/conda/conda-package-handling/blob/memory/tests/memory_test.py uses the same memory with `times=1` or `=156`.
Interesting... Any idea why we saw different behavior for `cph`?
Some nuance of reference counting surely.
Do we have an excuse to use memray?
If it's any comfort, the extract part of conda-package-streaming used by this package was originally written for a system that extracts every single package in defaults in a loop, to do some analysis.
2.0.2 released
It'd still be good to have a ticket open to track the questions from https://github.com/conda/conda-package-handling/issues/167#issuecomment-1333023530 (i.e., until it's ensured that running `CondaFormat_v2.create` multiple times in the same Python process doesn't leak memory).
What happened?
`zstd` appears to be using a surprising amount of memory on Windows, even if the package is small. It shouldn't actually use more memory than the total size of the uncompressed data. It looks like it might be trying to allocate that memory anyway?