Open rmontroy opened 3 years ago
hmmm, good question...
...
888K ./urllib3
1,1M ./pkg_resources
1,2M ./tifffile
1,4M ./yarl
1,4M ./zarr
1,6M ./chardet
2,8M ./PIL
3,8M ./setuptools
7,5M ./aiohttp
7,7M ./Pillow.libs
11M ./pip
26M ./numcodecs
30M ./numpy
33M ./numpy.libs
38M ./imagecodecs.libs
60M ./botocore
66M ./imagecodecs
295M .
It seems like imagecodecs is the biggest offender with ~104MB total. And when you install s3fs, botocore is another 60MB. I'll have to think about it a bit. Basically all these dependencies are pulled in via tifffile.
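For anyone who wants to reproduce a listing like the one above without `du`, here is a minimal Python sketch. The helper names (`dir_size`, `package_sizes`, `largest`) are made up for illustration, and it sums apparent file sizes rather than disk blocks, so the numbers may differ slightly from `du`.

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def package_sizes(site_packages):
    """Map each top-level entry of site_packages to its size in bytes."""
    sizes = {}
    for entry in os.scandir(site_packages):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat().st_size
    return sizes


def largest(site_packages, n=10):
    """Return the n largest entries as (name, bytes) pairs, biggest first."""
    sizes = package_sizes(site_packages)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]
```

Calling `largest("/path/to/site-packages")` and printing `size / 1e6` for each pair should give a rough per-package picture in MB; the path is a placeholder for your own environment.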
Hi @rmontroy
So the way to go seems to be to install only what you require to decode the svs images you want to process.
imagecodecs can be built with a subset of supported formats by skipping everything else. For example, on Ubuntu 20.04:
### Disabled by default
# --global-option="--skip-avif" \
# --global-option="--skip-brunsli" \
# --global-option="--skip-jpegls" \
# --global-option="--skip-jpegxl" \
# --global-option="--skip-lerc" \
# --global-option="--skip-lz4f" \
# --global-option="--skip-zfp" \
# --global-option="--skip-zlibng" \
### Required for the tiffslide tests to pass
# --global-option="--skip-shared" \
# --global-option="--skip-imcd" \
# --global-option="--skip-jpeg8" \
python -m pip install imagecodecs \
--global-option="build_ext" \
--global-option="--skip-aec" \
--global-option="--skip-bitshuffle" \
--global-option="--skip-blosc" \
--global-option="--skip-brotli" \
--global-option="--skip-bz2" \
--global-option="--skip-deflate" \
--global-option="--skip-gif" \
--global-option="--skip-jpeg2k" \
--global-option="--skip-jpegxr" \
--global-option="--skip-lz4" \
--global-option="--skip-lzf" \
--global-option="--skip-lzma" \
--global-option="--skip-pglz" \
--global-option="--skip-png" \
--global-option="--skip-rcomp" \
--global-option="--skip-snappy" \
--global-option="--skip-tiff" \
--global-option="--skip-webp" \
--global-option="--skip-zlib" \
--global-option="--skip-zopfli" \
--global-option="--skip-zstd"
The above command installs a version of imagecodecs that only has the formats required to make the tiffslide tests pass. When you test with one of your own svs files, an error may be raised for a codec that was skipped. In that case, rebuild imagecodecs and leave out the corresponding skip option for that file (e.g. maybe `--skip-jpeg2k`).
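If you end up rebuilding several times with different codecs enabled, it may be easier to generate the command than to edit it by hand. The following is only a sketch: the codec list is copied from the `--skip-*` options in the command above, and `build_install_command` is a hypothetical helper, not part of imagecodecs or pip.

```python
import sys

# Codec names copied from the --skip-* options in the command above.
SKIPPABLE = [
    "aec", "bitshuffle", "blosc", "brotli", "bz2", "deflate", "gif",
    "jpeg2k", "jpegxr", "lz4", "lzf", "lzma", "pglz", "png", "rcomp",
    "snappy", "tiff", "webp", "zlib", "zopfli", "zstd",
]


def build_install_command(keep=()):
    """Build the pip command, skipping every listed codec except those in keep."""
    keep = set(keep)
    unknown = keep - set(SKIPPABLE)
    if unknown:
        raise ValueError(f"not in the skip list: {sorted(unknown)}")
    command = [
        sys.executable, "-m", "pip", "install", "imagecodecs",
        "--global-option=build_ext",
    ]
    # One --skip-<codec> option per codec that should be left out of the build.
    command += [
        f"--global-option=--skip-{name}"
        for name in SKIPPABLE
        if name not in keep
    ]
    return command
```

For example, `subprocess.run(build_install_command(keep={"jpeg2k"}), check=True)` would run the same build as above but with jpeg2k support left in.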
Here is a link to the relevant instructions in imagecodecs: https://github.com/cgohlke/imagecodecs/blob/e92cef6c1878f0b69ebbfb33f9bb809eccbdc31e/imagecodecs/imagecodecs.py#L160-L181 You might have to install the system build dependencies if the build fails for you.
I hope that helps. Let me know how things go.
Cheers, Andreas :smiley:
PS: We might need to build a manylinux wheel to make it work, since we're linking against OS libraries.
PPS: Another option could be to manually remove the unneeded *.so files in the site-packages/imagecodecs folder before you create your lambda layer.
Another option: unpack the imagecodecs wheel, remove unneeded *.so files, repack the wheel, and install it. See https://wheel.readthedocs.io/en/stable/reference/wheel_pack.html#examples
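Removing the unneeded *.so files could be scripted roughly like this. It is a sketch under assumptions: `prune_extensions` is a hypothetical helper, and it assumes each extension file is named after its module followed by a platform suffix (for example `_jpeg8.cpython-39-x86_64-linux-gnu.so`). Check the actual file names in your imagecodecs folder before deleting anything.

```python
from pathlib import Path


def prune_extensions(package_dir, keep, dry_run=True):
    """Remove *.so files in package_dir whose module name is not in keep.

    Returns the list of file names that were (or, in a dry run, would be)
    removed. Only the top level of package_dir is scanned.
    """
    removed = []
    for so_file in sorted(Path(package_dir).glob("*.so")):
        # "_jpeg8.cpython-39-x86_64-linux-gnu.so" -> "_jpeg8"
        module_name = so_file.name.split(".", 1)[0]
        if module_name in keep:
            continue
        removed.append(so_file.name)
        if not dry_run:
            so_file.unlink()
    return removed
```

A first pass with `dry_run=True` and something like `keep={"_shared", "_imcd", "_jpeg8"}` (mirroring the options marked as required for the tiffslide tests, assuming the modules are named that way) shows what would go; rerun with `dry_run=False` once the list looks right, then test against your own svs files.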
@ap-- Any interest in creating a public AWS Lambda layer that's as small as possible? I don't have the time to look into it now, but I'd use it if it already existed, provided it performed well enough.
I'll have a look. I might have some time next week to make the layer or at least an easy way to create a minimal venv.
FYI, there is a straightforward solution, using the serverless framework and this plugin:
https://www.serverless.com/plugins/serverless-python-requirements
Key configuration is to add:
custom:
  pythonRequirements:
    zip: true
And to include this in your handler:
try:
    import unzip_requirements
except ImportError:
    pass
This zips all the requirements (including tiffslide) so that it falls below the cutoff by a wide margin. This does add a second or two to cold starts, but there is no cost for warm starts.
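To illustrate where the cold-start cost comes from, here is a rough sketch of the general technique: extract a zipped dependency bundle once, then put the extracted folder on `sys.path`. This is not the plugin's actual implementation. `load_zipped_requirements` and both paths are hypothetical. The archive is extracted rather than imported directly because compiled extension modules cannot be loaded from inside a zip file.

```python
import os
import sys
import zipfile


def load_zipped_requirements(zip_path, target_dir):
    """Extract zip_path into target_dir once and make it importable."""
    if not os.path.isdir(target_dir):
        # Cold start: pay the extraction cost a single time.
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target_dir)
    # Warm start: the folder already exists, so only sys.path is touched.
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir
```

On Lambda the target would have to live under `/tmp`, which is the writable location. Extraction only happens when the folder is missing, which matches the observation that warm starts are unaffected.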
Is there any way to trim down the dependencies? The total uncompressed size of a deployment package in AWS Lambda can't be more than 250MB. The total size of tiffslide plus dependencies is 195MB, which means I'm over the limit when I add other things (e.g. s3fs).