Bayer-Group / tiffslide

TiffSlide - cloud native openslide-python replacement based on tifffile

Too big for AWS Lambda layer #9

Open rmontroy opened 3 years ago

rmontroy commented 3 years ago

Is there any way to trim down the dependencies? The total uncompressed size of a deployment package in AWS Lambda can't be more than 250MB. The total size of tiffslide plus dependencies is 195MB, which means I'm over the limit when I add other things (e.g. s3fs).

ap-- commented 3 years ago

hmmm, good question...

...
888K    ./urllib3
1,1M    ./pkg_resources
1,2M    ./tifffile
1,4M    ./yarl
1,4M    ./zarr
1,6M    ./chardet
2,8M    ./PIL
3,8M    ./setuptools
7,5M    ./aiohttp
7,7M    ./Pillow.libs
11M ./pip
26M ./numcodecs
30M ./numpy
33M ./numpy.libs
38M ./imagecodecs.libs
60M ./botocore
66M ./imagecodecs
295M    .

It seems like imagecodecs is the biggest offender, with ~104MB in total (imagecodecs plus imagecodecs.libs). And when you install s3fs, botocore adds another 60MB. I'll have to think about it a bit. Basically all of these dependencies are pulled in via tifffile.
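To re-check the size budget after trimming, here is a minimal stdlib-only sketch that lists the biggest top-level entries of a site-packages folder, similar to the listing above. It sums the apparent size of files, so its numbers will differ somewhat from `du`. The script name `sizes.py` and reading the quoted 250MB limit as 250 MiB are assumptions.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path

# Assumption: the "250MB" unzipped limit is treated as 250 MiB here.
LAMBDA_LIMIT_BYTES = 250 * 1024 * 1024


def tree_size(path: Path) -> int:
    """Apparent size in bytes of a file or directory tree (symlinks are not followed)."""
    if path.is_symlink() or not path.is_dir():
        return path.lstat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += (Path(root) / name).lstat().st_size
    return total


def report(site_packages: Path) -> list[tuple[str, int]]:
    """(name, bytes) for each top-level entry of site_packages, largest first."""
    sizes = [(entry.name, tree_size(entry)) for entry in site_packages.iterdir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def main(argv: list[str]) -> None:
    if not argv:
        print("usage: python sizes.py /path/to/site-packages")
        return
    entries = report(Path(argv[0]))
    total = sum(size for _, size in entries)
    for name, size in entries[:15]:  # the biggest offenders
        print(f"{size / 2**20:8.1f} MiB  {name}")
    print(f"{total / 2**20:8.1f} MiB  total")
    if total > LAMBDA_LIMIT_BYTES:
        print("over the Lambda limit for unzipped deployment packages")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it against the folder you are about to package, e.g. `python sizes.py /path/to/site-packages`.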

ap-- commented 3 years ago

Hi @rmontroy

So the way to go seems to be to install only what you require to decode the svs images you want to process. imagecodecs can be built with a subset of supported formats by skipping everything else:

For example, on Ubuntu 20.04:

### Already disabled by default in the build, so there is no need to skip these:
# --global-option="--skip-avif" \
# --global-option="--skip-brunsli" \
# --global-option="--skip-jpegls" \
# --global-option="--skip-jpegxl" \
# --global-option="--skip-lerc" \
# --global-option="--skip-lz4f" \
# --global-option="--skip-zfp" \
# --global-option="--skip-zlibng" \

### Do NOT skip these; they are required for the tiffslide tests to pass:
# --global-option="--skip-shared" \
# --global-option="--skip-imcd" \
# --global-option="--skip-jpeg8" \

python -m pip install imagecodecs \
  --global-option="build_ext" \
  --global-option="--skip-aec" \
  --global-option="--skip-bitshuffle" \
  --global-option="--skip-blosc" \
  --global-option="--skip-brotli" \
  --global-option="--skip-bz2" \
  --global-option="--skip-deflate" \
  --global-option="--skip-gif" \
  --global-option="--skip-jpeg2k" \
  --global-option="--skip-jpegxr" \
  --global-option="--skip-lz4" \
  --global-option="--skip-lzf" \
  --global-option="--skip-lzma" \
  --global-option="--skip-pglz" \
  --global-option="--skip-png" \
  --global-option="--skip-rcomp" \
  --global-option="--skip-snappy" \
  --global-option="--skip-tiff" \
  --global-option="--skip-webp" \
  --global-option="--skip-zlib" \
  --global-option="--skip-zopfli" \
  --global-option="--skip-zstd"

The above command installs a version of imagecodecs that includes only the formats required for the tiffslide tests to pass. It could be that when you test with one of your own svs files, an error is raised, and you'd need to rebuild imagecodecs without the skip option for the format your file uses (e.g. maybe jpeg2k).
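If you end up rebuilding a few times, a small helper that regenerates the command from the set of codecs you want to keep saves hand-editing the long flag list. This is a minimal sketch: `pip_install_args` is a hypothetical helper name, and `SKIPPABLE` is copied from the command above, so it may not match other imagecodecs versions.

```python
from __future__ import annotations

import shlex
import sys

# Codecs skipped in the pip command above (assumption: matches the imagecodecs
# version being built). shared, imcd and jpeg8 are deliberately absent, so they
# are always built.
SKIPPABLE = [
    "aec", "bitshuffle", "blosc", "brotli", "bz2", "deflate", "gif",
    "jpeg2k", "jpegxr", "lz4", "lzf", "lzma", "pglz", "png", "rcomp",
    "snappy", "tiff", "webp", "zlib", "zopfli", "zstd",
]


def pip_install_args(keep: set[str]) -> list[str]:
    """Argument list that builds imagecodecs, skipping every SKIPPABLE codec not in keep."""
    unknown = keep - set(SKIPPABLE)
    if unknown:
        raise ValueError(f"not a skippable codec: {sorted(unknown)}")
    args = [sys.executable, "-m", "pip", "install", "imagecodecs",
            "--global-option=build_ext"]
    args += [f"--global-option=--skip-{codec}" for codec in SKIPPABLE if codec not in keep]
    return args


if __name__ == "__main__":
    # e.g. your slides turned out to need JPEG 2000: keep jpeg2k, skip the rest
    print(shlex.join(pip_install_args({"jpeg2k"})))
```

The helper only builds the argument list; to actually run the build you could pass it to `subprocess.run(args, check=True)`.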

The relevant build instructions in imagecodecs are here (if the build fails for you, you might have to install the system build dependencies first): https://github.com/cgohlke/imagecodecs/blob/e92cef6c1878f0b69ebbfb33f9bb809eccbdc31e/imagecodecs/imagecodecs.py#L160-L181

I hope that helps. Let me know how things go.

Cheers, Andreas :smiley:

PS: We might need to build a manylinux wheel to make this work, since we're linking against OS libraries.

PPS: Another option could be to manually remove the unneeded *.so files from the site-packages/imagecodecs folder before you create your Lambda layer.
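A minimal sketch of the "remove unneeded *.so files" option. It assumes imagecodecs names its extension modules `_<codec>.<abi tag>.so` (e.g. `_jpeg8.cpython-310-x86_64-linux-gnu.so`); check your own install before deleting anything. `prune`, `codec_of` and the keep-list are illustrative names, and the keep-list mirrors the "required for the tiffslide tests" comment above. The vendored libraries in `imagecodecs.libs` are left alone.

```python
from __future__ import annotations

from pathlib import Path

# Codecs to keep, from the "required for the tiffslide tests" list in the
# comment above; add e.g. "jpeg2k" if your own slides need it.
KEEP = {"shared", "imcd", "jpeg8"}


def codec_of(filename: str) -> str | None:
    """'_jpeg8.cpython-310-x86_64-linux-gnu.so' -> 'jpeg8'; None for other files.

    Assumes the '_<codec>.<abi tag>.so' naming described above.
    """
    if filename.startswith("_") and filename.endswith(".so"):
        return filename[1:].split(".", 1)[0]
    return None


def prune(package_dir: Path, keep: set[str] = KEEP, dry_run: bool = True) -> list[Path]:
    """Delete codec modules not in keep; with dry_run (the default) only list them."""
    doomed = []
    for path in sorted(package_dir.iterdir()):
        codec = codec_of(path.name)
        if codec is not None and codec not in keep:
            doomed.append(path)
    if not dry_run:
        for path in doomed:
            path.unlink()
    return doomed
```

For example, `prune(Path("python/lib/python3.9/site-packages/imagecodecs"))` first shows what would go (the layer path is hypothetical); rerun with `dry_run=False`, then test with your own slides as suggested above.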

cgohlke commented 3 years ago

Another option: unpack the imagecodecs wheel, remove unneeded *.so files, repack the wheel, and install it. See https://wheel.readthedocs.io/en/stable/reference/wheel_pack.html#examples
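A minimal stdlib-only sketch of the same idea that edits the wheel archive directly instead of going through `wheel unpack` / `wheel pack`: it copies the wheel, leaves out the unneeded codec modules, and drops their rows from the `.dist-info/RECORD` file so the remaining entries stay consistent. The `imagecodecs/_<codec>.<abi tag>.so` layout, the keep-list and the function names are assumptions; list your wheel's contents (e.g. with `unzip -l`) before relying on it.

```python
from __future__ import annotations

import csv
import io
import zipfile
from pathlib import Path

# Codec modules that must stay (the "required for the tiffslide tests" list above).
KEEP = {"shared", "imcd", "jpeg8"}


def is_unneeded(member: str, keep: set[str]) -> bool:
    """True for 'imagecodecs/_<codec>.<abi tag>.so' members whose codec is not kept."""
    folder, _, filename = member.rpartition("/")
    if folder != "imagecodecs" or not (filename.startswith("_") and filename.endswith(".so")):
        return False
    return filename[1:].split(".", 1)[0] not in keep


def slim_wheel(src: Path, dst: Path, keep: set[str] = KEEP) -> list[str]:
    """Write a copy of wheel src to dst without unneeded codec modules; return dropped names."""
    with zipfile.ZipFile(src) as zin:
        dropped = [i.filename for i in zin.infolist() if is_unneeded(i.filename, keep)]
        with zipfile.ZipFile(dst, "w") as zout:
            for info in zin.infolist():
                if info.filename in dropped:
                    continue
                data = zin.read(info)
                if info.filename.endswith(".dist-info/RECORD"):
                    # Remove the RECORD rows of dropped files; other hashes stay valid.
                    rows = csv.reader(io.StringIO(data.decode("utf-8")))
                    out = io.StringIO()
                    csv.writer(out, lineterminator="\n").writerows(
                        row for row in rows if row and row[0] not in dropped)
                    data = out.getvalue().encode("utf-8")
                zout.writestr(info, data)  # keeps each member's name, timestamp and compression
    return dropped
```

Write `dst` into a separate directory under the original wheel filename, since pip expects a valid wheel name, then install it with `python -m pip install <that file>`.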

rmontroy commented 3 years ago

@ap-- Any interest in creating a public AWS Lambda layer that's as small as possible? I don't have the time to look into it now, but I'd use it if it already existed, provided it performed well enough.

ap-- commented 3 years ago

I'll have a look. I might have some time next week to make the layer or at least an easy way to create a minimal venv.

swamidass commented 1 year ago

FYI, there is a straightforward solution using the Serverless Framework and this plugin:

https://www.serverless.com/plugins/serverless-python-requirements

The key configuration is to add this to serverless.yml:

custom:
  pythonRequirements:
    zip: true

And to include this at the top of your handler, before importing any of the zipped requirements:

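try:
    # unzip_requirements is added by serverless-python-requirements when zip: true;
    # it unpacks the zipped requirements and adds them to sys.path.
    import unzip_requirements
except ImportError:
    # Not present outside the packaged deployment (e.g. when running locally).
    pass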

This zips all the requirements (including tiffslide), so the deployment package falls below the cutoff by a wide margin. It does add a second or two to cold starts, but there is no cost for warm starts.
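To see where the cold-start cost comes from, here is a simplified sketch of what such an "unzip at import time" shim does. It is not the plugin's actual `unzip_requirements` module: the bundle name `.requirements.zip`, the target directory `/tmp/sls-py-req` and `ensure_requirements` are assumptions for illustration. On a cold start the bundle is extracted into `/tmp` (the writable location on Lambda); on a warm start the directory already exists, so only `sys.path` is touched.

```python
import os
import shutil
import sys
import zipfile

ZIP_NAME = ".requirements.zip"  # assumed name of the zipped dependency bundle
TARGET_DIR = "/tmp/sls-py-req"  # assumed extraction directory under /tmp


def ensure_requirements(task_root: str, target_dir: str = TARGET_DIR) -> bool:
    """Extract the bundled requirements once and put them on sys.path.

    Returns True if this call did the extraction (cold start), False if the
    directory was already there (warm start in a reused container).
    """
    extracted = False
    if not os.path.isdir(target_dir):
        partial = target_dir + ".partial"
        shutil.rmtree(partial, ignore_errors=True)  # leftovers of an interrupted run
        with zipfile.ZipFile(os.path.join(task_root, ZIP_NAME)) as bundle:
            bundle.extractall(partial)
        os.rename(partial, target_dir)  # only a complete extraction gets the final name
        extracted = True
    if target_dir not in sys.path:
        sys.path.append(target_dir)
    return extracted
```

In a handler this would be called as `ensure_requirements(os.environ.get("LAMBDA_TASK_ROOT", os.getcwd()))` before importing tiffslide. Extracting into a temporary name and renaming it afterwards means a half-extracted directory is never mistaken for a finished one.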