Closed braco closed 6 years ago
Thanks for bringing that to our attention. The bash script for the TensorFlow packager should already be doing its best to remove unnecessary files. We'll look into this for the latest TensorFlow versions.
After running some tests, the example using TF 1.8 generates a zip file of 70 MB. I was able to deploy it correctly in 2 different AWS accounts.
According to your output, you are trying to upload a 91.63 MB zip file, which is much bigger than the one produced by the example. Are you adding other libraries along with TensorFlow? Could you try re-running the example by itself, first deleting the .ephemeral and .serverless folders, to see if you get a ~70 MB zip file?
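The suggested clean re-run can be sketched as a few shell commands. This is a hedged sketch, not the project's documented workflow: the directory names come from this thread, and the zip path under .serverless is an assumption about the plugin's layout.

```shell
#!/bin/sh
# Remove cached build artifacts so the packager starts fresh
# (.ephemeral and .serverless are the dirs named in this thread)
rm -rf .ephemeral .serverless

# Re-run the packaging step here, e.g.:
#   serverless deploy    # or `serverless package` to build only

# Afterwards, check the zip size in MB; ~70 is the expected ballpark
# (the zip path is an assumption about the plugin's output layout)
du -m .serverless/*.zip 2>/dev/null || echo "no zip found yet"
```

Comparing that number against the ~70 MB baseline tells you quickly whether extra dependencies are inflating the package.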
Dear alexleonescalera,
I am packaging a TensorFlow-based project using the method introduced in this project. NumPy, SciPy, and pandas are also used.
My problem is that the package is too big: its zipped size is more than 120 MB, and the unzipped size is larger than 400 MB. This exceeds the Lambda limit. May I know how you keep your size under 80 MB? Thanks, xiao shan
The below-80 MB size is for TensorFlow by itself, with small test files.
TensorFlow 1.8 by itself produces a 67 MB zip file, and the unzipped directory comes to around 299 MB. It looks like your other dependencies account for the rest of the final size.
Unfortunately, the limits of AWS Lambda are fixed, and there is only so much that can be done to reduce the size of Python libraries. So the only solutions I can think of for your scenario are:
You can find out the size of the produced files by looking inside the .ephemeral directory.
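To see which dependency dominates the total, you can sort the entries in the build directory by size, largest first. This is a generic sketch, not part of the project: the demo directory and file names below are made up to stand in for the real .ephemeral contents.

```shell
#!/bin/sh
# Demo: find which entries dominate a directory's size, the same
# way you would inspect .ephemeral (names here are invented)
mkdir -p demo/tensorflow demo/numpy
head -c 3145728 /dev/zero > demo/tensorflow/libtf.so   # ~3 MB dummy
head -c 1048576 /dev/zero > demo/numpy/core.so         # ~1 MB dummy
du -sm demo/* | sort -rn    # largest entry (tensorflow) prints first
rm -rf demo
```

On a real build you would run `du -sm .ephemeral/* | sort -rn` to get the same per-dependency breakdown.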
@alexleonescalera, what do you think about the insights here?
https://blog.waya.ai/deploy-deep-machine-learning-in-production-the-pythonic-way-a17105f1540e
From their virtual env bash script, it seems that they could reduce the final size even further:
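The kind of trimming such virtualenv scripts perform can be sketched generically. To be clear, this is NOT the blog's script, just a hedged illustration of the usual passes (stripping shared objects, dropping test directories and bytecode); all paths and file names are invented for the demo.

```shell
#!/bin/sh
# Generic virtualenv-trimming sketch (not the blog's actual script)
VENV=demo-venv
mkdir -p "$VENV/lib/pkg/tests"
touch "$VENV/lib/pkg/mod.py" "$VENV/lib/pkg/tests/test_mod.py" "$VENV/lib/pkg/mod.pyc"
# Typical passes: strip shared objects, drop tests and bytecode
find "$VENV" -name '*.so' -exec strip {} \; 2>/dev/null
find "$VENV" -type d -name tests -prune -exec rm -rf {} +
find "$VENV" -name '*.pyc' -delete
find "$VENV" -type f    # only mod.py should remain
rm -rf "$VENV"
```

Whether each pass is safe depends on the library; stripping or deleting the wrong .so will break imports at runtime, so test the trimmed package before deploying.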
Our packager bash script already zips only the necessary source code (.py files). It ignores compiled files and *.so binaries (as far as we've tested, this causes no errors) as well as unnecessary libraries.
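The "zip only the .py sources" idea can be illustrated with a small archive demo. This is not the project's actual packager script; the file names are invented, and tar stands in for zip so the sketch stays self-contained.

```shell
#!/bin/sh
# Sketch: archive only Python sources, skipping compiled artifacts
# (not the project's real packager; names are invented for the demo)
mkdir -p src/lib
touch src/lib/app.py src/lib/app.pyc src/lib/helper.py
# Collect only .py files into the archive
( cd src && find . -name '*.py' | tar -czf ../pkg.tgz -T - )
tar -tzf pkg.tgz | sort    # lists app.py and helper.py, no .pyc
rm -rf src pkg.tgz
```

The real script does the same with zip exclude patterns; the key point is the include/exclude filter runs before archiving, so the compiled files never enter the package.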
As for the "building from source" idea, I guess we might need a scalpel to go through the TF source and remove things that are not really needed at run time (certain tools?). There might also be room for customized TF builds depending on your needs. Unfortunately, I'm more of a NodeJS dev than a Python one, so I'll rely on the community for such a task if it's ever considered.
The good thing is that the plugin is designed so that you can create your own custom packager (Docker) or point to an existing zipped library.
Thank you, I'm looking more deeply into all of these things.
Any reason you're building in the shell script instead of Docker?
It's a preference, for multiple reasons:
In essence, I tend to use RUN only for simple preparation (installation) tasks. When more complex logic is involved, I'd rather use a shell script.
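That split can be pictured with a small Dockerfile fragment. This is purely illustrative, not the project's actual Dockerfile: the base image and the package.sh script name are assumptions.

```dockerfile
FROM python:3.6
# Simple preparation (installation) steps stay as plain RUN instructions
RUN pip install --no-cache-dir numpy
# More complex packaging logic lives in its own shell script,
# which is easier to test and debug outside the image build
COPY package.sh /opt/package.sh
RUN sh /opt/package.sh
```

Keeping the logic in a script also means it can be run locally without rebuilding the image.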
Any ideas for this besides requesting a limit increase via support?
If it's relevant, I upgraded to 1.8
But otherwise started with this: https://github.com/Accenture/serverless-ephemeral/tree/master/examples/tensorflow-lambda