Closed braco closed 6 years ago
Thanks for bringing that to our attention. The bash script for the TensorFlow packager should already be doing its best to remove unnecessary files. We'll look into this for the latest TensorFlow versions.
After running some tests, the example using TF 1.8 generates a zip file of 70 MB. I was able to deploy it correctly in 2 different AWS accounts.
According to your output, you are trying to upload a 91.63 MB zip file, which is much bigger than the one produced by the example. Are you adding other libraries along with TensorFlow? Could you try re-running the example by itself, first deleting the .ephemeral and .serverless folders, to see if you get a ~70 MB zip file?
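The suggested clean re-run can be sketched as a few shell commands. This is a hedged sketch, not the project's documented workflow: the directory names come from this thread, and the zip path under .serverless is an assumption about the plugin's layout.

```shell
#!/bin/sh
# Remove cached build artifacts so the packager starts fresh
# (.ephemeral and .serverless are the dirs named in this thread)
rm -rf .ephemeral .serverless

# Re-run the packaging step here, e.g.:
#   serverless deploy    # or `serverless package` to build only

# Afterwards, check the zip size in MB; ~70 is the expected ballpark
# (the zip path is an assumption about the plugin's output layout)
du -m .serverless/*.zip 2>/dev/null || echo "no zip found yet"
```

Comparing that number against the ~70 MB baseline tells you quickly whether extra dependencies are inflating the package.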
Dear alexleonescalera,
I am packaging a TensorFlow-based project using the method introduced in this project. NumPy, SciPy, and pandas are also used.
My problem is that the package is too big: its zipped size is more than 120 MB, and the unzipped size is larger than 400 MB. This exceeds the Lambda limit. May I know how you keep your size under 80 MB? Thanks, xiao shan
The below-80 MB size is for TensorFlow by itself, with small test files.
TensorFlow 1.8 by itself produces a 67 MB zip file, and the unzipped directory comes to around 299 MB. It looks like your other dependencies account for the rest of the final size.
Unfortunately, the limits of AWS Lambda are fixed, and there is only so much that can be done to reduce the size of Python libraries. So the only solutions I can think of for your scenario are:
You can find out the size of the produced files by looking inside the .ephemeral directory.
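To see which dependency dominates the total, you can sort the entries in the build directory by size, largest first. This is a generic sketch, not part of the project: the demo directory and file names below are made up to stand in for the real .ephemeral contents.

```shell
#!/bin/sh
# Demo: find which entries dominate a directory's size, the same
# way you would inspect .ephemeral (names here are invented)
mkdir -p demo/tensorflow demo/numpy
head -c 3145728 /dev/zero > demo/tensorflow/libtf.so   # ~3 MB dummy
head -c 1048576 /dev/zero > demo/numpy/core.so         # ~1 MB dummy
du -sm demo/* | sort -rn    # largest entry (tensorflow) prints first
rm -rf demo
```

On a real build you would run `du -sm .ephemeral/* | sort -rn` to get the same per-dependency breakdown.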
@alexleonescalera, what do you think about the insights here?
https://blog.waya.ai/deploy-deep-machine-learning-in-production-the-pythonic-way-a17105f1540e
From their virtual env bash script, it seems that they could reduce the final size even further:
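The kind of trimming such virtualenv scripts perform can be sketched generically. To be clear, this is NOT the blog's script, just a hedged illustration of the usual passes (stripping shared objects, dropping test directories and bytecode); all paths and file names are invented for the demo.

```shell
#!/bin/sh
# Generic virtualenv-trimming sketch (not the blog's actual script)
VENV=demo-venv
mkdir -p "$VENV/lib/pkg/tests"
touch "$VENV/lib/pkg/mod.py" "$VENV/lib/pkg/tests/test_mod.py" "$VENV/lib/pkg/mod.pyc"
# Typical passes: strip shared objects, drop tests and bytecode
find "$VENV" -name '*.so' -exec strip {} \; 2>/dev/null
find "$VENV" -type d -name tests -prune -exec rm -rf {} +
find "$VENV" -name '*.pyc' -delete
find "$VENV" -type f    # only mod.py should remain
rm -rf "$VENV"
```

Whether each pass is safe depends on the library; stripping or deleting the wrong .so will break imports at runtime, so test the trimmed package before deploying.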
Our packager bash script already zips only the necessary source code (.py files). It ignores compiled files and *.so binaries (as far as we've tested, this causes no errors) as well as unnecessary libraries.
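The "zip only the .py sources" idea can be illustrated with a small archive demo. This is not the project's actual packager script; the file names are invented, and tar stands in for zip so the sketch stays self-contained.

```shell
#!/bin/sh
# Sketch: archive only Python sources, skipping compiled artifacts
# (not the project's real packager; names are invented for the demo)
mkdir -p src/lib
touch src/lib/app.py src/lib/app.pyc src/lib/helper.py
# Collect only .py files into the archive
( cd src && find . -name '*.py' | tar -czf ../pkg.tgz -T - )
tar -tzf pkg.tgz | sort    # lists app.py and helper.py, no .pyc
rm -rf src pkg.tgz
```

The real script does the same with zip exclude patterns; the key point is the include/exclude filter runs before archiving, so the compiled files never enter the package.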
As for the "building from source" idea, I guess we might need a scalpel to go through the TF source and remove things that are not really needed at run time (certain tools?). There might also be room for customized TF builds depending on your needs. Unfortunately, I'm more of a NodeJS dev than a Python one, so I'll rely on the community for such a task if it's ever considered.
The good thing is that the plugin is designed so that you can create your own custom packager (Docker) or point to an existing zipped library.
Thank you, I'm looking more deeply into all of these things.
Any reason you're building in the shell script instead of Docker?
It's a preference, for multiple reasons:
In essence, I tend to use RUN only for simple preparation (installation) tasks. When more complex logic is involved, I'd rather use a shell script.
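That split can be pictured with a small Dockerfile fragment. This is purely illustrative, not the project's actual Dockerfile: the base image and the package.sh script name are assumptions.

```dockerfile
FROM python:3.6
# Simple preparation (installation) steps stay as plain RUN instructions
RUN pip install --no-cache-dir numpy
# More complex packaging logic lives in its own shell script,
# which is easier to test and debug outside the image build
COPY package.sh /opt/package.sh
RUN sh /opt/package.sh
```

Keeping the logic in a script also means it can be run locally without rebuilding the image.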
Any ideas for this besides requesting a limit increase via support?
If it's relevant, I upgraded to 1.8
But otherwise started with this: https://github.com/Accenture/serverless-ephemeral/tree/master/examples/tensorflow-lambda