Machine learning inference at scale using AWS serverless

This sample solution shows you how to bring your own ML models and inference code and run them at scale using AWS serverless services: AWS Lambda and AWS Fargate. The approach is demonstrated with an image classification use case.

Architecture

The following diagram illustrates the solution's architecture for both the batch and real-time inference options.

[Architecture diagram]

Deploying the solution

To deploy and run the solution, you need an AWS account and a terminal with Git, Docker, the AWS CLI, and the AWS CDK installed (the install script builds and deploys a CDK application).
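
You can run a quick sanity check to confirm the tooling is available (versions shown by these commands are informational):
    # Verify that the prerequisite tools are installed
    git --version
    docker --version
    aws --version
    cdk --version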

To deploy the solution, open your terminal window and complete the following steps.

  1. Clone the GitHub repo.
    git clone https://github.com/aws-samples/aws-serverless-for-machine-learning-inference.git

  2. Navigate to the /install directory and deploy the CDK application (see the sketch after these steps for what such a script typically runs).
    ./install.sh
    or, if you are using AWS Cloud9:
    ./cloud9_install.sh

  3. Enter Y to proceed with the deployment on the confirmation screen.
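
For reference, here is a minimal sketch of the steps a CDK install script like install.sh typically performs (illustrative only; the actual script contents may differ):
    # Illustrative CDK deployment steps; install the app's dependencies first
    cd ../app         # the CDK application lives in /app
    cdk bootstrap     # one-time setup of CDK deployment resources per account/Region
    cdk deploy        # synthesize and deploy the stack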

Running inference

The solution lets you get predictions either for a set of images using batch inference or for a single image at a time using a real-time API endpoint.

Batch inference

Get batch predictions by uploading image files to Amazon S3.

  1. Upload one or more image files to the S3 bucket path ml-serverless-bucket-<acct-id>-<aws-region>/input, from the Amazon S3 console or using the AWS CLI.
    aws s3 cp <path to jpeg files> s3://ml-serverless-bucket-<acct-id>-<aws-region>/input/ --recursive
  2. This triggers the batch job, which spins off Fargate tasks to run the inference. You can monitor the job status in the AWS Batch console (or from the AWS CLI; see the sketch after these steps).
  3. Once the job is complete (this may take a few minutes), the inference results can be accessed from the ml-serverless-bucket-<acct-id>-<aws-region>/output path.
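
You can also check the job and retrieve the results from the AWS CLI. A minimal sketch, assuming a job queue name you would look up in the AWS Batch console or the stack outputs:
    # List completed jobs in the solution's job queue (queue name is a placeholder)
    aws batch list-jobs --job-queue <your-job-queue-name> --job-status SUCCEEDED
    # Download the inference results once the job completes
    aws s3 cp s3://ml-serverless-bucket-<acct-id>-<aws-region>/output/ ./results --recursive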

Real-time inference

Get real-time predictions by invoking the API endpoint with an image payload.

  1. Navigate to the CloudFormation console and find the API endpoint URL (httpAPIUrl) in the stack outputs. You can also fetch it with the AWS CLI, as shown in the sketch after these steps.
  2. Use a REST client, such as Postman or curl, to send a POST request to the /predict API endpoint with the image file as the payload.
    curl -v -H "Content-Type: application/jpeg" --data-binary @<your jpg file name> <your-api-endpoint-url>/predict
  3. Inference results are returned in the API response.
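
The same flow works from the command line. A minimal sketch, assuming the stack name placeholder below and a JSON response (jq is optional, for pretty-printing):
    # Fetch the endpoint URL from the stack outputs (stack name is a placeholder)
    API_URL=$(aws cloudformation describe-stacks --stack-name <your-stack-name> --query "Stacks[0].Outputs[?OutputKey=='httpAPIUrl'].OutputValue" --output text)
    # POST an image and pretty-print the JSON response
    curl -s -H "Content-Type: application/jpeg" --data-binary @<your jpg file name> "$API_URL/predict" | jq .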

Cleaning up

Navigate to the /app directory from the terminal window and run the following command to destroy all resources and avoid incurring future charges.
cdk destroy -f
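
If the destroy fails because the S3 bucket still contains objects (whether it does depends on the stack's removal policy), empty the bucket first and re-run the command:
    # Empty the data bucket if cdk destroy reports it is not empty
    aws s3 rm s3://ml-serverless-bucket-<acct-id>-<aws-region> --recursive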

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.