Canadian-Geospatial-Platform / similarity-engine

Machine learning models to recommend similar UUIDs on geo.ca

Explore solutions to deploy large language models in the AWS cloud environment #2

Closed: xinli-cai closed this issue 4 months ago

xinli-cai commented 1 year ago

The current Lambda zip deployment has a size limit of 250MB (unzipped), which prevents bundling large pre-trained models into the similarity-engine Lambda deployment on the AWS cloud environment. After research, I will explore the following two options:

  1. Use Lambda Layers to move large dependencies out of the deployment package and work within the size limit.
  2. Choose a different deployment solution, such as container images stored in Amazon Elastic Container Registry (ECR).
xinli-cai commented 1 year ago

Update on Package Size Reduction:

In an effort to minimize the package size, I explored the following two solutions:

  1. Optimizing installed dependencies: Using the 'zip' and 'slim' features of the 'serverless-python-requirements' plugin in the Serverless framework, I compressed the installed dependencies and stripped tests, extraneous files, and caches from them (see the config sketch after this list). This optimization reduced the package size from 176MB to 158MB.
  2. Implementing Lambda Layers: I moved the larger dependencies, such as numpy (70.9MB) and pandas (59.8MB), into separate layers. However, even after segmenting them into smaller zip files, the 250MB total extracted size limit still applies across the function and all of its layers, so everything must remain under 250MB.
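
For reference, a minimal serverless.yml sketch of both approaches above, assuming the plugin's documented options (the service and handler names are placeholders, not the actual ones in this repo):

```yaml
service: similarity-engine        # placeholder service name

provider:
  name: aws
  runtime: python3.9

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    slim: true     # strip tests, __pycache__, *.pyc, and dist-info from dependencies
    zip: true      # ship dependencies zipped; the handler must `import unzip_requirements` first
    layer: true    # alternatively, publish the requirements as a Lambda Layer

functions:
  recommend:
    handler: handler.recommend    # placeholder handler
```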

Conclusion: The size of the NLP libraries makes zip-package deployment impractical, and the compression strategies available through the plugin offer only limited reductions. As a next step, I recommend, and will work on, deploying the Lambda function as a container image stored in ECR; container images can be up to 10GB, so this approach could also accommodate larger language models in the future.
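
As an illustration, the container-image route builds on the AWS-provided Lambda base image; a minimal Dockerfile sketch (the handler module name is a placeholder):

```dockerfile
# AWS-provided Lambda base image for Python
FROM public.ecr.aws/lambda/python:3.9

# Install dependencies into the Lambda task root
COPY requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

# Copy the function code
COPY handler.py ${LAMBDA_TASK_ROOT}

# module.function entry point of the Lambda handler (placeholder)
CMD ["handler.recommend"]
```

The built image is then pushed to an ECR repository and referenced by the Lambda function at creation time.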

simpleParadox commented 1 year ago

This is a good blog post describing how to deploy LLMs on AWS using Docker.

bo-lu commented 1 year ago

> This is a good blog post describing how to deploy LLMs on AWS using Docker.

ECR looks to be the way to go, but I would suggest not using the last step (i.e., the Serverless framework) to deploy the function, because we already have CloudFormation scripts that integrate well with our existing deployment stacks. Put another way: once the container is pushed to ECR, it is straightforward to reference it without a dependency that is external to AWS. The Serverless framework would be useful if we wanted to deploy to multiple clouds.
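
For instance, once the image is in ECR, a CloudFormation resource can point at it directly; a hypothetical snippet (the account ID, region, repository name, and role are placeholders):

```yaml
Resources:
  SimilarityEngineFunction:
    Type: AWS::Lambda::Function
    Properties:
      PackageType: Image          # container-image deployment; Runtime/Handler are omitted
      Code:
        ImageUri: 123456789012.dkr.ecr.ca-central-1.amazonaws.com/similarity-engine:latest
      Role: !GetAtt LambdaExecutionRole.Arn   # assumes an execution role defined elsewhere in the stack
      MemorySize: 2048
      Timeout: 60
```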

xinli-cai commented 4 months ago

We chose an architecture that fine-tunes the LLM models on HPC and hosts the model using SageMaker endpoints; the detailed architecture diagram is attached below: https://github.com/Canadian-Geospatial-Platform/semantic-search-with-amazon-opensearch/blob/main/image/Semantic_search_finetune_fullstack.png
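
For context, application code calls a hosted model through the SageMaker runtime API; a minimal boto3 sketch (the endpoint name, region, and payload schema are assumptions, not the project's actual values):

```python
import json

import boto3

# Client for the SageMaker runtime (inference) API
runtime = boto3.client("sagemaker-runtime", region_name="ca-central-1")

# Hypothetical endpoint name and request schema
response = runtime.invoke_endpoint(
    EndpointName="semantic-search-llm",              # placeholder endpoint
    ContentType="application/json",
    Body=json.dumps({"inputs": "flood mapping in Ontario"}),
)

# The endpoint returns a JSON body, e.g. embeddings or ranked record UUIDs
result = json.loads(response["Body"].read())
print(result)
```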