Closed xinli-cai closed 4 months ago
Update on Package Size Reduction:
In an effort to minimize the package size, I explored the following two solutions:
Conclusion: The size of the NLP libraries makes zip-package deployment impractical, and the compression strategies available through the plugin offer only limited reductions. As a next step, I recommend, and will work on, deploying the Lambda as a container image stored in Amazon ECR. This approach could also be beneficial for deploying larger language models in the future.
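As a sketch of the container route (file names here are illustrative, not taken from the repo): Lambda container images can be up to 10 GB, which comfortably fits the NLP dependencies that break the 250 MB zip limit. A minimal image built on the AWS Lambda Python base image might look like:

```dockerfile
# AWS-provided Python base image for Lambda
FROM public.ecr.aws/lambda/python:3.11

# Install the large NLP dependencies inside the image instead of a zip package
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the handler code (app.py and app.handler are placeholder names)
COPY app.py ${LAMBDA_TASK_ROOT}

# Entry point: "module.function" of the Lambda handler
CMD ["app.handler"]
```

The image is then built with `docker build`, tagged with the ECR repository URI, and pushed with `docker push` after authenticating via `aws ecr get-login-password | docker login`.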
This is a good blog post describing how to deploy LLMs on AWS using docker.
ECR looks to be the way to go, but I would suggest not using the last step (i.e., Serverless) to deploy the function, because we already have CloudFormation scripts that integrate well with our existing deployment stacks. To rephrase: once the container image is pushed to ECR, it is straightforward to reference it without a Serverless dependency external to AWS. The Serverless framework would be useful if we wanted to deploy to multiple clouds.
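Along those lines, a CloudFormation resource can reference the ECR image directly via `PackageType: Image`. A sketch (the function name, repository name, tag, and role are placeholders, not from our stacks):

```yaml
SimilarityEngineFunction:
  Type: AWS::Lambda::Function
  Properties:
    FunctionName: similarity-engine        # illustrative name
    PackageType: Image
    Code:
      ImageUri: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/similarity-engine:latest"
    Role: !GetAtt LambdaExecutionRole.Arn  # assumes an IAM role defined elsewhere in the stack
    MemorySize: 2048
    Timeout: 60
```

This slots into the existing CloudFormation deployment without any Serverless tooling.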
We chose an architecture that fine-tunes the LLM models on HPC and hosts the model using SageMaker Endpoints; the detailed architecture diagram is attached below: https://github.com/Canadian-Geospatial-Platform/semantic-search-with-amazon-opensearch/blob/main/image/Semantic_search_finetune_fullstack.png
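As a sketch of the hosting side (the model name, S3 path, container image, role, and instance type below are illustrative assumptions, not taken from the repo): serving a fine-tuned model on a SageMaker Endpoint is three boto3 calls. The helper below only builds the request payloads so the flow is easy to read; in practice each dict is passed to the matching call on `boto3.client("sagemaker")`.

```python
def build_sagemaker_requests(model_name, model_data_url, image_uri, role_arn,
                             instance_type="ml.g4dn.xlarge"):
    """Build the three SageMaker hosting requests for a fine-tuned model.

    Each dict maps to a boto3 SageMaker call: create_model,
    create_endpoint_config, and create_endpoint, in that order.
    """
    create_model = {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # inference container in ECR
            "ModelDataUrl": model_data_url,  # S3 path to the fine-tuned weights
        },
        "ExecutionRoleArn": role_arn,
    }
    create_endpoint_config = {
        "EndpointConfigName": f"{model_name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    }
    create_endpoint = {
        "EndpointName": f"{model_name}-endpoint",
        "EndpointConfigName": f"{model_name}-config",
    }
    return create_model, create_endpoint_config, create_endpoint
```

Once the endpoint is `InService`, the similarity-engine Lambda can query it through `boto3.client("sagemaker-runtime").invoke_endpoint(...)`.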
The current Lambda zip deployment has a size limit of 250 MB (unzipped), which rules out large pre-trained models in the similarity-engine Lambda on the AWS Cloud environment. After research, I will explore the following two options: