We currently host our OpenAI Whisper and BERT models on Hugging Face using Inference Endpoints, which costs [todo: get price] per month. We should switch to a cheaper option.
I haven't run a detailed cost analysis of the other solutions yet, but a general rule of thumb is that the more work a service does for you (i.e. the easier it is to set up), the more expensive it is. So I'm assuming that any of the other options will be cheaper than what we pay now.
However, before building a solution, we should first get a price estimate for each option.
In order of easy/expensive to hard/cheap, the alternatives to evaluate are AWS SageMaker, an AWS EC2 instance, and Vast.ai.
Note: I'm assuming (80% confident) that the alternatives will be cheaper than the Hugging Face Inference Endpoints we currently use. However, I have low confidence (20% confident) that my easy/expensive-to-hard/cheap ordering of the alternatives is right. For example, Vast.ai might actually be easier to set up than an AWS EC2 instance and might also be cheaper.
Action Items
[ ] Estimate the cost of hosting the models (currently on Hugging Face Inference Endpoints) on AWS SageMaker, an AWS EC2 instance, and Vast.ai
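Once we have real hourly rates from each provider, the comparison is simple arithmetic. A minimal sketch of that back-of-the-envelope calculation is below; every rate in it is a placeholder I made up, not actual pricing, and should be replaced with quotes from each provider's pricing page before drawing any conclusions:

```python
# Back-of-the-envelope monthly cost comparison for hosting options.
# All hourly rates below are PLACEHOLDERS, not real pricing.

HOURS_PER_MONTH = 730  # average hours in a month

# Hypothetical $/hour for one instance able to serve Whisper + BERT.
placeholder_rates = {
    "Hugging Face Inference Endpoints": 1.30,  # placeholder
    "AWS SageMaker": 1.00,                     # placeholder
    "AWS EC2": 0.75,                           # placeholder
    "Vast.ai": 0.40,                           # placeholder
}

def monthly_cost(hourly_rate: float, utilization: float = 1.0) -> float:
    """Estimate monthly cost for an always-on endpoint.

    utilization < 1.0 models options that can scale to zero
    or be stopped when idle.
    """
    return hourly_rate * HOURS_PER_MONTH * utilization

# Print options from cheapest to most expensive under these assumptions.
for option, rate in sorted(placeholder_rates.items(),
                           key=lambda kv: monthly_cost(kv[1])):
    print(f"{option}: ~${monthly_cost(rate):,.0f}/month")
```

The `utilization` knob matters because the options differ in how easily they scale to zero: an always-on EC2 instance at a lower hourly rate can still cost more per month than a pricier option that only runs when traffic arrives.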