VACOTechSprint / ambient-transcription

Docs and management tasks for sprint

Set up Container Deployment for Whisper X on Cloud #4

Closed dahifi closed 5 months ago

dahifi commented 5 months ago

Research and decide on a cloud service (such as Google Cloud, AWS, or Azure) for container deployment of Whisper X that meets the requirements for Option 2 deployment on VA's cloud infrastructure. Verify compatibility and create a Dockerfile for the project.
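As a starting point for the Dockerfile deliverable, here is a minimal sketch. The base image, package list, and install method are assumptions (CUDA runtime base, pip-installable `whisperx`, `ffmpeg` for audio decoding); versions and tags would need to be pinned to whatever the VA cloud environment supports.

```dockerfile
# Hypothetical sketch of a GPU-capable WhisperX container.
# Base image and package versions are placeholders, not a vetted config.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip ffmpeg && \
    rm -rf /var/lib/apt/lists/*

RUN pip3 install --no-cache-dir whisperx

# Port the ASR webservice is expected to listen on.
EXPOSE 9000
```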

This is a complex task that involves not only development but also research and potentially negotiation with cloud service providers. It might span several days to a couple of weeks.

dahifi commented 5 months ago

Discovery

I found several promising SaaS providers that host Whisper models. The challenge is that we need diarization capabilities, and WhisperX is what we're currently running in our lab:

- https://lightning.ai/pages/community/tutorial/deploy-openai-whisper/
- https://help.ovhcloud.com/csm/en-public-cloud-ai-deploy-openai-whisper?id=kb_article_view&sysparm_article=KB0061115
- https://nlpcloud.com/home/playground/asr
- https://www.speechly.com/products/hosted-whisper

dahifi commented 5 months ago

As of this morning we've successfully deployed the whisperx-asr-webservice to Google Cloud. We have a PR open on the upstream repo we're using: https://github.com/DennisTheD/whisper-asr-webservice/pull/2

I created a Cloud Build of the ASR repo, then deployed it as a container on a GCE instance. Selecting a container image at instance creation via the CLI seems to force a container-optimized OS image, but we need to use the ML image instead (the GPU driver install fails on the container-optimized image), so we deploy the Docker image manually after the host is running.
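The manual flow described above could be sketched roughly as follows. The instance name, zone, accelerator type, image family, and registry path are all placeholders, not our actual configuration; check them against the project before running anything.

```shell
# Hypothetical sketch: create a GCE host from a Deep Learning (ML)
# image rather than a container-optimized one, then run the image
# manually. All names/flags below are placeholders.
gcloud compute instances create whisperx-asr \
    --machine-type=n1-highmem-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-project=deeplearning-platform-release \
    --image-family=common-cu118 \
    --zone=us-central1-a

# After SSHing in and letting the GPU drivers install, pull and run
# the Cloud Build image manually:
docker run -d --gpus all -p 9000:9000 \
    gcr.io/PROJECT_ID/whisper-asr-webservice:latest
```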

We had to bump the instance up to n1-highmem-4, as the webservice was crashing the Docker host at runtime.

With this compute and GPU, we are looking at around $200/month in costs, but we're running a spot instance for half of that while we prototype.

I've still got to rebuild the image to reflect the changes in our fork, but this should be GETMO. Port 9000 is publicly exposed for testing.
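Since port 9000 is open, a quick smoke test might look like the following. The endpoint path and query parameters are assumptions based on the upstream whisper-asr-webservice API, and `HOST` and `sample.wav` are placeholders; verify against our fork before relying on this.

```shell
# Hypothetical smoke test against the exposed webservice; endpoint
# shape is assumed from the upstream project, not confirmed here.
curl -F "audio_file=@sample.wav" \
    "http://HOST:9000/asr?task=transcribe&output=json"
```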

dahifi commented 5 months ago

We created our own fork of the ASR service with some custom changes for CORS, which we'd like to PR back into the source repo. We had stored this as a Docker image in Google Cloud Build for deployment on the AI/ML Docker host compute instance.
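For context on the CORS change: assuming the upstream webservice is a FastAPI app, enabling CORS typically looks like the sketch below. The allowed origin is a placeholder, and this is not the actual diff in our fork; see the PR for the real change.

```python
# Hypothetical sketch of enabling CORS in a FastAPI app; the origin
# list is a placeholder, not our fork's actual configuration.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://example.va.gov"],  # placeholder origin
    allow_methods=["*"],
    allow_headers=["*"],
)
```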

The only piece not yet incorporated into CI is actually deploying the image. The image build configuration probably isn't included in this repo, as we haven't properly forked the ASR repo here. See this fork for now: https://github.com/dahifi/whisper-asr-webservice/tree/fork/main