SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Triton Inference Server #525

Open tomukmatthews opened 8 months ago

tomukmatthews commented 8 months ago

Hi, I'd like to deploy faster-whisper using the Triton Inference Server this week. Do you have any suggestions on the best approach for doing this? Or is there any work in the pipeline that would make this easier?

Apologies if this is not best placed as an issue - happy to close it if not!
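For anyone landing here with the same question: the usual route for serving a Python model like faster-whisper on Triton is the Python backend. A minimal sketch of the model-repository layout (all names here are hypothetical, not from this repo):

```
model_repository/
└── faster_whisper/
    ├── config.pbtxt
    └── 1/
        └── model.py
```

and a matching `config.pbtxt`, assuming mono 16 kHz float32 audio in and a single transcript string out:

```
name: "faster_whisper"
backend: "python"
max_batch_size: 0

input [
  { name: "AUDIO", data_type: TYPE_FP32, dims: [ -1 ] }
]
output [
  { name: "TEXT", data_type: TYPE_STRING, dims: [ 1 ] }
]

instance_group [ { kind: KIND_GPU, count: 1 } ]
```

With `max_batch_size: 0` Triton does no automatic batching, so dims describe the full tensor shape; faster-whisper and CTranslate2 would need to be installed in the Triton container (or packaged via a conda-pack execution environment).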

StephennFernandes commented 8 months ago

Hey, I'd be interested in your implementation and would be happy to provide any help necessary. Please keep me posted on the progress.

arun2728 commented 7 months ago

@tomukmatthews I also want to deploy faster whisper using the Triton Inference Server. Do you have an update? I would love to collaborate.

AvivNavon commented 6 months ago

Hey, anyone managed to deploy using Triton?

wenestam commented 1 month ago

Hey, reviving this issue.

Has anyone successfully deployed faster-whisper on an NVIDIA Triton server? I am looking to do this with AWS SageMaker. Happy to collaborate if anyone has more insights to share.
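For anyone picking this up: Triton's Python backend lets you wrap faster-whisper directly in a `model.py`. Below is a minimal, untested sketch, assuming a mono float32 16 kHz `AUDIO` input tensor and a `TEXT` string output; the tensor names and model size are assumptions, not anything documented by this repo.

```python
# model.py -- sketch of a Triton Python-backend wrapper for faster-whisper.
# Tensor names (AUDIO, TEXT) and the "large-v3" model choice are hypothetical.
import numpy as np

try:
    # Provided by the Triton runtime inside the server container.
    import triton_python_backend_utils as pb_utils
except ImportError:
    pb_utils = None  # lets this file be imported outside Triton


class TritonPythonModel:
    def initialize(self, args):
        # Load the model once per Triton model instance.
        from faster_whisper import WhisperModel
        self.model = WhisperModel("large-v3", device="cuda",
                                  compute_type="float16")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Expect mono float32 PCM sampled at 16 kHz.
            audio = pb_utils.get_input_tensor_by_name(
                request, "AUDIO").as_numpy()
            # WhisperModel.transcribe accepts a 1-D float32 numpy array.
            segments, _info = self.model.transcribe(
                audio.reshape(-1), beam_size=5)
            text = " ".join(seg.text.strip() for seg in segments)
            out = pb_utils.Tensor("TEXT", np.array([text], dtype=object))
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

One caveat worth noting: `segments` is a lazy generator, so joining it inside `execute` forces the full transcription before the response is returned, which is what you want for a synchronous Triton request but means long audio blocks the instance. Decoupled (streaming) mode would be a separate exercise.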

Update