aaa3334 opened 5 days ago
Yeah, so you can run serverless on RunPod.
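Roughly, calling a serverless endpoint from the client side looks like this (a minimal sketch; the endpoint ID is a placeholder and the exact input keys depend on the worker image, so check your endpoint's docs):

```python
# Minimal sketch of calling a RunPod serverless endpoint synchronously.
# ENDPOINT_ID is a placeholder, and the "audio" input key is an
# assumption -- the worker image defines the actual schema.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"audio": "https://example.com/sample.wav"}},
    timeout=600,
)
resp.raise_for_status()
print(resp.json())
```

The `/run` route does the same thing asynchronously and gives you a job ID to poll via `/status`.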
However, this doesn't do dynamic batching; it spins up a new worker for each request. I haven't found an open-source library yet that makes dynamic batching easy.
But if you're only sending small batches anyway, this will do what you need.
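If you did want to roll your own, the usual pattern is a small queue that collects requests for a few milliseconds and runs them through the model in one call. A rough, framework-agnostic sketch (all names are illustrative):

```python
# Rough sketch of dynamic (micro-)batching: block on the first request,
# then wait up to MAX_WAIT seconds to fill the batch before processing.
# Illustrative only -- not tied to RunPod or faster-whisper.
import queue
import time

MAX_BATCH = 8
MAX_WAIT = 0.05  # seconds to linger for more requests

pending: queue.Queue = queue.Queue()

def batch_worker(transcribe_batch):
    while True:
        batch = [pending.get()]  # block until at least one request
        deadline = time.monotonic() + MAX_WAIT
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(pending.get(timeout=remaining))
            except queue.Empty:
                break
        audios = [audio for audio, _ in batch]
        results = transcribe_batch(audios)  # one batched model call
        for (_, reply_q), result in zip(batch, results):
            reply_q.put(result)

def submit(audio_path: str):
    reply_q: queue.Queue = queue.Queue()
    pending.put((audio_path, reply_q))
    return reply_q.get()  # wait for this request's result
```

You'd run `batch_worker` on a background thread (passing in your batched transcribe function) and call `submit` from each request handler.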
The other issue is that there's a default model set, and it's not the turbo model :( so you also have to figure out how to swap that, which may require either adding env variables or possibly rebuilding the Docker image.
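For what it's worth, if you do end up rebuilding, a custom handler that pins the turbo model is only a few lines. This is a sketch rather than the template's actual code: it assumes the standard `runpod` Python SDK, and the input schema and env variable name are made up:

```python
# Sketch of a custom RunPod serverless handler pinned to the turbo model.
# The "audio" input key (a local file path -- URL downloading is left
# out for brevity) is an assumption, not the template's actual schema.
import os

import runpod
from faster_whisper import WhisperModel

# Loaded once per worker, not per request. The env variable name here
# is made up -- it just lets you swap models without a rebuild.
MODEL_ID = os.environ.get(
    "MODEL_NAME", "mobiuslabsgmbh/faster-whisper-large-v3-turbo"
)
model = WhisperModel(MODEL_ID, device="cuda", compute_type="float16")

def handler(job):
    audio_path = job["input"]["audio"]
    segments, info = model.transcribe(audio_path)
    return {
        "language": info.language,
        "text": " ".join(s.text.strip() for s in segments),
    }

runpod.serverless.start({"handler": handler})
```

Pre-downloading the model in the Dockerfile (rather than at worker startup) also keeps cold starts shorter.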
I'm going to do a video soon on setting up endpoints. I'll see if I can do something on serverless; it depends how much work/digging it takes.
Thanks for your reply! I tried the default FasterWhisper template, but as you mentioned, it doesn't have the turbo model. I was looking at rebuilding it using https://huggingface.co/mobiuslabsgmbh/faster-whisper-large-v3-turbo/tree/main, but I'm unsure what the settings would look like to get it set up. I feel the RunPod team says they make it easy, but their documentation seems pitched at someone already used to setting up VMs etc. I'm familiar with Hugging Face, DigitalOcean, etc., but I'm no expert and only got those set up by following guides. (I am very familiar with Docker, though it doesn't always seem like the best solution for endpoints; on Hugging Face, for example, Gradio is much easier and more lightweight than setting up a full Docker container, which feels like overkill for one endpoint.)
That would be really cool! For me, right now the serverless option seems like the way to go (otherwise I could just use Hugging Face's interface, which I already know how to set up). It's really cool to see all these different ways to do things more easily, and I'm so happy I ran into RunPod on your channel :)
This is good feedback. I'll dig in on whether it's possible or easy to port something over to serverless. Ronan
Hi!
I happily stumbled onto your video on faster-whisper and learnt that RunPod is a thing and that they have serverless. I'm wondering if you have a guide or template on how to set up faster-whisper serverless? Or is it the same as, e.g., the Ministral one you set up?