BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Support for Vertex Chirp Model #6218

Open emerzon opened 1 month ago

emerzon commented 1 month ago

The Feature

Chirp is a speech-to-text model, similar to Whisper.

Ideally it would be supported through the existing `/v1/audio/transcriptions` API method already used for Whisper, so Chirp can be integrated seamlessly.
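To make the request concrete, here is a minimal stdlib-only sketch of what an OpenAI-format transcription request to the litellm proxy could look like if Chirp were wired in. The model name `vertex_ai/chirp`, the proxy URL, and the API key are all assumptions for illustration; litellm does not define a Chirp identifier yet (that is what this issue requests).

```python
# Hypothetical sketch: an OpenAI-compatible /v1/audio/transcriptions request
# to a litellm proxy, built with only the standard library.
# "vertex_ai/chirp", the base URL, and the key are ASSUMED placeholders.
import uuid
import urllib.request


def build_transcription_request(audio: bytes, filename: str,
                                model: str = "vertex_ai/chirp",
                                base_url: str = "http://localhost:4000") -> urllib.request.Request:
    """Assemble a multipart/form-data POST matching the OpenAI transcription API shape."""
    boundary = uuid.uuid4().hex
    parts = [
        # "model" form field
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="model"\r\n\r\n{model}\r\n'.encode(),
        # "file" form field carrying the raw audio bytes
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={
            "Authorization": "Bearer sk-anything",  # proxy virtual key (assumed)
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = build_transcription_request(b"\x00\x01", "sample.wav")
print(req.full_url)  # http://localhost:4000/v1/audio/transcriptions
```

The request is only constructed here, not sent; sending it with `urllib.request.urlopen(req)` would require a running proxy and real audio.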

Motivation, pitch

This feature would enable seamless speech-to-text model integration through the OpenAI interface.


krrishdholakia commented 1 month ago

Available API methods

Chirp processes speech in much larger chunks than other models do. This means it might not be suitable for true, real-time use. Chirp is available through the following API methods:

- v2 `Speech.Recognize` (good for short audio < 1 min)
- v2 `Speech.BatchRecognize` (good for long audio, 1 min to 8 hrs)

krrishdholakia commented 1 month ago

Chirp 2 language support by API method:

| API method | Language support |
| --- | --- |
| V2 `Speech.StreamingRecognize` (good for streaming and real-time audio) | Limited |
| V2 `Speech.Recognize` (good for short audio < 1 min) | On par with Chirp |
| V2 `Speech.BatchRecognize` (good for long audio, 1 min to 8 hrs) | On par with Chirp |

You can always find the latest list of supported languages and features for each transcription model using the locations API.
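The duration limits quoted here imply a routing decision any integration would have to make: clips under one minute can use the synchronous `Recognize` method, while longer audio (up to 8 hours) must go through `BatchRecognize`. A minimal sketch of that dispatch, using only the method names and limits stated above:

```python
# Sketch: pick a Speech-to-Text v2 API method from an audio clip's duration,
# following the limits quoted in the thread (Recognize < 1 min,
# BatchRecognize up to 8 hrs, StreamingRecognize for real-time).

def pick_recognize_method(duration_seconds: float, streaming: bool = False) -> str:
    """Return the Speech v2 method suited to the clip."""
    if streaming:
        return "StreamingRecognize"  # real-time; Chirp 2 support is limited
    if duration_seconds < 60:
        return "Recognize"           # short audio, synchronous
    if duration_seconds <= 8 * 3600:
        return "BatchRecognize"      # long audio, asynchronous batch
    raise ValueError("Audio exceeds the 8-hour BatchRecognize limit")


print(pick_recognize_method(30))   # Recognize
print(pick_recognize_method(600))  # BatchRecognize
```

A litellm `/v1/audio/transcriptions` handler for Chirp would likely need similar logic internally, since the OpenAI interface exposes a single endpoint rather than three methods.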


https://cloud.google.com/speech-to-text/v2/docs/chirp_2-model

krrishdholakia commented 1 month ago

Speech To Text

Text to Speech