emerzon opened this issue 1 month ago
> **Available API methods**
>
> Chirp processes speech in much larger chunks than other models do. This means it might not be suitable for true real-time use. Chirp is available through the following API methods:
>
> - v2 `Speech.Recognize` (good for short audio, < 1 min)
> - v2 `Speech.BatchRecognize` (good for long audio, 1 min to 8 hrs)
>
> For Chirp 2:
>
> | API method | Support |
> |---|---|
> | v2 `Speech.StreamingRecognize` (good for streaming and real-time audio) | Limited |
> | v2 `Speech.Recognize` (good for short audio, < 1 min) | On par with Chirp |
> | v2 `Speech.BatchRecognize` (good for long audio, 1 min to 8 hrs) | On par with Chirp |
>
> You can always find the latest list of supported languages and features for each transcription model using the locations API.

https://cloud.google.com/speech-to-text/v2/docs/chirp_2-model
The Feature
Chirp is a speech-to-text model, similar to Whisper.

Ideally it could be supported via the existing `/v1/audio/transcriptions` API method that already exists for Whisper, for seamless integration.

Motivation, pitch
This is a desired feature to enable seamless speech-to-text model integrations using the OpenAI interface.
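As a sketch of what this could look like from the caller's side: the helper and the model identifier `chirp-2` below are hypothetical assumptions for illustration, not an existing API surface. The point is that only the `model` field would change relative to a Whisper call.

```python
# Sketch only: how a Chirp transcription call might look through the
# OpenAI-compatible /v1/audio/transcriptions interface. The model name
# "chirp-2" and this helper are hypothetical, not a confirmed identifier.

def build_transcription_request(audio_path: str, model: str = "chirp-2") -> dict:
    """Assemble the form fields the OpenAI transcription endpoint expects."""
    return {
        "endpoint": "/v1/audio/transcriptions",
        "fields": {
            "model": model,            # hypothetical Chirp identifier
            "file": audio_path,        # audio payload, same as for whisper-1
            "response_format": "json",
        },
    }

req = build_transcription_request("meeting.wav")
```

With a mapping like this in place, swapping `model="whisper-1"` for a Chirp identifier would be the only client-side change.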
Twitter / LinkedIn details
No response