[Feature]: Google Text-to-Speech support ssml

BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

https://docs.litellm.ai/docs/

[Feature]: Google Text-to-Speech support ssml #5400

Closed aMediocreDad closed 2 months ago

aMediocreDad commented 2 months ago

The Feature

The Google Text-to-Speech implementation is missing support for SSML.

The Google text synthesis API accepts either SSML or text as input: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/SynthesisInput

Motivation, pitch

I am happy to see this being added in https://github.com/BerriAI/litellm/pull/5346. However, it is still not usable for me, as I am using SSML and the implementation assumes that the input is plain text.

I am not sure how this could best be implemented while adhering to the OpenAI API, however. Maybe a boolean passed via extra_body?

Twitter / LinkedIn details

No response

ishaan-jaff commented 2 months ago

PR here: https://github.com/BerriAI/litellm/pull/5415

ishaan-jaff commented 2 months ago

Actually you don't even need a boolean @aMediocreDad - we can detect SSML by checking whether the input contains <speak>:

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

# see supported values for "voice" on vertex here: 
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
    model="vertex-tts",
    input=ssml,  # SSML string; treated as SSML because it contains <speak>
    voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'},
)
print("response from proxy", response)

ishaan-jaff commented 2 months ago

We also support the boolean use_ssml if you want to force the input to be treated as SSML.
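
To illustrate the behaviour described in this thread, here is a minimal sketch of how the input might be mapped onto Google's SynthesisInput, which takes either a "text" or an "ssml" field. This is not LiteLLM's actual code: the function name build_synthesis_input is hypothetical, and the use_ssml parameter only mirrors the flag mentioned above.

```python
def build_synthesis_input(text: str, use_ssml: bool = False) -> dict:
    """Return a SynthesisInput-style dict with either an "ssml" or a "text" key.

    Hypothetical helper for illustration only; not taken from LiteLLM.
    """
    # Treat the input as SSML when the caller forces it, or when the
    # <speak> tag appears anywhere in the string (the detection rule
    # described in the thread).
    if use_ssml or "<speak>" in text:
        return {"ssml": text}
    # Otherwise send it as plain text.
    return {"text": text}


plain = build_synthesis_input("Hello, world!")
detected = build_synthesis_input("<speak><p>Hello, world!</p></speak>")
forced = build_synthesis_input("Hello, world!", use_ssml=True)
```

Under these assumptions, plain carries a "text" key, while detected and forced both carry an "ssml" key. Forcing the flag is useful when the markup does not contain a literal <speak> tag that a substring check would catch.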

aMediocreDad commented 2 months ago

Thank you 🫶 That's a great solution! Looking forward to removing the need for the TTS client :D

ishaan-jaff commented 2 months ago

@aMediocreDad thanks for using LiteLLM. Can we hop on a call sometime this week or next? Hoping to learn how we can improve LiteLLM for you.

My cal for your convenience: https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version?month=2023-10
My LinkedIn if you prefer DMs: https://www.linkedin.com/in/reffajnaahsi/

ishaan-jaff commented 1 month ago

Hi @aMediocreDad, curious: do you use LiteLLM Proxy in production today? If so, I'd love to hop on a call and learn how we can improve LiteLLM for you.

My cal for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
My LinkedIn if you prefer DMs: https://www.linkedin.com/in/reffajnaahsi/