elevenlabs / elevenlabs-python

The official Python API for ElevenLabs Text to Speech.
https://elevenlabs.io/docs/api-reference/getting-started
MIT License

confusion over the API and model ID options #124

Open dskill opened 11 months ago

dskill commented 11 months ago

I'm trying to use the elevenlabs Python library with stream(). It works fine with eleven_monolingual_v1 but fails with eleven_monolingual_v2. However, I can't find anything in the documentation that clarifies which models are available in streaming mode.

mushanwei commented 11 months ago

Same issue. Did you solve it?

dskill commented 11 months ago

Nope :(. Would love to know if streaming supports v2.

lhyphendixon commented 9 months ago

Dudes, for v2 the ID is eleven_multilingual_v2. It's quite frustrating that this information is NOWHERE to be found and I literally had to guess.

Joshua-Shepherd commented 6 months ago

It's incredibly frustrating and disorganized. You'd think that with all the money they're making, they could afford to allocate someone to fix their docs and professionally maintain this package. I get that it's a new company and all, but c'mon guys, it's been like 2 years. I swear I'm not trying to be difficult, it just really looks bad. And it feels bad for me as a developer.

Anyway, if you're looking for the models, I'd recommend just using this: https://elevenlabs.io/docs/api-reference/get-models Then copy and paste the results into a new JSON file or something.
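If you'd rather pull the list programmatically than copy it by hand, here is a minimal sketch using only the standard library. It assumes the endpoint behind that docs page is `GET https://api.elevenlabs.io/v1/models` and that the key goes in an `xi-api-key` header; check both against the API reference before relying on them. The function names are made up for illustration and are not part of the elevenlabs package.

```python
import json
import urllib.request

# Assumption: endpoint URL and header name as described in the public
# API reference for "get models". Verify before use.
MODELS_URL = "https://api.elevenlabs.io/v1/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET request for the models listing (no network access here)."""
    return urllib.request.Request(MODELS_URL, headers={"xi-api-key": api_key})


def parse_model_ids(body: bytes) -> list[str]:
    """Extract every model_id from a JSON array of model objects."""
    return [model["model_id"] for model in json.loads(body)]


def fetch_model_ids(api_key: str) -> list[str]:
    """Call the endpoint and return the model IDs it reports."""
    with urllib.request.urlopen(build_models_request(api_key)) as response:
        return parse_model_ids(response.read())
```

Calling `fetch_model_ids("YOUR_API_KEY")` should then give you the current IDs to pass as the model argument, instead of guessing.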

To be clear, there is no 'eleven_monolingual_v2'. Here are all the models as of now:

[
    {
        "model_id": "eleven_multilingual_v2",
        "name": "Eleven Multilingual v2",
        "can_be_finetuned": true,
        "can_do_text_to_speech": true,
        "can_do_voice_conversion": false,
        "can_use_style": true,
        "can_use_speaker_boost": true,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Our state of the art multilingual speech synthesis model, able to generate life-like speech in 29 languages.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000
    },
    {
        "model_id": "eleven_multilingual_v1",
        "name": "Eleven Multilingual v1",
        "can_be_finetuned": true,
        "can_do_text_to_speech": true,
        "can_do_voice_conversion": false,
        "can_use_style": false,
        "can_use_speaker_boost": false,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Generate lifelike speech in multiple languages and create content that resonates with a broader audience.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000
    },
    {
        "model_id": "eleven_monolingual_v1",
        "name": "Eleven English v1",
        "can_be_finetuned": true,
        "can_do_text_to_speech": true,
        "can_do_voice_conversion": false,
        "can_use_style": false,
        "can_use_speaker_boost": false,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Use our standard English language model to generate speech in a variety of voices, styles and moods.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000,
        "languages": [
            {
                "language_id": "en",
                "name": "English"
            }
        ]
    },
    {
        "model_id": "eleven_turbo_v2",
        "name": "Eleven Turbo v2",
        "can_be_finetuned": false,
        "can_do_text_to_speech": true,
        "can_do_voice_conversion": false,
        "can_use_style": false,
        "can_use_speaker_boost": false,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Our cutting-edge turbo model is ideally suited for tasks demanding extremely low latency.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000,
        "languages": [
            {
                "language_id": "en",
                "name": "English"
            }
        ]
    },
    {
        "model_id": "eleven_multilingual_sts_v2",
        "name": "Eleven Multilingual v2",
        "can_be_finetuned": true,
        "can_do_text_to_speech": false,
        "can_do_voice_conversion": true,
        "can_use_style": true,
        "can_use_speaker_boost": true,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Our cutting-edge, multilingual speech-to-speech model is designed for situations that demand unparalleled control over both the content and the prosody of the generated speech across various languages.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000,
    },
    {
        "model_id": "eleven_english_sts_v2",
        "name": "Eleven English v2",
        "can_be_finetuned": true,
        "can_do_text_to_speech": false,
        "can_do_voice_conversion": true,
        "can_use_style": true,
        "can_use_speaker_boost": true,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Our state-of-the-art speech to speech model suitable for scenarios where you need maximum control over the content and prosody of your generations.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000,
        "languages": [
            {
                "language_id": "en",
                "name": "English"
            }
        ]
    }
]
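To avoid the original problem of passing an ID that doesn't exist, you could check the ID against this listing before calling the library. A minimal sketch, assuming the listing has been loaded into a list of dicts; `SAMPLE_MODELS`, `tts_model_ids`, and `require_tts_model` are hypothetical names, and the sample is trimmed to the two fields used. Note that `can_do_text_to_speech` says nothing about streaming support specifically; the listing has no field for that.

```python
# Trimmed sample of the listing above: only the fields this sketch reads.
SAMPLE_MODELS = [
    {"model_id": "eleven_multilingual_v2", "can_do_text_to_speech": True},
    {"model_id": "eleven_monolingual_v1", "can_do_text_to_speech": True},
    {"model_id": "eleven_turbo_v2", "can_do_text_to_speech": True},
    {"model_id": "eleven_multilingual_sts_v2", "can_do_text_to_speech": False},
]


def tts_model_ids(models):
    """Return the IDs of models flagged as text-to-speech capable."""
    return [m["model_id"] for m in models if m.get("can_do_text_to_speech")]


def require_tts_model(model_id, models):
    """Return model_id unchanged, or raise if it is not a listed TTS model."""
    valid = tts_model_ids(models)
    if model_id not in valid:
        raise ValueError(
            f"unknown or non-TTS model_id {model_id!r}; choose one of {valid}"
        )
    return model_id
```

With this, `require_tts_model("eleven_monolingual_v2", SAMPLE_MODELS)` raises a `ValueError` that names the valid choices, rather than failing later inside the request.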