elevenlabs / elevenlabs-docs

Documentation for elevenlabs.io/docs
https://elevenlabs.io/docs

Audio artifacts when using Request Stitching #272

Open brandburner opened 3 months ago

brandburner commented 3 months ago

When I use this function (based on the sample code) to generate lines with request stitching (conditioned on previous/next text and previous generations), a generation often ends with a tiny fragment (an aspirated sound?) from the start of the following line. This isn't noticeable if the lines are stitched back-to-back, but it can be a VERY noticeable glitch if any extra gaps are inserted between lines.

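import hashlib
import logging
import os

import requests
from elevenlabs import Voice, VoiceSettings

# randomize_style() is a helper of mine that is not shown here.
def convert_text_to_mp3(text, voice, character_name, base_style, previous_request_ids, previous_text, next_text):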
    cache_key = hashlib.md5(text.encode()).hexdigest()
    character_dir = os.path.join(".eleven_labs_cache", character_name)
    os.makedirs(character_dir, exist_ok=True)
    cache_file = os.path.join(character_dir, f"{cache_key}.mp3")

    if not os.path.exists(cache_file):
        randomized_style = randomize_style(base_style)
        updated_voice = Voice(
            voice_id=voice.voice_id,
            settings=VoiceSettings(
                stability=voice.settings.stability,
                similarity_boost=voice.settings.similarity_boost,
                style=randomized_style,
                use_speaker_boost=voice.settings.use_speaker_boost
            )
        )

        response = requests.post(
            f"https://api.elevenlabs.io/v1/text-to-speech/{updated_voice.voice_id}/stream",
            json={
                "text": text,
                "model_id": "eleven_multilingual_v2",
                "previous_request_ids": previous_request_ids[-3:],
                "previous_text": previous_text,
                "next_text": next_text
            },
            headers={"xi-api-key": os.environ['ELEVEN_LABS_API_KEY']}
        )

        if response.status_code != 200:
            logging.error(f"Error encountered, status: {response.status_code}, content: {response.text}")
            return None, text, 0

        with open(cache_file, 'wb') as f:
            f.write(response.content)

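    # Assumption: `duration` was never defined in the snippet as posted; here it is
    # read from the cached MP3 with mutagen (pip install mutagen), in seconds.
    from mutagen.mp3 import MP3
    duration = MP3(cache_file).info.length
    return cache_file, text, duration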

The glitch occurs at the very end of generations - I believe it's actually the first sound of the following line.
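
For anyone trying to reproduce this, here is a minimal sketch of how the stitching fields in the request body above might be assembled for each line of a script. The helper name `build_stitching_payload` is hypothetical, and conditioning on only the immediately adjacent lines is an assumption; the function above receives `previous_text` and `next_text` from its caller, so the real values may differ.

```python
MAX_PREVIOUS_REQUEST_IDS = 3  # mirrors previous_request_ids[-3:] in the function above


def build_stitching_payload(lines, index, request_ids, model_id="eleven_multilingual_v2"):
    """Build the JSON body for lines[index], conditioned on its neighbours.

    Hypothetical helper: `request_ids` holds the IDs of earlier generations,
    oldest first. Fields with nothing to send are left out of the payload.
    """
    payload = {"text": lines[index], "model_id": model_id}

    # Assumption: condition only on the immediately preceding/following line.
    if index > 0:
        payload["previous_text"] = lines[index - 1]
    if index < len(lines) - 1:
        payload["next_text"] = lines[index + 1]

    # Keep only the most recent IDs, as the original snippet does.
    recent_ids = request_ids[-MAX_PREVIOUS_REQUEST_IDS:]
    if recent_ids:
        payload["previous_request_ids"] = recent_ids

    return payload


if __name__ == "__main__":
    script = ["Hello there.", "How are you?", "Goodbye."]
    print(build_stitching_payload(script, 1, ["id-1", "id-2", "id-3", "id-4"]))
```

Since the suspected cause is audio from the following line leaking into the end of a generation, comparing output with and without `next_text` in the payload may help narrow it down. That is only a diagnostic idea, not a confirmed fix.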