TheWiselyBearded commented 9 months ago

Bug Report

Overview

Whenever I send a TextToSpeech request, the voice settings are reset, whether or not I include Voice Settings in the request. This happens with all of my cloned voices. I've attached before/after screenshots that demonstrate this, along with the Unity log captured before the request was sent. In the screenshots, the bottom image shows the settings before the web request was sent, and the middle image shows them afterward. The top image is the Unity log, which shows the original voice settings before they were overridden.

To Reproduce

Steps to reproduce the behavior:

1. Configure Voice Settings in the Eleven Labs Dashboard.
2. Send a TextToSpeech request for a specific cloned voice.
3. Return to the Eleven Labs Dashboard and refresh. The voice settings have changed.

Expected behavior

The returned audio clip should be generated using the voice settings pre-configured for that specific cloned voice.

Screenshots

[Screenshot: UnityLog]

Additional context

This bug has overridden the settings of every cloned voice I've tested. If you can give me some direction for investigating deeper, I'm happy to help debug the issue further.

TheWiselyBearded commented 9 months ago

After further review, I've noticed that the first clip retains the voice settings, but every subsequent clip reverts to the original/default voice settings. I've attached my code snippet to help explain.

I'm sending multiple requests, one per sentence, to work around the lack of reliable audio streaming (I've followed your posts and the community's posts on the Unity forum about this). I've tried batching the requests as multiple tasks, and I've tried sending one request at a time in a for loop. Finally, I've limited each set of voice requests to a total of 4 sentences.

At the last minute, I also added the edit request to make sure the voice settings are set correctly before the voice requests are made.

TheWiselyBearded commented 9 months ago

I've found the issue. The TextToSpeech request appears to default to the English v1 model, and the static models defined in Model.cs don't include multilingual v2. As a result, I was getting unexpected voice clips, because I had never configured voice settings for those models. To fix it, I defined the multilingual v2 model myself: ElevenLabs.Models.Model m = new ElevenLabs.Models.Model("eleven_multilingual_v2"); and then passed it into my TextToSpeech request.

RageAgainstThePixel / com.rest.elevenlabs

Voice Settings Overridden #32
