RageAgainstThePixel / com.rest.elevenlabs

A non-official Eleven Labs voice synthesis client for Unity (UPM)
https://elevenlabs.io/?from=partnerbrown9849
MIT License

ElevenLabs playback should follow the order in which text was queued, even if the audio for a later request returns sooner #46

Closed yosun closed 9 months ago

yosun commented 9 months ago

It seems that your dequeue system plays clips in a variable order when some of the strings are short. Your Eleven Labs call should check the text queue order before playing the next queued clip.

Here is a simple string sequence to test with: "Hello!" "Lorem ipsum this is omg so long testing 123 lol ipsum dolores park" "And then."

Note that the audio for "And then." will be returned before the audio for "Lorem ipsum...".

TL;DR: your queue is borked. Please fix it.
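
One way to get the requested behavior, as a minimal sketch rather than anything the package provides: hand out a ticket when the text is queued, then release results strictly in ticket order, holding back anything that arrives early. `OrderedReleaseBuffer`, `Reserve`, and `Complete` are hypothetical names, and the sketch assumes all calls happen on one thread (for example the Unity main thread) and works at whole-clip granularity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of com.rest.elevenlabs): releases items
// strictly in the order their tickets were reserved, regardless of the
// order in which they are completed. Not thread-safe.
public sealed class OrderedReleaseBuffer<T>
{
    private readonly Dictionary<int, T> completed = new Dictionary<int, T>();
    private int nextTicket;    // next ticket to hand out
    private int nextToRelease; // ticket that must be released next

    // Call synchronously when the text is queued, before any await.
    public int Reserve() => nextTicket++;

    // Call when the result for a ticket arrives. Returns every item that
    // is now releasable, in ticket order (possibly none).
    public List<T> Complete(int ticket, T item)
    {
        completed[ticket] = item;
        var ready = new List<T>();
        while (completed.TryGetValue(nextToRelease, out var next))
        {
            completed.Remove(nextToRelease);
            ready.Add(next);
            nextToRelease++;
        }
        return ready;
    }
}

public static class OrderedReleaseDemo
{
    public static void Run()
    {
        var buffer = new OrderedReleaseBuffer<string>();
        int hello = buffer.Reserve();
        int lorem = buffer.Reserve();
        int andThen = buffer.Reserve();

        // Simulate the short third request finishing before the long second one.
        Console.WriteLine(string.Join(" | ", buffer.Complete(hello, "Hello!")));         // Hello!
        Console.WriteLine(string.Join(" | ", buffer.Complete(andThen, "And then.")));    // (empty line: held back)
        Console.WriteLine(string.Join(" | ", buffer.Complete(lorem, "Lorem ipsum..."))); // Lorem ipsum... | And then.
    }
}
```

With streamed partial clips, each ticket would presumably need its own list of partial clips rather than a single item, but the release rule stays the same.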

yosun commented 9 months ago

Here's a quick tester:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.UI;

public class TestSay : MonoBehaviour
{
    public StreamGPT11 sgpt;

    public InputField input;

    string[] testSeries = new string[]
        {
            "Hello!",
            "Lorem ipsum this is omg so long testing 123 lol ipsum dolores park",
            "And then."
        };

    public void SayIt()
    {
        sgpt.ReadIt(input.text);
    }

    private void Update()
    {
        if (Input.GetKeyDown(KeyCode.Space)) Tester();
    }

    private void Tester()
    {
        // Fires all three requests back to back without awaiting any of them.
        for (int i = 0; i < testSeries.Length; i++)
        {
            print(testSeries[i]);
            sgpt.ReadIt(testSeries[i]);
        }
    }
}

    public async void ReadIt(string message)
    {
        try
        {
            var api = client;

            if (voice == null)
            {
                api.VoicesEndpoint.EnableDebug = DEBUG_MODE;
                voice = (await api.VoicesEndpoint.GetAllVoicesAsync(lifetimeCancellationTokenSource.Token)).FirstOrDefault();
            }

            api.TextToSpeechEndpoint.EnableDebug = DEBUG_MODE;
            var voiceClip = await api.TextToSpeechEndpoint.StreamTextToSpeechAsync(message, voice, partialClip =>
            {
                // Partial clips from every in-flight call land in this one shared queue.
                streamClipQueue.Enqueue(partialClip);
            }, cancellationToken: lifetimeCancellationTokenSource.Token);

            // audioSource.clip = voiceClip.AudioClip;
            if (DEBUG_MODE) Debug.Log($"Full clip: {message} {voiceClip.Id}");
        }
        catch (Exception e)
        {
            Debug.LogError(e);
        }
    }

StephenHodgson commented 9 months ago

The queue implementation is yours. I just have a simple example in the sample scene.
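
Since `ReadIt` above is `async void` and the tester calls it three times without awaiting, the three requests run concurrently and feed one shared queue. A minimal sketch of keeping playback in call order on the caller's side, without serializing the requests themselves, is to chain each completion handler behind the previous one. `PlaybackSequencer` and `SequencerDemo` are hypothetical names, not part of this package, and the sketch assumes `Enqueue` is always called from a single thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not part of com.rest.elevenlabs). Requests may run
// concurrently, but each onReady callback waits for the previous one.
// Not thread-safe: assumes Enqueue is always called from one thread.
public sealed class PlaybackSequencer<T>
{
    private Task tail = Task.CompletedTask;

    // Pass the already-started request task, synchronously at call time.
    public Task Enqueue(Task<T> request, Action<T> onReady)
    {
        tail = RunAfter(tail, request, onReady);
        return tail;
    }

    private static async Task RunAfter(Task previous, Task<T> request, Action<T> onReady)
    {
        try { await previous; }
        catch { /* an earlier failure should not block later items */ }
        onReady(await request);
    }
}

public static class SequencerDemo
{
    public static async Task<List<string>> RunAsync()
    {
        var played = new List<string>();
        var sequencer = new PlaybackSequencer<string>();

        // TaskCompletionSource stands in for the network requests.
        var hello = new TaskCompletionSource<string>();
        var lorem = new TaskCompletionSource<string>();
        var andThen = new TaskCompletionSource<string>();

        sequencer.Enqueue(hello.Task, played.Add);
        sequencer.Enqueue(lorem.Task, played.Add);
        Task last = sequencer.Enqueue(andThen.Task, played.Add);

        // Simulate the short third request finishing before the long second one.
        hello.SetResult("Hello!");
        andThen.SetResult("And then.");
        lorem.SetResult("Lorem ipsum...");

        await last;
        return played; // Hello!, Lorem ipsum..., And then.
    }
}
```

In a setup like the one in this thread, `onReady` would be the place to hand a finished clip to the audio source. Streamed partial clips would additionally need to be kept per request so that clips from different requests do not interleave.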

StephenHodgson commented 9 months ago

If you'd like more help, reach out on Discord.