erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd party software via JSON calls.
GNU Affero General Public License v3.0

Alltalk Unity Integration? #169

Closed Vander1nde21 closed 4 months ago

Vander1nde21 commented 4 months ago

I'm using LLMUnity: https://github.com/undreamai/LLMUnity but I would like to have text-to-speech audio generated as well.

erew123 commented 4 months ago

Hi @Vander1nde21

Well, I don't believe streaming generation works, based on this previous ticket: https://github.com/erew123/alltalk_tts/issues/145

At least from that I conclude that Unity won't handle a stream, though I can't speak to how the code sample shown in there works or whether there are better ways to set up streaming.

As for standard generation, that may work. The Unity docs say it should support WAV without problem https://docs.unity3d.com/2020.1/Documentation/Manual/class-AudioClip.html

Though I would suggest:

1) Generate a WAV file with AllTalk and see if you can get it to play within Unity. I believe you should be able to just make a simple script in Unity or use their own tools to check if a sample plays OK. This would at least give you confirmation that WAVs generated from XTTS models by AllTalk's standard generation API work in Unity.

2) Assuming that works, the next question would be where are you going to call/send the request to AllTalk from? Unity directly or LLMUnity? And that will answer the question as to how to handle the response and get it played within the unity engine.

I may be able to help you here and there with it, but I would suggest testing out 1 and giving me answers to question 2. I've never touched Unity in my life.

I'll also point out two other things.
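
For step 1, a minimal sketch of such a test script might look like the following. It is untested against AllTalk output and assumes Unity 2020.2 or later (for `UnityWebRequest.Result`); the `WavPlaybackTest` class name and the `wavPath` value are hypothetical placeholders, so point the path at a WAV file that AllTalk has actually generated.

```csharp
using System;
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Attach to any GameObject; an AudioSource is added automatically if missing.
[RequireComponent(typeof(AudioSource))]
public class WavPlaybackTest : MonoBehaviour
{
    // Hypothetical absolute path to a WAV generated by AllTalk; change it in the Inspector.
    public string wavPath = "C:/path/to/tts_output.wav";

    // Unity runs Start as a coroutine when it returns IEnumerator.
    private IEnumerator Start()
    {
        // Convert the absolute disk path into a file:// URI that UnityWebRequest can load.
        string uri = new Uri(wavPath).AbsoluteUri;

        using (UnityWebRequest www = UnityWebRequestMultimedia.GetAudioClip(uri, AudioType.WAV))
        {
            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Could not load WAV: " + www.error);
                yield break;
            }

            // Hand the decoded clip to the AudioSource and play it once.
            AudioSource audioSource = GetComponent<AudioSource>();
            audioSource.clip = DownloadHandlerAudioClip.GetContent(www);
            audioSource.Play();
        }
    }
}
```

If the clip plays when the scene starts, the WAV format itself is fine and any remaining problem is in how the request is sent or how the response is handled.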

1) I am going to make a small change to the API at some point, though it wouldn't be much of a coding change: https://github.com/erew123/alltalk_tts/issues/166

2) Currently there is no request queue system in AllTalk. I intend to resolve this in the next release (potentially a couple of weeks away). I don't think this will impact you either way, as long as you aren't attempting streaming generation.

Thanks

Vander1nde21 commented 4 months ago

Hi, yeah, the generated TTS wav files work inside Unity. The problem is just getting it to generate once a reply is completed. Here is a snippet of what is shown in the LLMUnity docs for when a reply has completed.

image

I'm gonna be completely honest, I'm not the best at using C# in Unity. I usually get ChatGPT to do the code, so I had it write a script that would communicate with the AllTalk TTS API and LLMUnity, but even then nothing is being generated in the AllTalk console.

Here is what ChatGPT generated based on the LLMUnity and AllTalk documentation:

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class AllTalkTTSIntegration : MonoBehaviour
{
private string allTalkUrl = "http://127.0.0.1:7851/api/"; // Update with your AllTalk API endpoint

public IEnumerator TextToSpeech(string text)
{
    // Create form fields for POST request
    WWWForm form = new WWWForm();
    form.AddField("text_input", text);
    form.AddField("text_filtering", "standard");
    form.AddField("character_voice_gen", "female_01.wav"); // Update with desired voice
    form.AddField("language", "en"); // Update with desired language
    form.AddField("output_file_name", "tts_output");

    // Send POST request to AllTalk API
    UnityWebRequest request = UnityWebRequest.Post(allTalkUrl, form);
    yield return request.SendWebRequest();

    if (request.result != UnityWebRequest.Result.Success)
    {
        Debug.LogError("Error sending TTS request: " + request.error);
    }
    else
    {
        // Play TTS audio if request is successful
        string audioFilePath = request.GetResponseHeader("Location");
        StartCoroutine(PlayAudio(audioFilePath));
    }
}

IEnumerator PlayAudio(string audioFilePath)
{
    using (UnityWebRequest www = UnityWebRequestMultimedia.GetAudioClip(audioFilePath, AudioType.WAV))
    {
        yield return www.SendWebRequest();

        if (www.result != UnityWebRequest.Result.Success)
        {
            Debug.LogError("Error downloading TTS audio: " + www.error);
        }
        else
        {
            AudioClip clip = DownloadHandlerAudioClip.GetContent(www);
            AudioSource audioSource = GetComponent<AudioSource>();
            audioSource.clip = clip;
            audioSource.Play();
        }
    }
}

}

And the script that would handle the reply:

using UnityEngine;
using System.Collections;
using LLMUnity; // Make sure to include the correct namespace for UnityLLM

public class LLMAudioConversion : MonoBehaviour
{
public LLM llm;
public AllTalkTTSIntegration allTalkTTSIntegration;

void Start()
{
    // Start the game
    Game();
}

void Game()
{
    // Your game function
    string message = "Hello bot!";
    // Send message to UnityLLM for processing
    _ = llm.Chat(message, HandleReply);
}

void HandleReply(string reply)
{
    // Handle reply from UnityLLM if needed
    Debug.Log("The AI replied: " + reply);

    // Generate and play TTS for the reply
    StartCoroutine(allTalkTTSIntegration.TextToSpeech("The AI replied: " + reply));
}

// You can remove this method as it's not used
//void ReplyCompleted()
//{
//    // Triggered when the reply from the model is complete
//    Debug.Log("The AI replied");
//}

}

There aren't any errors in the Unity console, so I'm thinking it's not communicating with the AllTalk API properly, since nothing is showing in the AllTalk server console.

Thank you

erew123 commented 4 months ago

Hi @Vander1nde21

I believe I've spotted three things.

API Endpoint

Unless it's somewhere else and I'm missing the variable, you will need tts-generate added to the endpoint name. Otherwise it's not actually hitting the correct endpoint:

private string allTalkUrl = "http://127.0.0.1:7851/api/tts-generate"; // Update with your AllTalk API endpoint

Missing Values

You will have to send the other fields to the API as well. (I probably should update the API so that any value that isn't sent falls back to a default, but currently it wants them all, even the ones that aren't used.)

    form.AddField("text_input", text);
    form.AddField("text_filtering", "standard");
    form.AddField("character_voice_gen", "female_01.wav"); // Update with desired voice
    form.AddField("language", "en"); // Update with desired language
    form.AddField("output_file_name", "tts_output");

    // Not used by you currently, but the API checks for these fields, so they must be sent; you can leave them at any values
    form.AddField("narrator_enabled", "false");
    form.AddField("narrator_voice_gen", "male01.wav");
    form.AddField("text_not_inside", "character");
    form.AddField("output_file_timestamp", "false");
    form.AddField("autoplay", "true");
    form.AddField("autoplay_volume", "0.1");

You may want to update AllTalk to the latest update from this weekend. When something is sent to the API, I've also set it to print a clear message at the terminal/console telling you what is missing or wrong, so that may help you debug further.

image

Returned file

Finally, this looks wrong to me:

    {
        // Play TTS audio if request is successful
        string audioFilePath = request.GetResponseHeader("Location");
        StartCoroutine(PlayAudio(audioFilePath));
    }

If you look at the things AllTalk sends back (https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-tts-generation-response), you can use output_file_path, output_file_url or output_cache_url, and I don't see those referenced in the Unity script. So this is me assuming, but if you wanted to pull the wav from the disk location, you may want:

    {
        // Play TTS audio if request is successful
        string audioFilePath = request.GetResponseHeader("output_file_path");
        StartCoroutine(PlayAudio(audioFilePath));
    }
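
One caveat, assuming the response format described in the README linked above: output_file_path, output_file_url and output_cache_url are fields of the JSON body that AllTalk returns rather than HTTP headers, so GetResponseHeader would most likely return null for them. A minimal, untested sketch of reading them from the body with Unity's JsonUtility is shown here. The `AllTalkResponse` and `AllTalkResponseExample` names are hypothetical, the caller is expected to pass in a form already filled with the fields listed earlier, and it assumes output_file_url is a full URL that Unity can download from.

```csharp
using System;
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical container; field names mirror the JSON keys listed in the AllTalk README.
[Serializable]
public class AllTalkResponse
{
    public string status;
    public string output_file_path;
    public string output_file_url;
    public string output_cache_url;
}

[RequireComponent(typeof(AudioSource))]
public class AllTalkResponseExample : MonoBehaviour
{
    // Endpoint taken from the thread; adjust to your own AllTalk address.
    private const string AllTalkUrl = "http://127.0.0.1:7851/api/tts-generate";

    // Sends an already populated form and plays the WAV named in the JSON reply.
    public IEnumerator TextToSpeech(WWWForm form)
    {
        using (UnityWebRequest request = UnityWebRequest.Post(AllTalkUrl, form))
        {
            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Error sending TTS request: " + request.error);
                yield break;
            }

            // Parse the JSON body instead of looking for an HTTP header.
            AllTalkResponse response =
                JsonUtility.FromJson<AllTalkResponse>(request.downloadHandler.text);

            if (response == null || string.IsNullOrEmpty(response.output_file_url))
            {
                Debug.LogError("Unexpected AllTalk response: " + request.downloadHandler.text);
                yield break;
            }

            // Assumption: output_file_url is a complete http:// URL served by AllTalk.
            yield return PlayAudio(response.output_file_url);
        }
    }

    private IEnumerator PlayAudio(string audioUrl)
    {
        using (UnityWebRequest www = UnityWebRequestMultimedia.GetAudioClip(audioUrl, AudioType.WAV))
        {
            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Error downloading TTS audio: " + www.error);
                yield break;
            }

            AudioSource audioSource = GetComponent<AudioSource>();
            audioSource.clip = DownloadHandlerAudioClip.GetContent(www);
            audioSource.Play();
        }
    }
}
```

Using the URL rather than output_file_path keeps the script working if Unity and AllTalk ever run on different machines; the disk path only makes sense when both share a filesystem.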

That aside, I'm no Unity or C# developer, but in principle the code looks about correct!

Hope that gives you a direction to head in.

erew123 commented 4 months ago

Mind you, having said that, to make it more flexible for when I update the API in future, I would be tempted to do this:

private string allTalkEndpoint = "/api/tts-generate"; // Path of the API endpoint
private string allTalkIPPort = "127.0.0.1:7851"; // You can later have somewhere in the interface to change this if needed
// Built on each access, because a C# field initializer cannot reference other instance fields
private string allTalkUrl => "http://" + allTalkIPPort + allTalkEndpoint; // Full URL of the AllTalk API endpoint