(Feature): Natural voice playback using tts for edge

Iheuzio commented 11 months ago

Hi,

I recently came across this project and see that it uses the Windows speech services directly. The natural voices sound much better, but they are locked out of the normal Windows API. They could be accessed through the Edge/Chrome API via chrome.ttsEngine or chrome.tts.

An added benefit is that you would not need Windows to run this: you can relay the calls to Windows, or possibly to whatever text reader is configured for the browser. There is a GitHub project here where the work is already done, so you could simply pass the text from Unity to the edge-tts command that project provides.

If this sounds reasonable, I could try working on a possible integration.

Osmodium commented 11 months ago

Hi! Thanks for your interest. Many people have suggested using some other library/framework for the speech generation (like Bark). You are correct that it uses Windows directly (if on Windows), and it uses the OSX built-in TTS feature when on OSX.

I will look into edge-tts, since it has a feature to have the audio play back immediately. I'd also rather not ship python, require the user to have python installed or have to install 3rd party packages. Thanks for your suggestion.

Osmodium commented 11 months ago

I'm just realizing; are you talking about these voices? https://www.ghacks.net/2018/08/11/unlock-all-windows-10-tts-voices-system-wide-to-get-more-of-them/ Because then the mod already supports the voices, you "just" have to enable them through the guide (also described in the only article on the mod page)

Iheuzio commented 11 months ago

It should read with whichever TTS voice is configured in Chrome/Chromium. This is separate from the Windows voices and would require passing the text through the browser API in order to be read.

Osmodium commented 11 months ago

I think you might mean this? https://learn.microsoft.com/en-us/archive/msdn-magazine/2019/june/speech-text-to-speech-synthesis-in-net It was not possible to use SpeechSynthesis when the project was created since the version of .Net was incompatible with the version used in Unity, so the sideloading did not work. It might work now, however I have not tested this yet.

Couple of questions if not:

Would this require the user to have chrome installed?
Would it only be able to use the voice configured in chrome and not be able to take an argument for the voice?
Does it require an internet connection?

Iheuzio commented 8 months ago

Hi, sorry for the delay in response. Was busy working on other stuff.

You could check the forked repository I made for testing the changes here; it works well.

It requires ffmpeg, mpv and edge-tts to be installed. Then it would work like this: View Video demo

As for your questions:

Would this require the user to have chrome installed?

Yes, you must have a Chromium browser (Edge and Chrome are supported), and only Windows would work with this feature.

Would it only be able to use the voice configured in chrome and not be able to take an argument for the voice?

You can pass any one of the voices listed by running edge-tts --list-voices, and you can configure rate, pitch and so on as well: see documentation

Does it require an internet connection?

No internet connection is needed

Iheuzio commented 8 months ago

If you want to communicate on discord I'm in the server @iheuzio.

I could add a menu option that toggles this feature on or off, as well as a script that sets up mpv, ffmpeg and the rest on Windows to make installation easier. Let me know if that sounds good.

Osmodium commented 8 months ago

Thanks for providing a PoC of it, and no worries about taking time :) I haven't had time to look at the fork yet, but it looks and sounds good. I have some requests/concerns: It sounds like it takes a while for the audio to play after clicking, can this be reduced? I imagine needing to have some checks as to if the computer has the applications installed and the correct supported versions of them, have you included this?
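For the installation check asked about here, one possible shape is to look for the executables on PATH before enabling the feature. This is only a rough C# sketch and is not taken from the fork: ToolLocator, FindOnPath and FindMissing are hypothetical names, the executable names passed in the usage note are assumptions about how the tools are installed on Windows, and it does not verify versions at all.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the mod or the fork.
public static class ToolLocator
{
    // Returns the full path of exeName in the first PATH directory that contains it, or null.
    public static string FindOnPath(string exeName, string pathVariable)
    {
        if (string.IsNullOrEmpty(pathVariable))
            return null;
        return pathVariable
            .Split(new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries)
            // PATH entries are sometimes quoted on Windows; strip quotes and whitespace.
            .Select(dir => Path.Combine(dir.Trim().Trim('"'), exeName))
            .FirstOrDefault(File.Exists);
    }

    // Returns the names from exeNames that could not be found on the current PATH.
    public static string[] FindMissing(params string[] exeNames)
    {
        string path = Environment.GetEnvironmentVariable("PATH");
        return exeNames.Where(name => FindOnPath(name, path) == null).ToArray();
    }
}
```

Calling ToolLocator.FindMissing("ffmpeg.exe", "mpv.exe", "edge-tts.exe") would then return whichever of those names are not on PATH, and the mod could keep the option disabled while that list is non-empty.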

Iheuzio commented 8 months ago

It sounds like it takes a while for the audio to play after clicking, can this be reduced?

Yes. The longer the text, the longer it takes to save the temporary mp3 file. We could split the text into many smaller mp3 files and play each one in sequence while the later ones finish generating.

I imagine needing to have some checks as to if the computer has the applications installed and the correct supported versions of them, have you included this?

I did not test that; this was done just to show that using edge-tts was possible. I can work on an actual integration later, but my code was written in 30 minutes so it's pretty bad.
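The splitting idea could start from something like the following C# sketch, which groups whole sentences into chunks of limited size so that the first chunk can be synthesized and played while the rest are still being generated. SpeechChunker, SplitIntoChunks and the maxChars parameter are hypothetical names, and the sentence detection (punctuation followed by whitespace) is a deliberately simple assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the mod or the fork.
public static class SpeechChunker
{
    // Groups whole sentences into chunks of at most maxChars characters.
    // A single sentence longer than maxChars is kept intact as its own chunk.
    public static List<string> SplitIntoChunks(string text, int maxChars)
    {
        var chunks = new List<string>();
        if (string.IsNullOrWhiteSpace(text))
            return chunks;

        var current = new StringBuilder();
        int sentenceStart = 0;
        for (int i = 0; i < text.Length; i++)
        {
            bool atEnd = i == text.Length - 1;
            bool isPunctuation = text[i] == '.' || text[i] == '!' || text[i] == '?';
            // A sentence ends at the end of the text, or at punctuation followed by
            // whitespace (so the dots inside "..." do not each end a sentence).
            bool boundary = atEnd || (isPunctuation && char.IsWhiteSpace(text[i + 1]));
            if (!boundary)
                continue;

            string sentence = text.Substring(sentenceStart, i - sentenceStart + 1).Trim();
            sentenceStart = i + 1;
            if (sentence.Length == 0)
                continue;

            // Close the current chunk if adding this sentence would exceed the limit.
            if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0)
                current.Append(' ');
            current.Append(sentence);
        }
        if (current.Length > 0)
            chunks.Add(current.ToString());
        return chunks;
    }
}
```

With a limit of 9 characters, SplitIntoChunks("One. Two. Three.", 9) yields the two chunks "One. Two." and "Three."; a real limit would be far larger and tuned against how long the first chunk takes to come back.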

adamstradomski commented 7 months ago

Has this idea been dropped? I would love to see it work, especially with Rogue Trader :)

But I believe edge-tts does require an internet connection. Isn't it a Python wrapper over a Bing API?

Iheuzio commented 7 months ago

Yeah, it requires an internet connection for the natural voices. My bad. However, I'll be able to work on a simple, proper integration with the existing TTS, possibly after January 12th.

Osmodium commented 7 months ago

@Iheuzio Saw your video on the Owlcat forum. Looks promising, with a bit of delay still 👍 I couldn't contact you on Discord since we aren't friends there. Also: it seems like the service is not always up, which might create confusion if the mod switches to the "standard" voices after a delay. It still seems a bit unstable to me. Let me know if you want me to take a look/help with it. I tried this link: https://speech.platform.bing.com/consumer/speech/synthesize/readaloud/readaloud/voices/list?trustedclienttoken=6A5AA1D4EAFF4E9FB37E23D68491D6F4

However, it would be cool to have this as a toggle, so that people who want to use it can, with the caveats it might have :)

Osmodium commented 7 months ago

I just dabbled a bit with this in LinqPad, and I got it to work without having the python program installed, which would probably be preferred?

Wazard commented 6 months ago

Hey, are you still working on that idea? I have experience coding with C# and Unity but none with TTS, so I don't know if I would be of any help.

Osmodium commented 6 months ago

@Wazard Yes, and I've gotten it to work, but discovered that the service that is being used to generate the audio does not support multiple voices in one request. So I'm currently working on parallelizing calls for each section of the dialog.
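As a rough illustration of what parallelizing one call per dialog section could look like (this is not the mod's actual code), the C# sketch below starts all requests at once and relies on Task.WhenAll returning results in the same order as the input tasks, so playback order is preserved even when a later section finishes first. DialogSection, ParallelSynthesis and the synthesizeOne delegate are hypothetical; the delegate stands in for whatever actually calls the speech service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical type: one stretch of dialog text spoken by a single voice.
public sealed class DialogSection
{
    public string Voice;
    public string Text;
}

// Hypothetical helper, not part of the mod.
public static class ParallelSynthesis
{
    // Starts one synthesis request per section and waits for all of them.
    // Task.WhenAll returns the results in the order of the input tasks,
    // so the audio buffers line up with the sections regardless of which finishes first.
    public static Task<byte[][]> SynthesizeAllAsync(
        IReadOnlyList<DialogSection> sections,
        Func<DialogSection, Task<byte[]>> synthesizeOne)
    {
        return Task.WhenAll(sections.Select(synthesizeOne));
    }
}
```

Waiting for every section before playing anything is the simplest option; awaiting the tasks one by one in order instead would let the first section start playing while the later ones are still in flight.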

Wazard commented 6 months ago

@Osmodium Sorry for the probably dumb question, but I saw that you can add and download the natural voices from the Narrator. Wouldn't it be possible to use them within Windows itself this way?

Iheuzio commented 6 months ago

Hi, I currently won't have time to focus on this mod, since I'm working on other stuff at the moment. I may be able to help out later, but not for the time being.

bubval commented 5 months ago

Hey @Osmodium, it's awesome to hear that you have it working. Would it be possible to have a release that does not include multiple voices? How's the progress so far?

Osmodium commented 4 months ago

Hi! I have just uploaded the experimental version of the mod (0.9.4-EXP) here which includes Natural Voices through the Bing service. It is the version from over a month ago, but progress has been slow. It is still WIP so there might be bugs.

BelegCufea commented 2 months ago

OK. I have found the NaturalVoiceSAPIAdapter repo on GitHub, which enables us to use Natural voices (including those for Edge) with TTS.

But it needs a slight adjustment in SpeechMod to function: just five new lines in the GetAvailableVoices() method in WindowsVoiceUnity.cs.

Put these lines:

if (voices[i].Contains("(Natural)"))
{
    voices[i] = voices[i].Replace("(Natural)", "");
    voices[i] = voices[i].Replace("(", "Natural (");
}

just under

if (!voices[i].Contains('-'))
    voices[i] = $"{voices[i]}#Unknown";
else
    voices[i] = voices[i].Replace(" - ", "#");

Whole method should look like this:

public static string[] GetAvailableVoices()
{
    // getVoicesAvailable() returns all voices as one newline-delimited string.
    string voicesDelim = getVoicesAvailable();
    if (string.IsNullOrWhiteSpace(voicesDelim))
        return Array.Empty<string>();
    string[] voices = voicesDelim.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < voices.Length; ++i)
    {
        // Use '#' as the separator: " - " becomes "#", and names without a dash get "#Unknown".
        if (!voices[i].Contains('-'))
            voices[i] = $"{voices[i]}#Unknown";
        else
            voices[i] = voices[i].Replace(" - ", "#");
        // New lines: drop the "(Natural)" tag, then prefix every remaining "(" with "Natural ".
        if (voices[i].Contains("(Natural)"))
        {
            voices[i] = voices[i].Replace("(Natural)", "");
            voices[i] = voices[i].Replace("(", "Natural (");
        }
    }
    return voices;
}

You will get something like this:

I only tried it for a few dialogs and books, but it seems to work just fine.
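To make the effect of those lines concrete, here is the same transformation applied to a single string, as a small standalone C# sketch. VoiceNameDemo and Normalize are hypothetical names, and the sample voice names in the usage note are made up for illustration; the real names reported by the adapter may differ.

```csharp
// Hypothetical standalone version of the per-voice logic in GetAvailableVoices().
public static class VoiceNameDemo
{
    public static string Normalize(string voice)
    {
        // Same separator handling as the existing code.
        if (!voice.Contains("-"))
            voice = voice + "#Unknown";
        else
            voice = voice.Replace(" - ", "#");

        // The five added lines: move the "Natural" marker from the name to the language part.
        if (voice.Contains("(Natural)"))
        {
            voice = voice.Replace("(Natural)", "");
            voice = voice.Replace("(", "Natural (");
        }
        return voice;
    }
}
```

For the made-up input "Microsoft Aria (Natural) - English (United States)" this returns "Microsoft Aria #English Natural (United States)", with the space before "#" left over from the removed tag. An input such as "Microsoft David Desktop - English (United States)" becomes "Microsoft David Desktop#English (United States)", exactly as before the change.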

Osmodium commented 2 months ago

@BelegCufea That looks awesome and interesting! It looks like it only works for Windows 11, or I might be missing something.

BelegCufea commented 2 months ago

@Osmodium I have no idea, but on the Repo page there is a mention about Win 10 in System Requirements section:

I'm using Windows 10. Can I use the Narrator natural voices on Windows 11?

Yes, as long as your Windows 10 build number is 17763 or above (version 1809). You can choose and install Windows 11 Narrator voices here.

Windows 10's Narrator doesn't support natural voices directly, but it does support SAPI 5 voices. So you can make Windows 11 Narrator voices work on Windows 10 via this engine.

Osmodium commented 2 months ago

@BelegCufea I will test this out, since I'm on Windows 10. Thanks!

Osmodium commented 2 months ago

It seems to be working pretty well, apart from it crashing when issued to stop. I have to look into this, but otherwise this is a pretty elegant solution for those who want those voices, and possibly others too.

BelegCufea commented 2 months ago

@Osmodium Nice. Works fine on Win 11. No crashing at all. Even when interrupting dialogs.

Sometimes there is a slight delay when using Edge voices, but not always. And even when there is one, it is acceptable (me thinks :-) )

It also seems to have some problem with the <silence/> tag. I had to change the phonetics like so:

  "—": " . . ",
  "...": " . . . ",

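How the mod applies its phonetic entries internally is not shown in this thread, but plain ordered string replacement is enough to see what the two entries above do to a line of dialog. In this C# sketch PhoneticReplacer and Apply are hypothetical names, and the "\u2014" escape stands for the dash character used as the first key.

```csharp
using System.Collections.Generic;

// Hypothetical helper; assumes entries are applied in order as plain string replacements.
public static class PhoneticReplacer
{
    public static string Apply(string text, IEnumerable<KeyValuePair<string, string>> replacements)
    {
        foreach (KeyValuePair<string, string> pair in replacements)
            text = text.Replace(pair.Key, pair.Value);
        return text;
    }
}
```

Applying the two entries in the order listed to "Wait\u2014what..." gives "Wait . . what . . . ", so the pauses are spelled out as spaced dots rather than relying on the <silence/> tag.
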
BelegCufea commented 2 months ago

@Osmodium Update after a few more hours of play.

Occasionally (about 1% of the time), the game becomes unresponsive when starting a new "page" of dialogue.
I always wait for the dialogue to be fully read before proceeding, so there is no interrupting.
This only happens when using Edge voices.
I can't determine where the freeze is occurring: in SAPIAdapter, in the WindowsVoice DLL, or in the C# wrapper.
As far as I can tell, this has never happened to me with offline Natural voices.

LeapSoftware commented 2 months ago

Hey all, just thought I'd add my own experience to the above.

I cloned the repo and made the changes you mentioned above (plus using NaturalVoiceSAPIAdapter). I was able to get it working and detecting all online and locally downloaded natural voices. After using it for a bit, I can confirm that using online voices seems to hang the game indefinitely every so often (far more often than 1%). I have not delved into where exactly it is failing (I can't see any errors), but there is definitely an issue there.

If I find any other issues I'll post here :)

BelegCufea commented 2 months ago

@LeapSoftware Thanks for info.

Unfortunately, that is true. It is unstable when using online voices. Nevertheless, I have had no problems so far using offline natural voices.

For anyone interested, I compiled the changes at my fork. It is highly experimental though! Use at your own risk :-)

Wazard commented 2 months ago

@LeapSoftware Thanks for info.

Unfortunately, that is true. It is unstable when using online voices. Nevertheless, I have had no problems so far using offline natural voices.

For anyone interested, I compiled the changes at my fork. It is highly experimental though! Use at your own risk :-)

How do I make it work? The only two voices available are Zira and David. I'm on Win11 with natural voices installed.

Osmodium commented 2 months ago

I'll add the code to the project and I guess I can do a small writeup about having to use NaturalVoiceSAPIAdapter to make it work.

Christian-Arning commented 1 month ago

@LeapSoftware Thanks for info. Unfortunately, that is true. It is unstable when using online voices. Nevertheless, I have had no problems so far using offline natural voices. For anyone interested, I compiled the changes at my fork. It is highly experimental though! Use at your own risk :-)

How do I make it work? The only two voices available are Zira and David. I'm on Win11 with natural voices installed.

You have to install the Natural Voice SAPI Adapter to make it show up.

Osmodium / PathfinderTextToSpeechMod

(Feature): Natural voice playback using tts for edge #25