IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.
https://applio.org
MIT License

[Bug]: Long texts can't be used for TTS #807

Open Illaren opened 7 hours ago

Illaren commented 7 hours ago

Project Version

3.2.6

Platform and OS Version

Windows 11 Pro; Firefox

Affected Devices

Windows 11 Pro; Firefox 131.0.2

Existing Issues

No response

What happened?

Traceback (most recent call last):
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\site-packages\gradio\queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\site-packages\gradio\route_utils.py", line 321, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\site-packages\gradio\blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\site-packages\anyio\_backends\_asyncio.py", line 2405, in run_sync_in_worker_thread
    return await future
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\site-packages\anyio\_backends\_asyncio.py", line 914, in run
    result = context.run(func, *args)
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\site-packages\gradio\utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "C:\Users\babu2\Downloads\Applio-3.2.6\core.py", line 383, in run_tts_script
    subprocess.run(command_tts)
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\babu2\Downloads\Applio-3.2.6\env\lib\subprocess.py", line 1436, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 206] The filename or extension is too long

Steps to reproduce

Use a big text for TTS.

Even though I have enabled

"Split the audio into chunks for inference to obtain better results in some cases. Split Audio"

the long text is not split into manageable chunks.

If I convert the audiobook chapter by chapter, it works, but if I put in the complete audiobook, I get the error shown under "What happened?".

LongPathsEnabled under Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem is already set to 1.
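
The LongPathsEnabled setting governs file path lengths, whereas the traceback ends in `_winapi.CreateProcess`, whose command line is limited to 32,767 characters including the terminating null. The following is a minimal sketch for estimating whether a command would fit. It relies on `subprocess.list2cmdline`, the helper Python uses on Windows to join an argument list; the script name and argument layout are placeholders, not Applio's actual command.

```python
import subprocess

# CreateProcess limit for the command line, including the terminating null.
WINDOWS_CMDLINE_LIMIT = 32767


def command_line_length(args):
    """Return the length of the command line built from an argument list."""
    return len(subprocess.list2cmdline(args))


def fits_on_windows_command_line(args):
    """Return True if the joined command plus its null terminator fits."""
    return command_line_length(args) + 1 <= WINDOWS_CMDLINE_LIMIT


# Hypothetical commands: "tts_script.py" is a placeholder name.
short_cmd = ["python", "tts_script.py", "Hello world"]
long_cmd = ["python", "tts_script.py", "word " * 10000]

print(fits_on_windows_command_line(short_cmd))
print(fits_on_windows_command_line(long_cmd))
```

A whole audiobook passed as a single argument exceeds this limit, while a single chapter may still fit, which would match the behavior described above.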

Expected behavior

Suggested solution:

If a long text is used as input, split it into manageable chunks first, before the argument is passed to the converter, since it seems that Windows cannot handle more than roughly 32,000 characters.
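
A rough sketch of the kind of splitting suggested here, assuming sentence boundaries are acceptable cut points. The function name `split_text` and the default `max_chars` value are illustrative assumptions, not part of Applio.

```python
import re


def split_text(text, max_chars=2000):
    """Split text into chunks of at most max_chars, preferring sentence ends.

    max_chars=2000 is an arbitrary illustrative default.
    """
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at fixed positions.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", max_chars=10))
```

Each chunk could then be synthesized on its own and the resulting audio files joined afterwards; how that fits into Applio's pipeline is left open here.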

Attachments

No response

Screenshots or Videos

No response

Additional Information

No response

AznamirWoW commented 6 hours ago

Please explain how you are running TTS.

Illaren commented 6 hours ago

Starting run-applio.bat => clicking on TTS in the web browser window => "File to Speech" => uploading my text file => Advanced Settings => activate "Split Audio" => Convert

This works like a charm if I have only one chapter of the book in the txt file; if I have the complete book in the txt file, it gives the error.


The same problem occurs if I copy and paste the text into "Text to Speech".

AznamirWoW commented 6 hours ago

Yeah, since the text is passed as a parameter on the command line, it cannot be that long. I can only suggest using a standalone script like this:

import asyncio
import time

import edge_tts
import rvc.lib.zluda  # kept from the original script; not referenced directly below
from rvc.infer.infer import VoiceConverter

# TTS settings
input_text = "test_edge_tts.txt"  # text file to read aloud
speaker = "en-GB-RyanNeural"
rate = 0  # speech rate change in percent

# Inference settings
pth_path = r"G:\ApplioV3.2.6\logs\model\model.pth"
index_path = r"G:\ApplioV3.2.6\logs\model\model.index"
input_path = r"F:\TTS_OUT\tts_out.wav"  # temporary TTS output
output_path = r"F:\TTS_OUT\infer_out.wav"  # final converted audio


async def main(text):
    # edge-tts expects the rate as a signed percentage string, e.g. "+0%" or "-10%"
    rates = f"+{rate}%" if rate >= 0 else f"{rate}%"
    start_time = time.time()
    await edge_tts.Communicate(text, speaker, rate=rates).save(input_path)
    elapsed_time = time.time() - start_time
    print(f"TTS gen time in {elapsed_time:.2f} seconds.")


if __name__ == "__main__":
    # Read the text from a file instead of passing it on the command line
    # (assumes the file is UTF-8 encoded)
    with open(input_text, "r", encoding="utf-8") as file:
        text = file.read()

    asyncio.run(main(text))

    # Convert the TTS output with the voice model
    start_time = time.time()
    infer_pipeline = VoiceConverter()
    infer_pipeline.convert_audio(
        audio_input_path=input_path,
        audio_output_path=output_path,
        model_path=pth_path,
        index_path=index_path,
        split_audio=True,
    )
    elapsed_time = time.time() - start_time
    print(f"Inference time in {elapsed_time:.2f} seconds.")

Illaren commented 5 hours ago

Sorry, it seems I am a bit too stupid to work with this.

I can understand that " with open(input_text, 'r') as file: text = file.read() "

means that the txt file itself is handed over to the converter, but I am not sure how to implement this (I am not a programmer, but at least I can read a bit of English; just a dumb user here).

AznamirWoW commented 4 hours ago

Save this script as run_tts.py and run it from the Applio folder using env\python run_tts.py

The script reads the "test_edge_tts.txt" file with the text content to convert, using the "en-GB-RyanNeural" speaker (you can change it to any other speaker listed on the Applio TTS screen).

pth_path = r"G:\ApplioV3.2.6\logs\model\model.pth"
index_path = r"G:\ApplioV3.2.6\logs\model\model.index"
input_path = r"F:\TTS_OUT\tts_out.wav"
output_path = r"F:\TTS_OUT\infer_out.wav"

You need to change these. They are 1) the path to the model.pth, 2) the path to the model's index file, 3) a temporary file to save the TTS output, and 4) the final file after conversion.
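
Before running the script, it may help to confirm that the four paths point somewhere sensible. The sketch below is a hypothetical helper (`check_paths` is not part of Applio) that assumes the model and index files must already exist and that the folders for the two output files must already be present.

```python
from pathlib import Path


def check_paths(pth_path, index_path, input_path, output_path):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # The model and its index must exist as files.
    for label, p in (("model file", pth_path), ("index file", index_path)):
        if not Path(p).is_file():
            problems.append(f"{label} not found: {p}")
    # The output files are created later, so only their folders are checked.
    for label, p in (("TTS output", input_path), ("final output", output_path)):
        folder = Path(p).parent
        if not folder.is_dir():
            problems.append(f"folder for {label} does not exist: {folder}")
    return problems


# Example with placeholder file names in the current folder.
for problem in check_paths("model.pth", "model.index", "tts_out.wav", "infer_out.wav"):
    print(problem)
```

Calling it with the values from the script, for example `check_paths(pth_path, index_path, input_path, output_path)`, prints nothing when every check passes.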