erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI; however, it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Exception on sample_rate #176

Closed: Tetragramm closed this issue 4 months ago

Tetragramm commented 4 months ago

🔴 If you have installed AllTalk in a custom Python environment, I will only be able to provide limited assistance/support. AllTalk draws on a variety of scripts and libraries that are not written or managed by me, and they may fail, throw errors or give strange results in custom-built Python environments.

🔴 Please generate a diagnostics report and upload the "diagnostics.log" as this helps me understand your configuration.

https://github.com/erew123/alltalk_tts/tree/main?#-how-to-make-a-diagnostics-report-file

Describe the bug

I am using text-generation-webui. As the context length grows, the TTS step fails more often. For this particular conversation it fails every time, even when I regenerate the text. Often when I regenerate, I get the same broken text, but the TTS output includes the <audio src=... tag from the previous TTS attempt (for the broken text), and then a second regeneration succeeds.

To Reproduce

Steps to reproduce the behaviour: It is not consistent; it happens fairly randomly as I generate text.


Text/logs

[AllTalk TTSGen] Character (Text-not-inside)
[AllTalk TTSGen] She
[AllTalk TTSGen] 0.40 seconds. LowVRAM: False DeepSpeed: True
Output generated in 0.58 seconds (5.16 tokens/s, 3 tokens, context 7676, seed 650711098)
Traceback (most recent call last):
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\gradio\queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 258, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1684, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1262, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 574, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 567, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 550, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 733, in gen_wrapper
    response = next(iterator)
               ^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\modules\chat.py", line 414, in generate_chat_reply_wrapper
    for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
  File "D:\libraries\text-generation-webui\modules\chat.py", line 382, in generate_chat_reply
    for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
  File "D:\libraries\text-generation-webui\modules\chat.py", line 350, in chatbot_wrapper
    output['visible'][-1][1] = apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\modules\extensions.py", line 231, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\modules\extensions.py", line 89, in _apply_string_extensions
    text = func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\libraries\text-generation-webui\extensions\alltalk_tts\script.py", line 721, in output_modifier
    final_output_file = combine(
                        ^^^^^^^^
  File "D:\libraries\text-generation-webui\extensions\alltalk_tts\script.py", line 516, in combine
    sf.write(output_file_path, audio, samplerate=sample_rate)
                                                 ^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'sample_rate' where it is not associated with a value

Desktop (please complete the following information):

AllTalk was updated: 2024/04/17
Custom Python environment: no
Text-generation-webUI was updated: 2024/04/17

Additional context

diagnostics.log

erew123 commented 4 months ago

Hi @Tetragramm

Thanks for the diagnostics and details. Can I ask: when you see the failures, do the generations contain sentences that are only 2-3 characters long? For example:

[AllTalk TTSGen] Character (Text-not-inside)
[AllTalk TTSGen] She

These should be dropped from the generation and not added to the list of WAV files; however, it's possible there is a hidden (non-visible) character in there that is causing something odd to happen.
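To illustrate how a hidden character could slip past a length check, here is a small sketch. The helper visible_len is hypothetical and not part of AllTalk, and which invisible character (if any) was actually present is unknown. Python's str.strip() only removes whitespace, so an invisible Unicode "format" character such as U+200B ZERO WIDTH SPACE still counts toward len().

```python
import unicodedata


def visible_len(text):
    """Hypothetical helper: count characters that are neither whitespace
    nor invisible Unicode "format" characters (category Cf), such as
    U+200B ZERO WIDTH SPACE or U+FEFF BYTE ORDER MARK."""
    return sum(
        1 for ch in text
        if not ch.isspace() and unicodedata.category(ch) != "Cf"
    )


# A segment that looks like "She" but carries a trailing zero-width space.
segment = "She\u200b"
print(len(segment.strip()))   # 4: str.strip() does not remove U+200B
print(visible_len(segment))   # 3: only S, h and e are counted
```

A check based on len(part.strip()) sees four characters here, while a reader sees three.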

Text segments of 3 characters or fewer should be removed automatically: they are not generated, and so they are not added to the list of WAV files to be combined. The specific error you are seeing occurs when AllTalk combines the WAV files into one file:

UnboundLocalError: cannot access local variable 'sample_rate' where it is not associated with a value

The combine step reads all the generated WAV files, confirms they are valid and share the same sample rate, and then joins them. As best I can tell, what is happening here is that one file either doesn't exist on disk or didn't return a sample rate value. I'm slightly baffled as to why, but I have a suggestion below that you could try; let me know if it resolves your issue.
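For reference, the combine step described above can be sketched roughly as follows. This is a minimal, hypothetical version: combine_wavs is not AllTalk's combine() function, and it uses Python's stdlib wave module in place of soundfile so it has no dependencies. It assumes the step reads each WAV, checks that they share a format and sample rate, and concatenates them. Initialising the value before the loop and checking it afterwards means a missing or empty input produces a clear error instead of an UnboundLocalError at the write step.

```python
import wave
from pathlib import Path


def combine_wavs(input_paths, output_path):
    """Hypothetical sketch: concatenate WAV files that share one format.

    Returns the (channels, sample width, sample rate) tuple it wrote.
    """
    params = None  # set from the first readable file, like `sample_rate`
    frames = []
    for path in input_paths:
        if not Path(path).is_file():
            # Skip files that never made it to disk instead of failing later.
            print(f"skipping missing file: {path}")
            continue
        with wave.open(str(path), "rb") as wf:
            current = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
            if params is None:
                params = current
            elif current != params:
                raise ValueError(f"{path}: format {current} differs from {params}")
            frames.append(wf.readframes(wf.getnframes()))

    # No file was read (empty list, or every file missing). In code that
    # only assigns the sample rate inside the loop, this case would surface
    # as an UnboundLocalError when the value is first used.
    if params is None:
        raise ValueError("no readable WAV files to combine")

    channels, sampwidth, framerate = params
    with wave.open(str(output_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        out.writeframes(b"".join(frames))
    return params
```

Raising ValueError with a message naming the cause makes it easier to tell "nothing to combine" apart from "a file had a different sample rate".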

Line 676 of script.py reads: if len(part.strip()) <= 3:


Try changing the 3 to a 1 and see whether that resolves the problem you are experiencing.

Without seeing the original text in its raw form, it's hard for me to work out exactly why you have that short "She" as a generation rather than a full sentence. If this works, I can add it as an advanced setting that can be changed in the interface.
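A minimal sketch of what a configurable version of that length check might look like, with the 3 exposed as a parameter. The function name filter_short_segments and the regex sentence splitter are assumptions for illustration only; AllTalk's own splitting and filtering code may differ.

```python
import re


def filter_short_segments(text, min_chars=3):
    """Hypothetical sketch: split text into sentence-like segments and drop
    any segment whose stripped length is <= min_chars (the same test as
    `len(part.strip()) <= 3`, with the 3 made configurable)."""
    # Naive split after ., ! or ? followed by whitespace; an assumption for
    # this sketch, not AllTalk's actual sentence splitter.
    segments = re.split(r"(?<=[.!?])\s+", text.strip())
    kept, dropped = [], []
    for seg in segments:
        if len(seg.strip()) <= min_chars:
            dropped.append(seg)
        else:
            kept.append(seg)
    return kept, dropped


kept, dropped = filter_short_segments("Hello there. Ok. Fine, thanks.")
print(kept)     # ['Hello there.', 'Fine, thanks.']
print(dropped)  # ['Ok.']

kept, dropped = filter_short_segments("Hello there. Ok. Fine, thanks.", min_chars=1)
print(dropped)  # []
```

Returning the dropped segments alongside the kept ones makes it easy to log exactly which fragments were skipped. Note that if every segment is dropped (for example, a whole reply of "She"), nothing is left to combine.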

Let me know

Thanks

Tetragramm commented 4 months ago

I don't believe they are all that short. I wasn't paying much attention to the failed generation's text though. I will check the next time I see the failure.

erew123 commented 4 months ago

No problem. It could be any of the individual sentences in the generation, so if it's combining, say, 6 sentences together, the short one doesn't have to be the last of the 6.

That aside, I'm puzzled as to what else it could be, though maybe a corrupt WAV file was generated. Let me know.

Thanks

erew123 commented 4 months ago

Hi @Tetragramm

I'm not sure whether you resolved this. I'm going to add various additional controls in the next version of AllTalk I upload, and this setting will be one of them.

If you need to get back to me, please do so.

Thanks