SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Bunch of errors [Windows 10, Python 3.10]. final_wave = generated_waves[0], IndexError: list index out of range #205

Closed · kamineko16 closed this 1 hour ago

kamineko16 commented 4 days ago

Hi, I did everything right and still get errors. I tried reinstalling from zero 3 times; CUDA is 11.8 and I have the latest version of ffmpeg. I updated pip, I updated Anaconda, and my GPU driver is up to date. I'm using Windows 10. I also tried purging the cache, without results. I installed Python 3.10 in the environment.

At first, I thought it was a file issue, so I tried a bunch of different allowed formats, but after the second reinstallation I just tried the sample audio you provided and still got exactly the same errors. I installed everything from the requirements file. On my last attempt I also tried moving the git folder from the admin drive C to the regular drive E, which didn't help, and running it with administrator rights didn't help either.

Here is the full error output:

```
C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\transformers\models\whisper\generation_whisper.py:496: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50359]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.
C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\transformers\models\whisper\modeling_whisper.py:599: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Traceback (most recent call last):
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "E:\F5-TTS\gradio_app.py", line 66, in infer
    final_wave, final_sample_rate, combined_spectrogram = infer_process(
  File "E:\F5-TTS\model\utils_infer.py", line 214, in infer_process
    return infer_batch_process(
  File "E:\F5-TTS\model\utils_infer.py", line 307, in infer_batch_process
    final_wave = generated_waves[0]
IndexError: list index out of range
```

btw "generated_waves" is empty if it's important. I know it because, on my very first attempt, I was trying to use GPT, and it suggested printing "generated_waves" to check if it's really empty, so I just added a print line before the error and yes it was empty.

SWivid commented 4 days ago

Seems like a failure in the Whisper ASR pipeline. Have you tried manually entering the reference text for the prompt audio?

kamineko16 commented 4 days ago

> Seems like a failure in the Whisper ASR pipeline. Have you tried manually entering the reference text for the prompt audio?

Tried it now; errors:

```
Traceback (most recent call last):
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "C:\Users\{username}\anaconda3\envs\f5\lib\site-packages\gradio\utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "E:\F5-TTS\gradio_app.py", line 66, in infer
    final_wave, final_sample_rate, combined_spectrogram = infer_process(
  File "E:\F5-TTS\model\utils_infer.py", line 214, in infer_process
    return infer_batch_process(
  File "E:\F5-TTS\model\utils_infer.py", line 307, in infer_batch_process
    final_wave = generated_waves[0]
IndexError: list index out of range
```

I tried using Whisper manually, and it works fine. It gave me the text of "country.flac" with 100% accuracy.
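By "manually" I mean a standalone check along these lines (the model id is whatever the repo loads by default, so treat it as an assumption):

```python
# Standalone sanity check: transcribe the reference clip with transformers'
# Whisper pipeline, outside of gradio_app.py.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")
result = asr("country.flac")
print(result["text"])  # prints the reference transcript
```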

SWivid commented 4 days ago

Hi @kamineko16, some possible solutions:

1. Try a lower Gradio version if yours is >=5: `pip install gradio==4.44.1`
2. Check the audio format; it should be pcm_s16le, for example. A successfully uploaded file will show an image preview like this: [screenshot]
3. Maybe force re-pull the repo?

kamineko16 commented 4 days ago

> Hi @kamineko16, some possible solutions:
>
> 1. Try a lower Gradio version if yours is >=5: `pip install gradio==4.44.1`
> 2. Check the audio format; it should be pcm_s16le, for example. A successfully uploaded file will show an image preview like this: [screenshot]
> 3. Maybe force re-pull the repo?

Hi @SWivid, answers to your suggestions:

1. It is already 4.44.1, so I guess the requirements file installed it as 4.44.1.

2. I used the sample from the F5-TTS sample folder, so I guess it is in the right format? However, I tried what you said anyway. Here are pictures of both (the exact re-encode call is sketched after this list):

   Original FLAC file: [screenshot]

   Same audio as a WAV file with pcm_s16le: [screenshot]

Errors: " C:\Users{username}\anaconda3\envs\f5\lib\site-packages\transformers\models\whisper\generation_whisper.py:496: FutureWarning: The input name inputs is deprecated. Please make sure to use input_features instead. warnings.warn( You have passed task=transcribe, but also have set forced_decoder_ids to [[1, None], [2, 50359]] which creates a conflict. forced_decoder_ids will be ignored in favor of task=transcribe. C:\Users{username}\anaconda3\envs\f5\lib\site-packages\transformers\models\whisper\modeling_whisper.py:599: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.) attn_output = torch.nn.functional.scaled_dot_product_attention( Passing a tuple of past_key_values is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of EncoderDecoderCache instead, e.g. past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values). The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results. Traceback (most recent call last): File "C:\Users{username}\anaconda3\envs\f5\lib\site-packages\gradio\queueing.py", line 536, in process_events response = await route_utils.call_process_api( File "C:\Users{username}\anaconda3\envs\f5\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api output = await app.get_blocks().process_api( File "C:\Users{username}\anaconda3\envs\f5\lib\site-packages\gradio\blocks.py", line 1935, in process_api result = await self.call_function( File "C:\Users{username}\anaconda3\envs\f5\lib\site-packages\gradio\blocks.py", line 1520, in call_function prediction = await anyio.to_thread.run_sync( # type: ignore File "C:\Users{username}\anaconda3\envs\f5\lib\site-packages\anyio\to_thread.py", line 56, in run_sync return await get_async_backend().run_sync_in_worker_thread( File "C:\Users{username}\anaconda3\envs\f5\lib\site-packages\anyio_backends_asyncio.py", line 2441, in run_sync_in_worker_thread return await future File "C:\Users{username}\anaconda3\envs\f5\lib\site-packages\anyio_backends_asyncio.py", line 943, in run result = context.run(func, args) File "C:\Users{username}\anaconda3\envs\f5\lib\site-packages\gradio\utils.py", line 826, in wrapper response = f(args, **kwargs) File "E:\F5-TTS\gradio_app.py", line 66, in infer final_wave, final_sample_rate, combined_spectrogram = infer_process( File "E:\F5-TTS\model\utils_infer.py", line 214, in infer_process return infer_batch_process( File "E:\F5-TTS\model\utils_infer.py", line 307, in infer_batch_process final_wave = generated_waves[0] IndexError: list index out of range "

3. I already tried that. When I said I reinstalled 3 times, I meant including the repo: I wiped everything and downloaded it all again from zero 3 times. The last time was on drive E (the current drive), in case it was an admin-privilege issue, but it didn't help at all.

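For anyone retracing step 2, the FLAC-to-WAV re-encode above is a one-line ffmpeg call; driven from Python it looks roughly like this (filenames are examples, and ffmpeg must be on PATH):

```python
import subprocess

# Re-encode the reference clip as 16-bit little-endian PCM WAV (pcm_s16le);
# equivalent to: ffmpeg -i country.flac -c:a pcm_s16le country.wav
subprocess.run(
    ["ffmpeg", "-i", "country.flac", "-c:a", "pcm_s16le", "country.wav"],
    check=True,
)
```
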
SWivid commented 4 days ago

@kamineko16 It's very weird, we haven't met this before; seems you're the first one encountering this, sadge. Will inference-cli.py work from your side?

kamineko16 commented 4 days ago

Hi @SWivid

@SWivid Huh, weird indeed. It does work with inference-cli.py, so now I know everything is basically fine and the issue is with gradio_app.py? Should I close this case, or do you want to check it more to understand why it's not working specifically with gradio_app.py?

SWivid commented 4 days ago

Hi @kamineko16, I have no idea what's going wrong, as I could not reproduce the error. We could just leave it open for some days and see if someone else encounters this and is able to fix it.

superchargez commented 2 days ago

> Hi @SWivid
>
> @SWivid Huh, weird indeed. It does work with inference-cli.py, so now I know everything is basically fine and the issue is with gradio_app.py? Should I close this case, or do you want to check it more to understand why it's not working specifically with gradio_app.py?

No. The problem is with Whisper or transformers' pipeline. I have not tested with Gradio, but my application produces the same warnings, though it does generate output.
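For what it's worth, the forced_decoder_ids warning on its own is benign and can be silenced; a sketch of the usual way (not a fix for the IndexError, and the model id is an assumption):

```python
# Sketch: avoid the task=transcribe vs forced_decoder_ids conflict warning
# when calling Whisper through transformers.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")
asr.model.generation_config.forced_decoder_ids = None  # defer to task=transcribe
print(asr("ref.wav", generate_kwargs={"task": "transcribe"})["text"])
```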

kamineko16 commented 2 days ago

> > Hi @SWivid
> >
> > @SWivid Huh, weird indeed. It does work with inference-cli.py, so now I know everything is basically fine and the issue is with gradio_app.py? Should I close this case, or do you want to check it more to understand why it's not working specifically with gradio_app.py?
>
> No. The problem is with Whisper or transformers' pipeline. I have not tested with Gradio, but my application produces the same warnings, though it does generate output.

I managed to make it work with the gradio_app interface by asking GPT to merge gradio_app and inference-cli. However, because the code is too long and I don't pay for GPT, it only managed the basic function, without podcast and emotions. For some reason it still gets slightly better results with plain inference-cli.
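The merged script essentially reduces to wrapping the inference call in a bare Gradio interface. A minimal sketch of that idea (the arguments passed to infer_process are placeholders; only the returned 3-tuple comes from the traceback above):

```python
# Minimal sketch of a gradio_app / inference-cli merge: basic TTS only,
# no podcast or emotion features. The infer_process argument list below is
# hypothetical; adapt it to the real signature in model/utils_infer.py.
import gradio as gr

from model.utils_infer import infer_process  # path as seen in the traceback

def basic_tts(ref_audio_path, ref_text, gen_text):
    final_wave, final_sample_rate, _spectrogram = infer_process(
        ref_audio_path, ref_text, gen_text  # placeholder arguments
    )
    # Gradio's Audio output accepts a (sample_rate, numpy_array) tuple
    return final_sample_rate, final_wave

demo = gr.Interface(
    fn=basic_tts,
    inputs=[
        gr.Audio(type="filepath", label="Reference audio"),
        gr.Textbox(label="Reference text"),
        gr.Textbox(label="Text to generate"),
    ],
    outputs=gr.Audio(label="Synthesized speech"),
)

if __name__ == "__main__":
    demo.launch()
```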

SWivid commented 1 hour ago

Will close this issue; feel free to reopen if there are further questions~