SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Trying to generate voice #388

Closed Maenod closed 2 weeks ago

Maenod commented 2 weeks ago

Checks

I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.

Environment Details

Windows 10, Python 3.10

Steps to Reproduce

  1. Activate the Conda env named f5: conda activate f5
  2. Run the command f5-tts_infer-gradio
  3. Open the web UI, upload a reference voice, and enter the text to generate
  4. Click Synthesize (a scriptable equivalent is sketched below)
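
To reproduce this outside the Gradio UI, here is a minimal sketch using the package's Python API. It assumes f5_tts.api.F5TTS and its infer() keywords behave as shown in the project README; the file names are placeholders:

```python
# Minimal sketch: run the same synthesis without the Gradio UI.
# Assumes the f5 env is active and f5_tts.api.F5TTS works as in the README.
from f5_tts.api import F5TTS

f5tts = F5TTS()  # loads the default F5-TTS checkpoint

wav, sr, spect = f5tts.infer(
    ref_file="ref_voice.wav",    # placeholder: the uploaded reference clip
    ref_text="",                 # empty string -> auto-transcribe with Whisper
    gen_text="My name is Maen",  # the text from the log below
    file_wave="out.wav",         # write the generated audio here
)
```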

✔️ Expected Behavior

Should generate speech in the uploaded voice from the provided text.

❌ Actual Behavior

It keeps running for a very long time, and afterwards it generates audio with no sound.

Error message:

```
Starting app...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
C:\ProgramData\miniconda3\envs\f5\lib\site-packages\transformers\models\whisper\generation_whisper.py:509: FutureWarning: The input name inputs is deprecated. Please make sure to use input_features instead.
  warnings.warn(
You have passed task=transcribe, but also have set forced_decoder_ids to [[1, None], [2, 50360]] which creates a conflict. forced_decoder_ids will be ignored in favor of task=transcribe.
C:\ProgramData\miniconda3\envs\f5\lib\site-packages\transformers\models\whisper\modeling_whisper.py:545: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Passing a tuple of past_key_values is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of EncoderDecoderCache instead, e.g. past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values).
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
gen_text 0 My name is Maen
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\maena\AppData\Local\Temp\jieba.cache
Loading model cost 0.631 seconds.
Prefix dict has been built successfully.
```

SWivid commented 2 weeks ago


@Maenod Several possible solutions in #356.

First check your network connection to Hugging Face, then try forcing float32 at these two lines:

https://github.com/SWivid/F5-TTS/blob/ac77a76cd3bc04c5aea12bd67980336433640f6a/src/f5_tts/infer/utils_infer.py#L131
https://github.com/SWivid/F5-TTS/blob/ac77a76cd3bc04c5aea12bd67980336433640f6a/src/f5_tts/infer/utils_infer.py#L144

    torch_dtype=torch.float32
    model = model.to(torch.float32)
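
For readers who cannot open the links, a sketch of what that edit might look like in src/f5_tts/infer/utils_infer.py. The surrounding code and the Whisper model name are assumptions based on the linked file; only the two float32 changes are the actual suggestion:

```python
# Hedged sketch of the suggested change in src/f5_tts/infer/utils_infer.py;
# the surrounding code is approximate, the float32 lines are the point.
import torch
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Around the first linked line: build the Whisper ASR pipeline (used to
# transcribe the reference audio) in full precision instead of float16,
# which can produce silent/NaN output on some GPUs.
asr_pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",  # assumed default, check the file
    torch_dtype=torch.float32,              # was torch.float16
    device=device,
)

# Around the second linked line: after the TTS checkpoint is loaded,
# cast the model itself to float32 as well:
# model = model.to(torch.float32)
```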

SWivid commented 2 weeks ago

will close this issue as it's a duplicate of #356

Maenod commented 2 weeks ago

> will close this issue as it's a duplicate of #356

It gives me the error: CUDA out of memory. Tried to allocate 20.00 MiB. GPU
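
An OOM on a 20.00 MiB allocation usually means the card was already nearly full, and note that the float32 workaround above roughly doubles model memory versus float16. A quick way to see how much GPU memory is actually free (plain PyTorch, not part of this repo):

```python
# Report free vs. total GPU memory; if another process holds most of the
# card, even a tiny 20 MiB allocation will fail with CUDA OOM.
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # both values in bytes
    print(f"GPU memory free: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")
else:
    print("CUDA not available; inference would fall back to CPU.")
```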