SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Broken on MPS #190

Closed: cocktailpeanut closed this issue 2 hours ago

cocktailpeanut commented 3 hours ago

Not sure what happened, but it looks like the app is currently broken on Macs.

Just did a fresh install and the app itself runs, but the resulting audio is empty.

Also, I'm not sure whether the logs are useful, but I'm pasting them just in case:

/Users/x/pinokio/api/e2-f5-tts.git/app/env/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:496: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50360]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
gen_text 0 Reference text will be automatically transcribed with Whisper if not provided. For best results, keep your reference clips short
Building prefix dict from the default dictionary ...
Loading model from cache /Users/x/pinokio/cache/TMPDIR/jieba.cache
Loading model cost 0.426 seconds.
Prefix dict has been built successfully.
/Users/x/pinokio/api/e2-f5-tts.git/app/env/lib/python3.10/site-packages/gradio/processing_utils.py:574: UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))
/Users/x/pinokio/api/e2-f5-tts.git/app/env/lib/python3.10/site-packages/gradio/processing_utils.py:577: RuntimeWarning: invalid value encountered in cast
  data = data.astype(np.int16)

I'm not completely sure, but I don't remember seeing this many warning messages before.
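NumPy emits `RuntimeWarning: invalid value encountered in cast` when an array holding NaN or infinite values is cast to an integer type. The last warning is therefore consistent with the generated float32 audio containing non-finite samples before Gradio converts it to 16-bit int, which would explain a silent result. A minimal pure-Python guard for that conversion step is sketched below; the function name, the [-1, 1] input range, and the 32767 scale factor are assumptions for illustration, not Gradio's or F5-TTS's actual code.

```python
import math


def float_audio_to_int16(samples):
    """Convert float samples (assumed to lie in [-1, 1]) to 16-bit ints.

    Hypothetical helper: refuses non-finite input instead of silently
    producing garbage, which is what the NumPy cast warning hints at.
    """
    bad = sum(1 for s in samples if not math.isfinite(s))
    if bad:
        raise ValueError(f"{bad} non-finite sample(s); model output is likely NaN/inf")
    # Scale by 32767 (an assumed convention) and clip to the int16 range.
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]
```

Failing loudly at this step makes a NaN-producing backend show up as an error rather than as an empty audio file.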

SWivid commented 3 hours ago

A default fp16 inference setting was added. See if the latest commit, d3badb95cf1b97a61472d65d4787a35cddf9c908, works.

cocktailpeanut commented 2 hours ago

@SWivid this worked, thank you!

Could you share what the switch to fp16 means from an end user's point of view (performance, etc.)? Appreciate it!

SWivid commented 2 hours ago

> Could you share what the switch to fp16 means from an end user's point of view (performance, etc.)? Appreciate it!

It's a bit faster than fp32, uses roughly half the graphics card usage (the %), and is maybe more environmentally friendly, lol. Compared with more aggressive int8 quantization, it can be treated as having no performance (quality) penalty.
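To put the memory side of this in numbers, the sketch below uses Python's standard `struct` module, whose `'f'`, `'e'`, and `'b'` format codes store float32, float16, and signed 8-bit values. It covers weight storage only: the parameter count is hypothetical, int8 scale factors are ignored, and actual speed and GPU utilization depend on the backend's kernels. It also shows the precision cost of fp16, which keeps roughly three significant decimal digits.

```python
import struct

# struct format codes: 'f' = IEEE 754 float32, 'e' = IEEE 754 float16 (half),
# 'b' = signed 8-bit int (stand-in for int8 weights, ignoring scale factors).
FORMATS = {"fp32": "f", "fp16": "e", "int8": "b"}


def weight_bytes(num_params: int, precision: str) -> int:
    """Bytes needed to store num_params weights at the given precision."""
    return num_params * struct.calcsize(FORMATS[precision])


def round_trip(value: float, precision: str) -> float:
    """Store a float at a floating-point precision ('fp32' or 'fp16') and read it back."""
    fmt = FORMATS[precision]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    n = 300_000_000  # hypothetical parameter count, not F5-TTS's actual size
    for p in ("fp32", "fp16", "int8"):
        print(f"{p}: {weight_bytes(n, p) / 1e9:.1f} GB")
    print(round_trip(0.1, "fp32"))  # 0.10000000149011612
    print(round_trip(0.1, "fp16"))  # 0.0999755859375
```

With the hypothetical 300M parameters this prints 1.2, 0.6, and 0.3 GB, matching the "roughly half" memory figure for fp16 versus fp32.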

cocktailpeanut commented 2 hours ago

Thank you!