SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Added pyproject.apple_silicon.toml adapted to Apple chips #477

Closed · aboutmydreams closed this 1 week ago

aboutmydreams commented 1 week ago
SWivid commented 1 week ago

@aboutmydreams see if simply replacing this line in pyproject.toml works:

"bitsandbytes>0.37.0; platform_machine != 'arm64' and platform_system != 'Darwin'"
aboutmydreams commented 1 week ago

@aboutmydreams see if simply replacing this line in pyproject.toml works:

"bitsandbytes>0.37.0; platform_machine != 'arm64' and platform_system != 'Darwin'"

@SWivid Yes, that is also possible, but you also need to use this .apple_silicon.env and add "python-dotenv" to pyproject.toml. I have updated the code and documentation for this part.
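
For reference, the .apple_silicon.env file presumably just carries the fallback flag, something like:

PYTORCH_ENABLE_MPS_FALLBACK=1

which python-dotenv then loads at startup.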

SWivid commented 1 week ago

but you also need to use this .apple_silicon.env and add "python-dotenv" to pyproject.toml.

Could you explain the other parts of the modifications in more detail? I'm not that familiar with the Apple silicon dev environment, and I'm not yet convinced that introducing several new files is necessary for this use case.

e.g. what issue did you run into that https://github.com/SWivid/F5-TTS/blob/0f80f25c5fc95aed21a560bec22fed9d237948bf/src/f5_tts/infer/utils_infer.py#L36-L37 does not cover, such that a toml change, an env file, and extra installation steps are needed?
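
(For context, the referenced lines hold the device selection, roughly along these lines; a sketch, not the exact source:)

import torch

# pick CUDA first, then Apple MPS, then CPU
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"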

aboutmydreams commented 1 week ago

Download Vocos from huggingface charactr/vocos-mel-24khz

vocab :  /Users/apple/coding/learn/F5-TTS/src/f5_tts/infer/examples/vocab.txt
token :  custom
model :  /Users/apple/.cache/huggingface/hub/models--SWivid--F5-TTS/snapshots/4dcc16f297f2ff98a17b3726b16f5de5a5e45672/F5TTS_Base/model_1200000.safetensors 

Starting app...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
ref_text   Hi all, I come form china
gen_text 0 Hi all, good luck
Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/6g/588r34dn1t381b5kfk3282r40000gn/T/jieba.cache
Loading model cost 0.379 seconds.
Prefix dict has been built successfully.
Traceback (most recent call last):
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/src/f5_tts/infer/infer_gradio.py", line 217, in basic_tts
    audio_out, spectrogram_path, ref_text_out = infer(
                                                ^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/src/f5_tts/infer/infer_gradio.py", line 136, in infer
    final_wave, final_sample_rate, combined_spectrogram = infer_process(
                                                          ^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/src/f5_tts/infer/utils_infer.py", line 366, in infer_process
    return infer_batch_process(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/src/f5_tts/infer/utils_infer.py", line 451, in infer_batch_process
    generated_wave = vocoder.decode(generated_mel_spec)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/vocos/pretrained.py", line 113, in decode
    audio_output = self.head(x)
                   ^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/vocos/heads.py", line 68, in forward
    audio = self.istft(S)
            ^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/apple/coding/learn/F5-TTS/.venv/lib/python3.12/site-packages/vocos/spectral_ops.py", line 46, in forward
    return torch.istft(spec, self.n_fft, self.hop_length, self.win_length, self.window, center=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: The operator 'aten::unfold_backward' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

I tried two ways to set PYTORCH_ENABLE_MPS_FALLBACK for Apple silicon:

Option 1

from dotenv import load_dotenv
import os

# load environment variables (e.g. PYTORCH_ENABLE_MPS_FALLBACK=1) from the env file
load_dotenv()
if os.getenv("PYTORCH_ENABLE_MPS_FALLBACK") == "1":
    print("You are using the version optimized for Apple silicon.")

Option 2

import os
import torch

device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
if device == "mps":
    print("You are using the version optimized for Apple silicon.")
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

I found that with option 2 the message is printed, but os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" does not take effect.

I think that when running the f5-tts_infer-gradio script with option 2, the message prints successfully, but os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] does not really take effect. Changes to os.environ only affect the current Python process and its subprocesses, and they cannot change the behavior of libraries that are already loaded (such as PyTorch). PyTorch reads this environment variable when it initializes, so the timing of the assignment matters: if the variable is set after PyTorch has already loaded, it may have no effect on PyTorch's runtime configuration.
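
A minimal sketch of the ordering constraint (my own illustration, not code from this PR; it assumes the flag is read when torch initializes):

import os

# Export the fallback flag before torch is imported anywhere in the process,
# so PyTorch sees it when the MPS backend initializes. Assigning it after
# `import torch` (as in option 2 above) comes too late.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch  # noqa: E402  - intentionally imported after the env var is set

device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")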

SWivid commented 1 week ago

@aboutmydreams understood, thanks~ then would it make sense to just put os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" at the very top of utils_infer.py, since infer_cli and infer_gradio do not contain import torch themselves (which is intended, as all such functions are organized in utils_infer.py)?

I'm not sure whether the PR's env setup persists for speech_edit.py or api.py and the like when launching a new CLI terminal; if it does persist, that seems like a good approach. Or is putting it somewhere like .bashrc, as on Linux, possible on a Mac?

aboutmydreams commented 1 week ago

Hi @SWivid, here's my understanding:

  1. Placing os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" in utils_infer.py
    This seems like a good approach, since both infer_cli and infer_gradio rely on utils_infer.py for all PyTorch-related functionality. It ensures the environment variable is set early enough, before PyTorch is loaded (see the sketch after this list).

  2. Regarding the persistency of environment variables
    Dynamically setting os.environ in the code only affects the current Python process and any child processes. It won’t persist across new CLI sessions or other independent scripts like speech_edit.py or api.py.

    To make the variable globally available on macOS, users can add the line below to their .zshrc (or .bashrc depending on their shell):

    export PYTORCH_ENABLE_MPS_FALLBACK=1

    After saving the file, they should run source ~/.zshrc to apply the changes. This way, any Python script launched in a new terminal will have access to this environment variable.
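
For point 1, a rough sketch of what the top of utils_infer.py could look like (an illustration only, not necessarily the exact change):

# top of src/f5_tts/infer/utils_infer.py (sketch)
import os

# Fall back to CPU for MPS ops PyTorch has not implemented yet
# (e.g. aten::unfold_backward used by torch.istft in the vocoder).
# Must run before the first `import torch` in this process.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch

Entry points that import utils_infer.py would then pick up the flag automatically; the .zshrc export from point 2 covers anything launched outside that path.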

Let me know if this addresses your concerns!

SWivid commented 1 week ago

Hi @aboutmydreams, yes, fully agreed. See if commit cb8ce3306d70dfbee0e7d2423cc7f06e1c2b9c60 works. Thanks again for this PR; we are not against helpful contributions at all, just trying to keep things clear and transparent to users so the project stays simple to use~