SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Export to ONNX Format #214

DakeQQ opened this issue 1 week ago

DakeQQ commented 1 week ago
SWivid commented 1 week ago

@DakeQQ 非常感谢! 我们也有尝试过Int8,也发现慢了。希望有能人志士帮忙看看~

Thanks a lot! We also tried Int8 and also found it slow. We hope someone can help with it ~

GreenLandisaLie commented 1 week ago

Thank you!

It's working, and I can even use onnxruntime-directml (package) to run this on my AMD GPU! For that, the provider of ort_session_A and ort_session_C needs to be forced to ['CPUExecutionProvider'], but ort_session_B can use ['DmlExecutionProvider', 'CPUExecutionProvider'], and it's blazing fast compared to CPU. Funny that this works, yet I cannot get torch_directml to work with the base .safetensors model (in gradio_app.py) no matter what I try.
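
Roughly, the per-session provider setup I used looks like this (just a sketch; the .onnx file names are placeholders, not the actual export's output names):

import onnxruntime as ort

# Sketch only: the light preprocess / decode graphs stay on the CPU provider.
ort_session_A = ort.InferenceSession("F5_preprocess.onnx", providers=['CPUExecutionProvider'])
ort_session_C = ort.InferenceSession("F5_decode.onnx", providers=['CPUExecutionProvider'])

# The heavy transformer graph runs through DirectML, with CPU as fallback.
ort_session_B = ort.InferenceSession("F5_transformer.onnx", providers=['DmlExecutionProvider', 'CPUExecutionProvider'])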

I'm facing a problem though: the outputs are always in Chinese... What do I need to change in 'Export_F5.py' to make this work for English?

DakeQQ commented 1 week ago

Thank you for your testing. However, the question about the English setup may need to be answered by the original author of the F5-TTS project; the code for ONNX export and execution is based on the original work.

According to my tests, ort_session_A and ort_session_C together take up less than 1% of the time cost, while ort_session_B occupies the majority of the time.

GreenLandisaLie commented 1 week ago

According to my tests, ort_session_A and ort_session_C together take up less than 1% of the time cost, while ort_session_B occupies the majority of the time.

Yes, and that is why inference speed is pretty much unaffected by setting those to CPU. ort_session_B is what matters, and it runs fine on AMD GPUs using onnxruntime-directml!

Anyways, I've tried messing around with the vocab and of course the reference audio and text, but the speaker always tries to speak Chinese, even when the ref text + audio and gen_text are in English. It may be worth noting that this has nothing to do with the fact that I'm using DirectML, because it also happened before I even tried that.

Looking forward to getting this working in English... @SWivid please check this out when you have time. Thanks once again!

DakeQQ commented 1 week ago

Hello~ The issue with the English voice should have been resolved. Please try again using the latest F5-TTS-ONNX version. @GreenLandisaLie

GreenLandisaLie commented 1 week ago

It's working now in both Chinese and English! Thanks!

@SWivid Maybe it's worth adding an 'ONNX' branch at https://huggingface.co/SWivid/F5-TTS/tree/main.

SWivid commented 1 week ago

@GreenLandisaLie Yes, the ONNX version is great!

Maybe it would be better for @DakeQQ to do that? We will also add a link to that ONNX repo (we currently credit and link to the F5-TTS-ONNX repo).

eschmidbauer commented 1 week ago

Can someone share the ONNX export? I would love to try it out! Thanks

KungFuFurniture commented 4 days ago

If anyone would be willing to run me through how to do this and get it working on my Win10 5700 XT, I would be eternally grateful (well, at least until the next TTS upgrade comes out).

eschmidbauer commented 4 days ago

@KungFuFurniture see this repo; I haven't tried it in a few days, but it seems there have been some updates

KungFuFurniture commented 4 days ago

Yes, I saw that, cloned the repo, and changed some path directories in the export.py... But now I'm lost. I am really new to all this (maybe a year or so in), so I am not 100% sure what I am getting wrong.

Traceback (most recent call last):
  File "D:\Games\F5\F5-TTS1\src\f5_tts\export_f5.py", line 316, in <module>
    torch.onnx.export(
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 551, in export
    _export(
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 1648, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 1170, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 1046, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 950, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\jit\_trace.py", line 1497, in _get_trace_graph
    outs = ONNXTracedModule(
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\jit\_trace.py", line 141, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\jit\_trace.py", line 132, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1543, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\Games\F5\F5-TTS1\src\f5_tts\export_f5.py", line 154, in forward
    pred = self.f5_transformer(x=noise, cond=cat_mel_text, cond_drop=cat_mel_text_drop, time=self.time_expand[:, time_step], rope_cos=rope_cos, rope_sin=rope_sin, qk_rotated_empty=qk_rotated_empty)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1543, in _slow_forward
    result = self.forward(*input, **kwargs)
TypeError: DiT.forward() got an unexpected keyword argument 'cond_drop'

This is my error message.

DakeQQ commented 4 days ago

The error message "DiT.forward() got an unexpected keyword argument 'cond_drop'" shows that the export process used the original, unmodified code.

First, we use shutil.copyfile (Export_F5.py, lines 77-82) to replace the original code with the modified version. Ensure that the modified Python scripts are stored in the 'modeling_modified' folder.

shutil.copyfile(modified_path + 'vocos/heads.py', python_package_path + '/vocos/heads.py')
shutil.copyfile(modified_path + 'vocos/models.py', python_package_path + '/vocos/models.py')
shutil.copyfile(modified_path + 'vocos/modules.py', python_package_path + '/vocos/modules.py')
shutil.copyfile(modified_path + 'vocos/pretrained.py', python_package_path + '/vocos/pretrained.py')
shutil.copyfile(modified_path + 'F5/modules.py', F5_project_path + '/model/modules.py')
shutil.copyfile(modified_path + 'F5/dit.py', F5_project_path + '/model/backbones/dit.py') 
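
For reference, the path variables used above are defined near the top of Export_F5.py. As a rough sketch (the values below are only examples; adjust them to your own clone and virtual environment):

import site

python_package_path = site.getsitepackages()[-1]  # e.g. ...\env\Lib\site-packages of the active venv
modified_path = './F5-TTS-ONNX/Export_ONNX/F5_TTS/modeling_modified/'  # folder holding the modified scripts
F5_project_path = './F5-TTS'  # folder that contains model/... (src/f5_tts in the reorganized repo)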

(We may have accidentally deleted some code. Please fetch the latest code and try again.)

KungFuFurniture commented 3 days ago

The error message "DiT.forward() got an unexpected keyword argument 'cond_drop'" shows that the export process used the original, unmodified code.

So I did a complete start-over: grabbed a fresh F5, a fresh venv, grabbed the link above, and changed the file locations from user Dake... It seems my file structure and some names are a bit different, and I believe that is getting me into some trouble. For example:

from src.f5_tts.model import CFM, DiT
from src.f5_tts.infer.utils_infer import load_checkpoint

load_checkpoint is in utils_infer, not models.utils, in my version of the F5 repo. But I believe I have found most of those things. Now I am stuck here:

Traceback (most recent call last):
  File "D:\Games\TTS\F5-TTS\export_f5.py", line 14, in <module>
    from src.f5_tts.infer.utils_infer import load_checkpoint
  File "D:\Games\TTS\F5-TTS\src\f5_tts\infer\utils_infer.py", line 32, in <module>
    vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")
  File "D:\Games\TTS\F5-TTS\env\lib\site-packages\vocos\pretrained.py", line 69, in from_pretrained
    model = cls.from_hparams(config_path)
  File "D:\Games\TTS\F5-TTS\env\lib\site-packages\vocos\pretrained.py", line 54, in from_hparams
    with open(config_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'charactr/vocos-mel-24khz/config.yaml'

I mean, I have the config and pytorch_model but I can't figure out where to put them. I have tried about 16 different folders, from a cached huggingface folder to the aforementioned infer folder. I dunno. I don't know anything about vocos, and its lil brick road is far from Yellow. I fell outta Kansas quick.

SWivid commented 3 days ago

Replace vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz") with:

vocos = Vocos.from_hparams(f"{local_path}/config.yaml")
state_dict = torch.load(f"{local_path}/pytorch_model.bin", map_location=device)
vocos.load_state_dict(state_dict)
vocos.eval()
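
Here local_path points at a local download of the charactr/vocos-mel-24khz files (config.yaml and pytorch_model.bin) and device is whatever torch device you run on, for example (assumed values, adjust to your setup):

import torch
from vocos import Vocos

local_path = './vocos-mel-24khz'  # folder containing config.yaml and pytorch_model.bin
device = 'cpu'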

KungFuFurniture commented 3 days ago

Alright, making progress. Thank you for the help. After defining local_path, I got the DiT uncond error again. I compared the two dit.py files and they are the same, so it did copy. I ran it again... and got a different error.

Traceback (most recent call last):
  File "D:\Games\TTS\F5-TTS\export_f5.py", line 13, in <module>
    from src.f5_tts.model import CFM, DiT
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 1, in <module>
    from f5_tts.model.cfm import CFM
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 4, in <module>
    from f5_tts.model.backbones.dit import DiT
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\backbones\dit.py", line 16, in <module>
    from model.modules import (
ModuleNotFoundError: No module named 'model'

As you can see in the path, model is there, modules is within it, and so are the functions we are after. So I added the following line to dit.py, as I had used it once in a different project to resolve a similar "can't find the module" issue:

sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

That did not help...

Traceback (most recent call last):
  File "D:\Games\TTS\F5-TTS\export_f5.py", line 13, in <module>
    from src.f5_tts.model import CFM, DiT
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 1, in <module>
    from f5_tts.model.cfm import CFM
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 4, in <module>
    from f5_tts.model.backbones.dit import DiT
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\backbones\dit.py", line 16, in <module>
    from model.modules import (
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 4, in <module>
    from f5_tts.model.backbones.dit import DiT
ImportError: cannot import name 'DiT' from partially initialized module 'f5_tts.model.backbones.dit' (most likely due to a circular import) (D:\Games\TTS\F5-TTS\src\f5_tts\model\backbones\dit.py)

But hey, new errors are progress, right?

SWivid commented 3 days ago

The error is due to, literally, a circular import. The fix is not sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..'))) but changing from model.modules import ( to from f5_tts.model.modules import (. We have reorganized the repo to make it compatible with package form; check the latest version.
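
Concretely, the import block at the top of dit.py should change roughly like this (the elided names stand for whatever the file already imports):

# before: only resolves when running from inside the source tree
from model.modules import (
    ...
)

# after: package-qualified import, matching the reorganized repo layout
from f5_tts.model.modules import (
    ...
)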

KungFuFurniture commented 3 days ago

Git pulled, got an update... same thing:

(env) D:\Games\TTS\F5-TTS>python export_f5.py
Traceback (most recent call last):
  File "D:\Games\TTS\F5-TTS\export_f5.py", line 13, in <module>
    from src.f5_tts.model import CFM, DiT
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 1, in <module>
    from f5_tts.model.cfm import CFM
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 4, in <module>
    from f5_tts.model.backbones.dit import DiT
  File "D:\Games\TTS\F5-TTS\src\f5_tts\model\backbones\dit.py", line 15, in <module>
    from model.modules import (
ModuleNotFoundError: No module named 'model'

SWivid commented 3 days ago

The point is: when replacing the modified script for ONNX compatibility, e.g. Export_ONNX/F5_TTS/modeling_modified/F5/dit.py, you need to keep an eye on differences like https://github.com/DakeQQ/F5-TTS-ONNX/blob/259d6198b6e91d6911bbd1f1e3a5ca96c0d21711/Export_ONNX/F5_TTS/modeling_modified/F5/dit.py#L16. Just put the two repos side by side and take a while to look into it, and you'll get it.

GreenLandisaLie commented 3 days ago

@KungFuFurniture You just need to replace the existing F5 repo files with the equivalent ones from the ONNX export repo, and do the same for the vocos package installation files (...\Lib\site-packages\vocos). Place Export_F5.py directly in the root F5 folder (where gradio_app.py is), activate the F5 environment, then run it. Once converted, replace the files you swapped out with their original counterparts (do a fresh install if you must). I think you have most of this figured out by now.

Just want to add one important thing: if you want to run this on an AMD GPU, you might need to do this:

'pip uninstall onnxruntime', then 'pip install onnxruntime-directml', and change the inference code by setting ort_session_B's providers to ['DmlExecutionProvider', 'CPUExecutionProvider']. The inference code for ONNX is essentially the last part of the Export_F5.py script. If you want to run it with gradio, make a copy of the gradio_app.py file, add 'import onnxruntime' and 'import jieba', followed by all of the necessary changes, which are a bit too many for me to list here. In essence, you just need to replace the original PyTorch inference code with the ONNX equivalent, remove the spectrogram inputs and outputs from gradio as well as its functions during inference, and force-load your ONNX models while ignoring the other PyTorch ones... that's pretty much it.

PS: this is how I did it a week ago, but the Export_F5.py file has been changed many times since then, so this might no longer work. Additionally, at the time, the Export_F5.py file did not contain the necessary audio transformations that allow for reference audio .wav files in an unexpected format, so I had to copy-paste those from the original code. You might or might not need to do this as well. Good luck :D Hopefully someone will release the converted .onnx models with a pipeline for them so this will be easy to use in the future.

DakeQQ commented 3 days ago

@KungFuFurniture We are very sorry for your poor experience. Due to the rapid updates of the original work, we were unable to keep up in time. We have now adapted and tested the export against the latest SWivid/F5-TTS. Please download the F5-TTS-ONNX export code again and try once more.

Please note that we use a modified vocos loading method, applied by the following line of code (line 52):

shutil.copyfile(modified_path + '/vocos/pretrained.py', python_package_path + '/vocos/pretrained.py')

If you can access the HuggingFace repository 'charactr/vocos-mel-24khz' directly, you can disable that line of code and re-install the vocos Python package (it may have been modified by a previous run). Then set vocos_model_path = 'charactr/vocos-mel-24khz'.

KungFuFurniture commented 2 days ago

First, let me say to everyone: thank you for the help. @DakeQQ It was certainly not a poor experience, but a learning experience. I certainly appreciate the work you have done here; the effort is awesome. So I made the execution provider change to "B" as suggested, got the export script to run successfully, and swapped back all the files it changed, both vocos and F5 (modules, pretrained, etc.).
@GreenLandisaLie I have onnxruntime-directml (torch too). gradio_app.py is no longer a thing, but there is an alternative; I am not sure that is where the change needs to be made any longer.

So here is where I am: the export seems to have worked, and I can still run the app, and it works. But it works exactly the same, not using the GPU (AMD 5700 XT). That is, I am sure, a result of what Green mentioned about adjustments to app.py.

I feel like a kindergartner in college. I am so far in over my head, gang. I learned Python from YouTube, lol. I know nothing about ONNX or torch, except that they help make the magic work.

So, any suggestions on what to do next...? Again, all help is super appreciated. And I get it if you don't have time to educate me.

Cheers to all.

DakeQQ commented 2 days ago

@KungFuFurniture If you're a beginner, it's advisable to start with simpler models like YOLO-v9, which are well suited for NPUs and GPUs thanks to their GPU-friendly architecture.

  1. Begin by successfully invoking the GPU with a simple model, as image-processing models are generally easier to handle.
  2. Export the model with static input and output shapes by setting dynamic_axes=None. This increases the likelihood of the GPU code building successfully.
  3. Quantize the exported ONNX model to Float16 format; it is better suited for GPU compute.
  4. Use optimization tools such as onnxsim (pip install onnxsim) to simplify the exported model.
  5. You can visualize the model structure using the Netron tool. If all operator node input/output shapes are numeric, there is a high probability of successful GPU execution.

Additionally, set the ONNX Runtime log level to 0 or 1 with session_opts.log_severity_level = 0. This provides detailed error reports from ONNX Runtime, which you can use to seek help (for example from ChatGPT); following these error reports should help you resolve most issues. A rough sketch of points 3-5 plus the logging setup is below.
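
(The following is only an illustrative sketch; the file names are placeholders, and onnxsim and onnxconverter-common are separate pip installs.)

import onnx
import onnxruntime as ort
from onnxsim import simplify
from onnxconverter_common import float16  # pip install onnxconverter-common

# 4. simplify the exported graph
model = onnx.load("model.onnx")
model_simplified, ok = simplify(model)

# 3. convert to Float16, which is friendlier for GPU compute
model_fp16 = float16.convert_float_to_float16(model_simplified)
onnx.save(model_fp16, "model_fp16.onnx")

# set verbose ONNX Runtime logging so provider/operator failures are reported in detail
session_opts = ort.SessionOptions()
session_opts.log_severity_level = 0
session = ort.InferenceSession("model_fp16.onnx", sess_options=session_opts,
                               providers=['DmlExecutionProvider', 'CPUExecutionProvider'])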