DakeQQ opened this issue 1 month ago
@DakeQQ Thanks a lot! We also tried Int8 and found it slower. Hoping someone capable can help take a look ~
Thank you!
It's working, and I can even use onnxruntime-directml (package) to run this on my AMD GPU! For that, the provider of ort_session_A and ort_session_C needs to be forced to ['CPUExecutionProvider'], but ort_session_B can use ['DmlExecutionProvider', 'CPUExecutionProvider'], and it's blazing fast vs CPU. Funny that this works, yet I cannot get torch_directml to work with the base .safetensors model (in gradio_app.py) no matter what I tried.
I'm facing a problem though: the outputs are always in Chinese... What do I need to change in 'Export_F5.py' to make this work for English?
Thank you for your testing. However, the setup for the English version may need to be answered by the original author of the F5-TTS project. The code for ONNX export and execution is based on the original work.
According to my tests, ort_session_A and ort_session_C together take up less than 1% of the time cost, while ort_session_B occupies the majority of the time.
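If anyone wants to verify that split on their own machine, here is a minimal, hedged timing sketch; the ort_session_A/B/C names match this thread, while the feeds_* dictionaries are placeholders for whatever input dicts your copy of Export_F5.py builds:

```python
import time
import onnxruntime as ort

def timed_run(session: ort.InferenceSession, feeds: dict, label: str):
    """Run one ONNX Runtime session and print its wall-clock time,
    to see which of the three models dominates inference."""
    start = time.perf_counter()
    outputs = session.run(None, feeds)   # None -> return all outputs
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return outputs

# Usage (placeholders -- build the feed dicts the same way the
# inference section of Export_F5.py does):
# out_a = timed_run(ort_session_A, feeds_a, "ort_session_A (preprocess)")
# out_b = timed_run(ort_session_B, feeds_b, "ort_session_B (transformer step)")
# out_c = timed_run(ort_session_C, feeds_c, "ort_session_C (decode)")
```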
Yes, and that is why inference speed is pretty much not affected by setting those to CPU. ort_session_B is what matters, and it runs fine on AMD GPUs using onnxruntime-directml!
Anyway, I've tried messing around with the vocab and of course the reference audio and text, but the speaker always tries to speak Chinese, even when the ref text+audio and gen_text are in English. It may be worth noting this has nothing to do with the fact I'm using DirectML, because it also happened before I even tried that.
Looking forward to getting this working in English... @SWivid please check this out when you have time. Thanks once again!
Hello~ The issue with the English voice should have been resolved. Please try again using the latest F5-TTS-ONNX version. @GreenLandisaLie
It's working now in both Chinese and English! Thanks!
@SWivid Maybe it's worth adding an 'ONNX' branch at https://huggingface.co/SWivid/F5-TTS/tree/main.
@GreenLandisaLie Yes, the onnx version is great!
Maybe it's better for @DakeQQ to do that? We will also add a link to that ONNX repo (currently we credit and link to the F5-TTS-ONNX repo).
Can someone share the ONNX export? I would love to try it out! Thanks
If anyone would be willing to run me through how to do this and get it working on my Win10 5700xt, I would be eternally grateful (well, at least until the next TTS upgrade comes out).
@KungFuFurniture see this repo. I haven't tried it in a few days, but it seems there have been some updates.
Yes, I saw that. Cloned the repo, changed some path directories in the export.py... But now I'm lost. I am really new to all this (maybe a year or so), so I am not 100% sure what I'm getting wrong.
Traceback (most recent call last):
File "D:\Games\F5\F5-TTS1\src\f5_tts\export_f5.py", line 316, in <module>
torch.onnx.export(
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 551, in export
_export(
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 1648, in _export
graph, params_dict, torch_out = _model_to_graph(
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 1170, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 1046, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\onnx\utils.py", line 950, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\jit\_trace.py", line 1497, in _get_trace_graph
outs = ONNXTracedModule(
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\jit\_trace.py", line 141, in forward
graph, out = torch._C._create_graph_by_tracing(
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\jit\_trace.py", line 132, in wrapper
outs.append(self.inner(*trace_inputs))
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1543, in _slow_forward
result = self.forward(*input, **kwargs)
File "D:\Games\F5\F5-TTS1\src\f5_tts\export_f5.py", line 154, in forward
pred = self.f5_transformer(x=noise, cond=cat_mel_text, cond_drop=cat_mel_text_drop, time=self.time_expand[:, time_step], rope_cos=rope_cos, rope_sin=rope_sin, qk_rotated_empty=qk_rotated_empty)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Games\F5\F5-TTS\env\lib\site-packages\torch\nn\modules\module.py", line 1543, in _slow_forward
result = self.forward(*input, **kwargs)
TypeError: DiT.forward() got an unexpected keyword argument 'cond_drop'
This is my error message.
The error message "DiT.forward() got an unexpected keyword argument 'cond_drop'" shows that the export process used the original (unmodified) code.
First, we use shutil.copyfile (Export_F5.py, lines 77-82) to replace the original code with the modified version. Ensure that the modified Python scripts are stored in the 'modeling_modified' folder.
shutil.copyfile(modified_path + 'vocos/heads.py', python_package_path + '/vocos/heads.py')
shutil.copyfile(modified_path + 'vocos/models.py', python_package_path + '/vocos/models.py')
shutil.copyfile(modified_path + 'vocos/modules.py', python_package_path + '/vocos/modules.py')
shutil.copyfile(modified_path + 'vocos/pretrained.py', python_package_path + '/vocos/pretrained.py')
shutil.copyfile(modified_path + 'F5/modules.py', F5_project_path + '/model/modules.py')
shutil.copyfile(modified_path + 'F5/dit.py', F5_project_path + '/model/backbones/dit.py')
(We may have accidentally deleted some code. Please fetch the latest code and try again.)
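In case it helps anyone hitting the same 'cond_drop' error: a small sanity-check sketch to confirm the copy actually reached the file being imported. It only assumes the copy destination shown above (F5_project_path + '/model/backbones/dit.py'); the path value is a placeholder, and 'cond_drop' is used as a marker because it only appears in the modified forward().

```python
# Hypothetical sanity check -- set F5_project_path to the same value used in Export_F5.py.
F5_project_path = "/path/to/F5-TTS"

dit_path = F5_project_path + "/model/backbones/dit.py"
with open(dit_path, encoding="utf-8") as f:
    source = f.read()

# The ONNX-modified dit.py accepts a 'cond_drop' argument; the original does not.
print("Modified dit.py in place:", "cond_drop" in source)
```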
So I did a complete start-over. Grabbed a fresh F5, fresh venv, grabbed the link above, changed the file locations from user Dake... It seems my file structure and some names are a bit different, and I believe that is getting me into some trouble. For example:
from src.f5_tts.model import CFM, DiT
from src.f5_tts.infer.utils_infer import load_checkpoint
load_checkpoint is in utils_infer, not model.utils, in my version of the F5 repo. But I believe I have found most of those things. Now I am stuck here:
Traceback (most recent call last):
File "D:\Games\TTS\F5-TTS\export_f5.py", line 14, in <module>
from src.f5_tts.infer.utils_infer import load_checkpoint
File "D:\Games\TTS\F5-TTS\src\f5_tts\infer\utils_infer.py", line 32, in <module>
vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")
File "D:\Games\TTS\F5-TTS\env\lib\site-packages\vocos\pretrained.py", line 69, in from_pretrained
model = cls.from_hparams(config_path)
File "D:\Games\TTS\F5-TTS\env\lib\site-packages\vocos\pretrained.py", line 54, in from_hparams
with open(config_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'charactr/vocos-mel-24khz/config.yaml'
I mean, I have the config and pytorch_model, but I can't figure out where to put 'em. I have tried about 16 different folders, from a cached huggingface folder to the aforementioned infer folder. I dunno. I don't know anything about vocos, and its lil brick road is far from Yellow. I fell outta Kansas quick.
replace
vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")
with
vocos = Vocos.from_hparams(f"{local_path}/config.yaml")
state_dict = torch.load(f"{local_path}/pytorch_model.bin", map_location=device)
vocos.load_state_dict(state_dict)
vocos.eval()
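For anyone else hitting the FileNotFoundError, here is the same replacement as a self-contained sketch; local_path is a placeholder for wherever you downloaded the charactr/vocos-mel-24khz files:

```python
import torch
from vocos import Vocos

local_path = "/path/to/vocos-mel-24khz"   # folder containing config.yaml and pytorch_model.bin
device = "cpu"

# Build the model from the local config instead of pulling it from the HuggingFace Hub,
# then load the downloaded weights manually.
vocos = Vocos.from_hparams(f"{local_path}/config.yaml")
state_dict = torch.load(f"{local_path}/pytorch_model.bin", map_location=device)
vocos.load_state_dict(state_dict)
vocos.eval()
```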
Alright, making progress. Thank you for the help. After defining the local_path, I got the DiT 'cond_drop' error again. Compared the two dit.py files; they are the same, so it did copy. I ran it again... and got a different error.
Traceback (most recent call last):
File "D:\Games\TTS\F5-TTS\export_f5.py", line 13, in <module>
from src.f5_tts.model import CFM, DiT
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 1, in <module>
from f5_tts.model.cfm import CFM
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 4, in <module>
from f5_tts.model.backbones.dit import DiT
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\backbones\dit.py", line 16, in <module>
from model.modules import (
ModuleNotFoundError: No module named 'model'
As you can see in the path, 'model' is there, 'modules' is within it, and so are the functions we are after. So I added the following line to dit.py, as I used that once in a different project to resolve a similar "can't find the module" issue.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))
That did not help...
Traceback (most recent call last):
File "D:\Games\TTS\F5-TTS\export_f5.py", line 13, in <module>
from src.f5_tts.model import CFM, DiT
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 1, in <module>
from f5_tts.model.cfm import CFM
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 4, in <module>
from f5_tts.model.backbones.dit import DiT
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\backbones\dit.py", line 16, in <module>
from model.modules import (
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 4, in <module>
from f5_tts.model.backbones.dit import DiT
ImportError: cannot import name 'DiT' from partially initialized module 'f5_tts.model.backbones.dit' (most likely due to a circular import) (D:\Games\TTS\F5-TTS\src\f5_tts\model\backbones\dit.py)
But hey new errors are progress right?
The error is literally due to a circular import. The fix is not sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..'))), but changing
from model.modules import (
to
from f5_tts.model.modules import (
We have reorganized the repo to make it compatible with package form; check the latest version.
Git pulled, got an update... Same thing
(env) D:\Games\TTS\F5-TTS>python export_f5.py
Traceback (most recent call last):
File "D:\Games\TTS\F5-TTS\export_f5.py", line 13, in <module>
from src.f5_tts.model import CFM, DiT
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 1, in <module>
from f5_tts.model.cfm import CFM
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\__init__.py", line 4, in <module>
from f5_tts.model.backbones.dit import DiT
File "D:\Games\TTS\F5-TTS\src\f5_tts\model\backbones\dit.py", line 15, in <module>
from model.modules import (
ModuleNotFoundError: No module named 'model'
The point is: when replacing the modified script for ONNX compatibility, e.g. Export_ONNX/F5_TTS/modeling_modified/F5/dit.py, you need to keep an eye on differences like
https://github.com/DakeQQ/F5-TTS-ONNX/blob/259d6198b6e91d6911bbd1f1e3a5ca96c0d21711/Export_ONNX/F5_TTS/modeling_modified/F5/dit.py#L16
Just put the two repos together and take a while to look into it; you'll get it.
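One low-tech way to "keep an eye on the differences" is simply to diff the two dit.py files; a rough sketch using Python's difflib, where both paths are placeholders for local checkouts of the two repos:

```python
import difflib

# Placeholder paths -- point these at your local checkouts.
original_dit = "F5-TTS/src/f5_tts/model/backbones/dit.py"
modified_dit = "F5-TTS-ONNX/Export_ONNX/F5_TTS/modeling_modified/F5/dit.py"

with open(original_dit, encoding="utf-8") as f:
    original_lines = f.readlines()
with open(modified_dit, encoding="utf-8") as f:
    modified_lines = f.readlines()

# Print a unified diff so the ONNX-specific changes (imports, forward() signature, ...)
# stand out before you start copying files around.
for line in difflib.unified_diff(original_lines, modified_lines,
                                 fromfile=original_dit, tofile=modified_dit):
    print(line, end="")
```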
@KungFuFurniture You just need to replace the existing F5 repo files with the equivalent ones from the ONNX export repo, and do the same for the vocos package installation files as well (...\Lib\site-packages\vocos). Place Export_F5.py directly in the root F5 folder (where gradio_app.py is), activate the F5 environment, then run it. Once converted, replace the files you replaced with their original counterparts (do a new install if you must). I think you have most of this figured out by now.
Just want to add one important thing: if you want to run this on an AMD GPU, you might need to run 'pip uninstall onnxruntime' and then 'pip install onnxruntime-directml', and change the inference code by setting ort_session_B's providers to ['DmlExecutionProvider', 'CPUExecutionProvider'] (a rough sketch of the session setup follows below). The inference code for ONNX is essentially the last part of the Export_F5.py script, and if you want to run it with gradio, just make a copy of the gradio_app.py file, add 'import onnxruntime' and 'import jieba', followed by all of the necessary changes, which are a bit too many for me to list. But in essence, you just need to replace the original PyTorch inference code with the ONNX equivalent, remove spectrogram inputs and outputs from gradio as well as its functions during inference, and force-load your ONNX models while ignoring the other PyTorch ones... that's pretty much it.
PS: this is how I did it a week ago, but the Export_F5.py file has been changed many times since then, so this might no longer work. Additionally, at the time, the Export_F5.py file did not contain the necessary audio transformations that allow for invalid-format .wav reference audio files, so I had to copy-paste those from the original code. You might or might not need to do this as well. Good luck :D Hopefully someone will release the converted .onnx models with a pipeline for them, so it will be easy to use in the future.
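To make the provider split concrete, here is a rough sketch of how the three sessions might be created. This is not the exact inference code from Export_F5.py; the .onnx file names are the ones used later in this thread, and onnxruntime-directml must be installed for DmlExecutionProvider to be available:

```python
import onnxruntime as ort

session_opts = ort.SessionOptions()
session_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# A and C are cheap preprocess/decode steps -- keep them on CPU.
ort_session_A = ort.InferenceSession("F5_Preprocess.onnx", sess_options=session_opts,
                                     providers=["CPUExecutionProvider"])
# B is the transformer loop and dominates runtime -- run it on the AMD GPU via
# DirectML, with CPU as a fallback.
ort_session_B = ort.InferenceSession("F5_Transformer.onnx", sess_options=session_opts,
                                     providers=["DmlExecutionProvider", "CPUExecutionProvider"])
ort_session_C = ort.InferenceSession("F5_Decode.onnx", sess_options=session_opts,
                                     providers=["CPUExecutionProvider"])

print(ort_session_B.get_providers())   # confirm DmlExecutionProvider is actually active
```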
@KungFuFurniture We are very sorry for your poor experience. Due to the rapid updates of the original work, we were unable to update in time. Now, we have adapted and tested the export for the latest SWivid/F5-TTS. Please download the F5-TTS-ONNX export code again and try it once more.
Please note that we load vocos with a modified method, applied by the following code at line 52:
shutil.copyfile(modified_path + '/vocos/pretrained.py', python_package_path + '/vocos/pretrained.py')
If you can access the HuggingFace repository 'charactr/vocos-mel-24khz' directly, you can disable that line of code and re-install the vocos Python package (since it may have been modified). Then set vocos_model_path = 'charactr/vocos-mel-24khz'.
First let me say to everyone, Thank you for the help.
@DakeQQ Certainly not a poor experience, but a learning experience. I certainly appreciate the work you have done here. An effort is Awesome.
So I made the execution provider change to "B" as suggested. I got the Export.py to run successfully. I swapped back all the files it changed, both vocos and F5 (modules, pretrained, etc.).
@GreenLandisaLie I have onnxruntime-directml (torch too). gradio_app.py is no longer a thing, but there is an alternative. I am not sure that's where the change needs to be made any longer.
So here is where I am: the export seems to have worked, and I can still run the app, and it works. But it works exactly the same, not using the GPU (AMD 5700xt). That is, I am sure, a result of what Green mentioned about adjustments to app.py.
I feel like such a kindergartner in college. I am so far in over my head, gang. I learned Python from YouTube, lol. I know nothing about ONNX or torch, except that they help make the magic work.
So any suggestions on what to do next... ? Again all help is super appreciated. And I get it if you don't have time to educate me.
Cheers to all.
@KungFuFurniture If you're a beginner, it's advisable to start with simpler models like YOLO-v9, which are well-suited for NPUs and GPUs due to their GPU-friendly architecture.
For the export itself, a few tips (see the sketch after this list):
- Export with dynamic_axes=None. This increases the likelihood of successfully building the GPU code.
- Use onnxsim (pip install onnxsim) to simplify the exported model.
- Inspect the result with the Netron tool. If all operator node input/output shapes are numeric, it indicates a high probability of successful GPU execution.
- Additionally, set the ONNX Runtime log level to 0 or 1 with session_opts.log_severity_level = 0. This provides detailed error reports from ONNX, which can be used to seek help from ChatGPT. Following these error reports should help you resolve most issues.
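A minimal sketch of the programmatic tips above (export with static shapes, simplify, verbose logging). The tiny placeholder model, dummy inputs, and opset version are illustrative only, not values from Export_F5.py; Netron is a separate GUI tool and is not shown here:

```python
import torch
import onnx
import onnxruntime as ort
from onnxsim import simplify

# Placeholder model/inputs just to keep the sketch self-contained;
# in practice these would be the F5 sub-models and their example inputs.
model = torch.nn.Linear(16, 16).eval()
dummy_inputs = (torch.randn(1, 16),)

# 1) Export with fixed shapes (dynamic_axes=None): static shapes tend to build
#    more reliably on GPU/NPU backends.
torch.onnx.export(model, dummy_inputs, "model.onnx",
                  opset_version=17, dynamic_axes=None)

# 2) Simplify the exported graph with onnxsim.
simplified, ok = simplify(onnx.load("model.onnx"))
assert ok, "onnxsim could not validate the simplified model"
onnx.save(simplified, "model_sim.onnx")

# 3) Raise the ONNX Runtime log verbosity to get detailed error reports.
session_opts = ort.SessionOptions()
session_opts.log_severity_level = 0   # 0 = verbose, 1 = info
session = ort.InferenceSession("model_sim.onnx", sess_options=session_opts,
                               providers=["CPUExecutionProvider"])
```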
It looks like the repo has changed a lot since the last ONNX export attempt. I'm getting this error when trying to export to ONNX after replacing the modified vocos and F5 files.
RuntimeError: Error(s) in loading state_dict for CFM:
Missing key(s) in state_dict: "mel_spec.mel_stft.spectrogram.window", "mel_spec.mel_stft.mel_scale.fb".
Any ideas?
@amblamps This should have been fixed by @DakeQQ, many thanks! Mainly through the change from 712d52772ef496b6cd191ba6197bac6e112fddd8 to 315230210d6698a6ce01da669c0fe4085accb693 at https://github.com/SWivid/F5-TTS/blob/4a69e6bad29dcb499e5cdec4104325f733eb485c/src/f5_tts/model/modules.py#L30-L143
@amblamps You can directly disable lines 164-166 of src/f5_tts/infer/utils_infer.py, or use the latest export code and try once more:
# for key in ["mel_spec.mel_stft.mel_scale.fb", "mel_spec.mel_stft.spectrogram.window"]:
# if key in checkpoint["model_state_dict"]:
# del checkpoint["model_state_dict"][key]
Thanks! That worked.
Has anyone shared a recent ONNX export and code for inference?
@DakeQQ Do any other modifications need to be made to the script to export the E2 TTS model aside from pointing it to the correct checkpoint?
We have not yet attempted to export the E2-TTS model. If its function call path is the same as that of F5-TTS, theoretically, only modifying the model file path would be necessary to make the corresponding adjustments. However, the actual situation may be more complex, so we currently do not have specific plans to export E2-TTS in ONNX format.
There still seem to be issues with the mel params; has anyone been able to export recently?
@smickovskid What mel parameter issues are you encountering? Could the STFT_Process.py script resolve them?
Getting the same issue that @amblamps encountered
Traceback (most recent call last):
File "F5-TTS-ONNX/Export_ONNX/F5_TTS/Export_F5.py", line 273, in <module>
f5_model = load_model(F5_safetensors_path)
File "F5-TTS-ONNX/Export_ONNX/F5_TTS/Export_F5.py", line 202, in load_model
return load_checkpoint(model, ckpt_path, 'cpu', use_ema=True)
File "F5-TTS/src/f5_tts/infer/utils_infer.py", line 168, in load_checkpoint
model.load_state_dict(checkpoint["model_state_dict"])
File "/miniconda3/envs/f5-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CFM:
Missing key(s) in state_dict: "mel_spec.mel_stft.spectrogram.window", "mel_spec.mel_stft.mel_scale.fb".
I am using a custom fine-tuned model. I also ran STFT_Process.py but am still getting the same error.
This is my Export_F5.py config:
F5_project_path = "/home/smickovskid/ai/F5-TTS" # The F5-TTS Github project download path. URL: https://github.com/SWivid/F5-TTS
F5_safetensors_path = "/home/smickovskid/ai/F5-TTS/ckpts/ClapTrap/model_last.pt" # The F5-TTS model download path. URL: https://huggingface.co/SWivid/F5-TTS/tree/main/F5TTS_Base
vocos_model_path = "/home/smickovskid/ai/F5-TTS-ONNX/vocos" # The Vocos model download path. URL: https://huggingface.co/charactr/vocos-mel-24khz/tree/main
onnx_model_A = "/home/smickovskid/ai/F5-TTS-ONNX/F5_Preprocess.onnx" # The exported onnx model path.
onnx_model_B = "/home/smickovskid/ai/F5-TTS-ONNX/F5_Transformer.onnx" # The exported onnx model path.
onnx_model_C = "/home/smickovskid/ai/F5-TTS-ONNX/F5_Decode.onnx" # The exported onnx model path.
python_package_path = '/home/smickovskid/miniconda3/envs/f5-tts/lib/python3.10/site-packages' # The Python package path.
modified_path = '/home/smickovskid/ai/F5-TTS-ONNX/Export_ONNX/F5_TTS/modeling_modified'
reference_audio = "/home/smickovskid/ai/F5-TTS/ckpts/ClapTrap/samples/step_20000_ref.wav" # The reference audio path.
generated_audio = "/home/smickovskid/ai/F5-TTS/ckpts/ClapTrap/samples/step_20000_gen.wav" # The generated audio path.
ref_text = "Sanctuary. This Glacier's full of nothing but murderers or jerkbags, like that Hammerlock dude. Minion! I've got my eyesight back, and you're far uglier than I remembered. Anyway, it's time to get to the Resistance in Sanctuary!"
gen_text = "Sanctuary. This Glacier's full of nothing but murderers or jerkbags, like that Hammerlock dude. Minion! I've got my eyesight back, and you're far uglier than I remembered. Anyway, it's time to get to the Resistance in Sanctuary!"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
I am running python Export_ONNX/F5_TTS/Export_F5.py as the command.
Edit: I've changed it to model.load_state_dict(checkpoint["model_state_dict"], strict=False) and it passes now, but it fails further down the line with:
2024-11-17 03:26:34.026251302 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Mul node. Name:'/f5_transformer/transformer_blocks.0/attn/Mul_15' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:560 void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 2048 by 2814
Traceback (most recent call last):
File "/home/smickovskid/ai/F5-TTS-ONNX/Export_ONNX/F5_TTS/Export_F5.py", line 467, in <module>
noise = ort_session_B.run(
File "/home/smickovskid/miniconda3/envs/f5-tts/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 266, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Mul node. Name:'/f5_transformer/transformer_blocks.0/attn/Mul_15' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:560 void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 2048 by 2814
@smickovskid Apologies for the delayed response. The main issue is that your audio input exceeds the maximum length defined in the exported ONNX model settings. Specifically, MAX_SIGNAL_LENGTH = 2048 (set at line 68 in Export_F5.py), while your audio, after the STFT process, has a length of 2814. Please re-export all ONNX models with an appropriately larger value for MAX_SIGNAL_LENGTH.
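A rough way to sanity-check the new value before re-exporting: estimate how many mel frames your reference audio occupies and make sure MAX_SIGNAL_LENGTH comfortably exceeds the combined reference-plus-generated length. The 24 kHz sample rate and hop length of 256 are assumptions based on the usual F5-TTS / vocos-mel-24khz settings; adjust if your config differs:

```python
import torchaudio

SAMPLE_RATE = 24000       # assumed F5-TTS mel sample rate
HOP_LENGTH = 256          # assumed STFT hop length
MAX_SIGNAL_LENGTH = 2048  # value at line 68 of Export_F5.py

audio, sr = torchaudio.load("/path/to/reference.wav")   # placeholder path
num_samples = audio.shape[-1] * SAMPLE_RATE // sr        # length after resampling to 24 kHz
ref_frames = num_samples // HOP_LENGTH

# The generated part adds frames on top of the reference (roughly scaling with the
# ratio of gen_text to ref_text length), so leave generous headroom.
print(f"reference frames ~ {ref_frames}, current model limit = {MAX_SIGNAL_LENGTH}")
if 2 * ref_frames > MAX_SIGNAL_LENGTH:
    print("Re-export with a larger MAX_SIGNAL_LENGTH.")
```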
Hey @DakeQQ, sorry for the late response. Yeah that fixed it! Thanks for all the help.