Mustaphajudi opened this issue 6 hours ago.
> - Audio quality (naturalness, clarity)?
> - Computational cost (training time, inference speed, memory usage)?
> - Model size?
> - Ease of use/setup?

1. BigVGAN is slightly better in clarity; Vocos is slightly better in naturalness.
2. We use a pretrained vocoder. Vocoder training is separate from TTS model training.
3. Same as 2.
4. Vocos is currently easier to use and has a smaller model size (refer to the parameter counts of the vocoders).
If you are interested in BigVGAN training and have further questions, @ZhikangNiu might help. Though just passing in `mel_spec_type="bigvgan"` should be fine for training.
OK, waiting for @ZhikangNiu to share more information about fine-tuning F5 with BigVGAN. @SWivid, for inference I changed `mel_spec_type = "vocos"` to `"bigvgan"` in `utils_infer.py` and in `def load_vocoder(vocoder_name="bigvgan", is_local=False, local_path="", device=device):`, but I got an error. Note that I downloaded the F5 BigVGAN checkpoint. Maybe I missed something?
Here is the error:

```
(venv) C:\newtts\F5-TTS>f5-tts_infer-gradio
You need to follow the README to init submodule and change the BigVGAN source code.
Traceback (most recent call last):
File "
```
> You need to follow the README to init submodule and change the BigVGAN source code.

As mentioned in the error output, you need to check the README and copy and paste the code into the corresponding place.
I did all the steps mentioned in the README but still get the same error:

```bash
git clone https://github.com/JarodMica/F5-TTS.git
cd F5-TTS
py -3.11 -m venv venv
venv\Scripts\activate
pip install -e .
```

> If you initialize submodule, you should add the following code at the beginning of `src/third_party/BigVGAN/bigvgan.py`.

```python
import os
import sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
```
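Why that `sys.path` line matters: the traceback reported later in this thread shows `bigvgan.py` doing a bare `import activations`, which only resolves if the BigVGAN folder itself is on `sys.path`. The self-contained sketch below reproduces that failure with a throwaway package; the names `bigvgan_import_demo` and `activations_stub` are hypothetical stand-ins, not real F5-TTS or BigVGAN files.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_with_bare_sibling(fix_path: bool) -> str:
    """Build a throwaway package whose main module does a bare sibling import,
    then import it as `bigvgan_import_demo.main`.

    Returns "ok" on success, otherwise the ModuleNotFoundError message.
    """
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "bigvgan_import_demo"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # Hypothetical stand-in for BigVGAN's activations.py sibling module.
        (pkg / "activations_stub.py").write_text("VALUE = 42\n")
        prefix = ""
        if fix_path:
            # Same idea as the README snippet: add this file's own folder to sys.path.
            prefix = (
                "import os, sys\n"
                "sys.path.append(os.path.dirname(os.path.abspath(__file__)))\n"
            )
        # Bare import: resolves only if the package folder itself is on sys.path.
        (pkg / "main.py").write_text(prefix + "import activations_stub\n")
        sys.path.insert(0, tmp)  # makes the package importable by its dotted name
        importlib.invalidate_caches()
        try:
            importlib.import_module("bigvgan_import_demo.main")
            return "ok"
        except ModuleNotFoundError as err:
            return str(err)
        finally:
            # Undo side effects so repeated calls start from a clean state.
            sys.path[:] = saved_path
            for name in set(sys.modules) - saved_modules:
                del sys.modules[name]


print(import_with_bare_sibling(fix_path=False))  # No module named 'activations_stub'
print(import_with_bare_sibling(fix_path=True))   # ok
```

If `No module named 'activations'` still appears after editing `bigvgan.py`, a plausible explanation is that the edited file is not the one being imported (for example, the lines were added to a different copy of the submodule).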
> `load_vocoder(vocoder_name="bigvgan", is_local=False, local_path="", device=device):` but i got error, notice i downloaded the f5

So have you passed in the path of the local BigVGAN directory and set `is_local` to `True`?

You may also comment out https://github.com/SWivid/F5-TTS/blob/ab2ad3b005ea839ab698493a819bde909761d96e/src/f5_tts/infer/utils_infer.py#L117-L120 and just put `from third_party.BigVGAN import bigvgan` to see what specific problem is encountered while importing.
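Commenting out the guard works because the `except ImportError:` branch (visible, commented out, in the updated function code below) prints a generic hint and discards the original exception. A hedged alternative is to keep a hint but chain the real error. `import_or_explain` is a hypothetical helper sketched here, not part of F5-TTS.

```python
import importlib
from types import ModuleType


def import_or_explain(module_name: str, hint: str) -> ModuleType:
    """Import `module_name`; on failure, raise an ImportError carrying `hint`
    while chaining the original exception (available as __cause__)."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # `from err` keeps the original traceback (e.g. "No module named
        # 'activations'") visible instead of replacing it with a generic message.
        raise ImportError(f"{hint} Underlying error: {err}") from err


if __name__ == "__main__":
    try:
        bigvgan = import_or_explain(
            "third_party.BigVGAN.bigvgan",
            "You need to follow the README to init submodule and change the BigVGAN source code.",
        )
    except ImportError as exc:
        print(exc)
```

With exception chaining, the traceback shows both the hint and the underlying `ModuleNotFoundError`, so there is no need to edit the guard away just to see the cause.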
Here is the updated function code:

```python
# Excerpt from utils_infer.py: hf_hub_download, Vocos, torch and device are defined at module level there.
def load_vocoder(vocoder_name="bigvgan", is_local=True, local_path="C:/newtts/F5-TTS/ckpts/F5TTS_Base_bigvgan/model_1250000.pt", device=device):
    if vocoder_name == "vocos":
        if is_local:
            print(f"Load vocos from local path {local_path}")
            repo_id = "charactr/vocos-mel-24khz"
            revision = None
            config_path = hf_hub_download(repo_id=repo_id, cache_dir=local_path, filename="config.yaml", revision=revision)
            model_path = hf_hub_download(repo_id=repo_id, cache_dir=local_path, filename="pytorch_model.bin", revision=revision)
            vocoder = Vocos.from_hparams(config_path=config_path)
            state_dict = torch.load(model_path, map_location="cpu")
            vocoder.load_state_dict(state_dict)
            vocoder = vocoder.eval().to(device)
        else:
            print("Download Vocos from huggingface charactr/vocos-mel-24khz")
            vocoder = Vocos.from_pretrained("charactr/vocos-mel-24khz").to(device)
    elif vocoder_name == "bigvgan":
        from third_party.BigVGAN import bigvgan
        # except ImportError:
        #     print("You need to follow the README to init submodule and change the BigVGAN source code.")
        if is_local:
            """download from https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x/tree/main"""
            vocoder = bigvgan.BigVGAN.from_pretrained(local_path, use_cuda_kernel=False)
        else:
            vocoder = bigvgan.BigVGAN.from_pretrained("nvidia/bigvgan_v2_24khz_100band_256x", use_cuda_kernel=False)

        vocoder.remove_weight_norm()
        vocoder = vocoder.eval().to(device)
    return vocoder
```
And here is the error:

```
(venv) C:\newtts\F5-TTS>f5-tts_infer-gradio
Traceback (most recent call last):
File "
(venv) C:\newtts\F5-TTS>
```
> `def load_vocoder(vocoder_name="bigvgan", is_local=True, local_path="C:/newtts/F5-TTS/ckpts/F5TTS_Base_bigvgan/model_1250000.pt", device=device):`

The `local_path` for `load_vocoder` is the path of the vocoder:

> `if is_local: """download from https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x/tree/main"""`

So it needs to be set to `"xxxxx/xxxx/bigvgan_v2_24khz_100band_256x/"`, i.e. the downloaded vocoder folder, not the F5-TTS checkpoint `model_1250000.pt`.
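A quick sanity check before calling `load_vocoder` can catch the mix-up between a TTS checkpoint file and a vocoder folder. This is a hypothetical helper, and it assumes that a local copy of `nvidia/bigvgan_v2_24khz_100band_256x` contains `config.json` and `bigvgan_generator.pt` and that `BigVGAN.from_pretrained(<dir>)` reads those files; please verify the file names against the Hugging Face repo.

```python
from pathlib import Path

# Assumption: a local copy of nvidia/bigvgan_v2_24khz_100band_256x contains these
# files, and BigVGAN.from_pretrained(<dir>) reads them. Verify against the repo.
EXPECTED_FILES = ("config.json", "bigvgan_generator.pt")


def check_bigvgan_dir(local_path: str) -> list[str]:
    """Return a list of problems with `local_path`; an empty list means it looks usable."""
    path = Path(local_path)
    if path.is_file():
        # e.g. an F5-TTS checkpoint such as model_1250000.pt was passed by mistake
        return [f"{path} is a file; pass the vocoder directory instead"]
    if not path.is_dir():
        return [f"{path} does not exist"]
    return [f"missing {name}" for name in EXPECTED_FILES if not (path / name).is_file()]


if __name__ == "__main__":
    # Placeholder path from the reply above; replace it with your real download folder.
    for problem in check_bigvgan_dir("xxxxx/xxxx/bigvgan_v2_24khz_100band_256x/"):
        print(problem)
```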
So I may have misunderstood you. If by "downloaded the f5 bigvgan checkpoint" you meant the TTS checkpoint rather than the vocoder, and you are able to connect to Hugging Face directly to pull the vocoder, then you could simply use the original `load_vocoder(vocoder_name="bigvgan", is_local=False, local_path="", device=device)` and see what the error output is.
> `from third_party.BigVGAN import bigvgan`

Still the same error. Maybe BigVGAN needs a Python version lower than 3.11? I ran the BigVGAN Gradio demo and it works fine, but when I try to use it through F5 inference it shows this error:

```
File "C:\newtts\F5-TTS\src\f5_tts\infer\utils_infer.py", line 109, in load_vocoder
  from third_party.BigVGAN import bigvgan
File "C:\newtts\F5-TTS\src\third_party\BigVGAN\bigvgan.py", line 19, in <module>
  import activations
ModuleNotFoundError: No module named 'activations'
```

Maybe you could provide us with the files you used, in a zip?
> ModuleNotFoundError: No module named 'activations'

What does your `bigvgan.py` look like?
e.g.
Question details
Hi @SWivid,
I'm trying to fine-tune F5-TTS using the provided BigVGAN checkpoint and vocoder. I've followed the instructions in the README regarding setting up the BigVGAN submodule, but I'm unsure about the specific code modifications needed for both fine-tuning and inference.
Could you please provide more detailed guidance on the following:
What changes are required in the training scripts (train.py, finetune_cli.py) to use the BigVGAN vocoder and a BigVGAN-trained checkpoint? I'm particularly interested in how to correctly configure the mel spectrogram generation and handle the data type (FP32) requirements of BigVGAN. For example, should I pass mel_spec_type="bigvgan" to both the CFM model and the Trainer?
Are there any adjustments needed in the model definition files (cfm.py, modules.py) for BigVGAN compatibility during training?
Similarly, what changes are necessary in the inference scripts (infer_cli.py, infer_gradio.py, and utils_infer.py) to use BigVGAN for audio generation after fine-tuning?
Could you also elaborate on the advantages and disadvantages of using BigVGAN compared to Vocos for F5-TTS? For instance, are there differences in terms of:
- Audio quality (naturalness, clarity)?
- Computational cost (training time, inference speed, memory usage)?
- Model size?
- Ease of use/setup?