SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License
7.11k stars 847 forks

Cantonese needed! #37

Open huangkun1985 opened 1 month ago

huangkun1985 commented 1 month ago

Please vote for Cantonese!!!

SWivid commented 1 month ago

b( ̄▽ ̄)d 

wong813 commented 1 month ago

Yes Cantonese needed!!!

ringolam commented 3 weeks ago

Strongly request to have Cantonese language

elbartohub commented 3 weeks ago

Yes, one of the most powerful languages on Earth

pebblehack commented 3 weeks ago

It's not an issue, but Cantonese would be neat.

indiejoseph commented 2 days ago

+1

chau9ho commented 2 days ago

I have fine-tuned a Cantonese model with F5-TTS.

Dataset Details

Training Configuration

{
  "exp_name": "F5TTS_Base",
  "learning_rate": 7.5e-05,
  "batch_size_per_gpu": 12000,
  "batch_size_type": "frame",
  "max_samples": 64,
  "grad_accumulation_steps": 4,
  "max_grad_norm": 0,
  "epochs": 50,
  "num_warmup_updates": 2000,
  "save_per_updates": 20000,
  "last_per_steps": 500,
  "finetune": true,
  "file_checkpoint_train": "",
  "tokenizer_type": "char",
  "tokenizer_file": "",
  "mixed_precision": "fp16",
  "logger": "tensorboard"
}
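For readers comparing configurations, the frame-based batch size above can be converted into seconds of audio. This is a rough sketch, not taken from the post: it assumes 24 kHz audio and a mel hop length of 256 samples, and `frames_to_seconds` is a hypothetical helper name. Check both values against your own setup.

```python
# Assumed audio settings (NOT stated in the post): 24 kHz sample rate, hop length 256.
SAMPLE_RATE = 24_000
HOP_LENGTH = 256

# Values copied from the training configuration above.
config = {
    "batch_size_per_gpu": 12000,      # counted in mel frames ("batch_size_type": "frame")
    "grad_accumulation_steps": 4,
}


def frames_to_seconds(frames, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Convert a number of mel frames to seconds of audio (hypothetical helper)."""
    return frames * hop_length / sample_rate


# Frames seen by one GPU before each optimizer update.
frames_per_update = config["batch_size_per_gpu"] * config["grad_accumulation_steps"]

seconds_per_batch = frames_to_seconds(config["batch_size_per_gpu"])
seconds_per_update = frames_to_seconds(frames_per_update)

print(f"audio per batch per GPU:  {seconds_per_batch:.1f} s")
print(f"audio per update per GPU: {seconds_per_update:.1f} s")
```

Under these assumptions, each optimizer update covers roughly 512 seconds of audio per GPU.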

Questions

Character/Vocab Issues:

- Why are some characters skipped despite being in vocab.txt?
- How to improve numeric character handling?
- Best way to enforce Cantonese pronunciation for all characters?
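One way to narrow down the skipping question is to first confirm that every character in the input really has an entry in the vocabulary, since visually identical characters can have different code points. The sketch below is only a diagnostic under assumptions: it assumes vocab.txt holds one token per line, and `load_vocab` and `find_missing_chars` are hypothetical helper names, not part of F5-TTS.

```python
from collections import Counter


def load_vocab(path):
    """Read a vocab file, assuming one token per line (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        # Strip only the newline so that a line holding a single space survives.
        return {line.rstrip("\n") for line in f}


def find_missing_chars(text, vocab):
    """Count the characters of `text` that have no entry in `vocab`."""
    return Counter(ch for ch in text if ch not in vocab)


# Small in-memory demonstration; with a real file, use load_vocab("vocab.txt").
demo_vocab = {" ", "我", "哋", "去", "飲", "茶"}
missing = find_missing_chars("我哋3點去飲茶", demo_vocab)

for ch, count in missing.items():
    # Printing the code point helps spot look-alike characters.
    print(f"{ch!r} (U+{ord(ch):04X}) missing, seen {count} time(s)")
```

If this reports nothing for a sentence that is still skipped at inference time, the cause is more likely in training or data than in the vocabulary lookup.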

Training Concerns:

- Would more training steps help with pronunciation?
- Is the dataset quality affecting accent/pronunciation?
- How to reduce the Mandarin accent influence?

Looking for:

- Solutions for character skipping issues
- Methods to improve numeric character handling
- Community experience with similar issues
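For the numeric handling point, one common workaround (not confirmed anywhere in this thread) is to spell out digits as Chinese characters before the text reaches the model, so the tokenizer never sees raw digits. The sketch below is a minimal digit-by-digit version; `spell_digits` is a hypothetical helper, and it does not produce positional readings such as 十八 for 18.

```python
import re

# Chinese numerals indexed by digit value 0-9.
DIGITS = "零一二三四五六七八九"


def spell_digits(text):
    """Replace each ASCII digit with its Chinese numeral, one digit at a time."""
    return re.sub(r"[0-9]", lambda m: DIGITS[int(m.group())], text)


print(spell_digits("2024年"))
print(spell_digits("我住喺18樓"))
```

Digit-by-digit reading suits years and phone numbers. Quantities and prices would need a fuller number-to-words step, which is left out here.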