Kahsolt / TransTacoS-RetuneGAN

A toy-like Text-to-Speech system for Chinese/Mandarin synthesis, inspired by Tacotron & FastSpeech2 & RefineGAN.
MIT License

TransTacoS-RetuneGAN

A lighter-weight (perhaps!) Text-to-Speech system for Chinese/Mandarin synthesis, inspired by Tacotron & FastSpeech2 & RefineGAN.
It is also my shitty graduation design project, just a toy, so lower your expectations :)

Quick Start

setup

Since TransTacoS is implemented in tensorflow while RefineGAN is in torch, you could separate them by creating virtual envs. They are unlikely to conflict, though, so you could also try putting everything into one environment.
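If you do try a single shared environment, a quick sanity check is to confirm that the interpreter can find both frameworks. The sketch below is not part of this repo: `check_frameworks` is a hypothetical helper built only on the standard library's `importlib.util.find_spec`, and it assumes the packages are installed under their usual import names `tensorflow` and `torch`.

```python
import importlib.util


def check_frameworks(names=("tensorflow", "torch")):
    """Return {package_name: bool} telling whether each top-level package can be found.

    Hypothetical helper for illustration; it only looks the packages up,
    it does not import them or check their versions.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report which of the two frameworks are visible in the active environment.
    for name, found in check_frameworks().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either one is reported as missing, or the two turn out to clash after all, falling back to two separate virtual envs is the safer choice.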

dataset

train

deploy

Model Architecture

TransTacoS

align

spec

What I actually did:

Frankly speaking, TransTacoS didn't improve anything profoundly over Tacotron. I just found that a shallower network leads to lower mel_loss, so maybe a simple embed+decoder is already enough :(

Ideas to try, or that already failed:

RetuneGAN

gen_wav_cmp

What I actually did:

Oh my dude, it's really a big stitched-together ("feng-he") monster :(

Ideas to try, or that already failed:

Acknowledgements

Code referred to:

Ideas plagiarized from:

The code release is kept under the MIT license. Greatest thanks to all the authors!! :)


by Armit 2022/02/15 2022/05/25