Discrepancies between unifiedvoice2 and tortoise-tts

152334H / DL-Art-School

TorToiSe fine-tuning with DLAS

GNU Affero General Public License v3.0


Open 152334H opened 1 year ago

152334H commented 1 year ago

Two things I have discovered so far:

The wav_lengths are supposed to be multiplied by self.mel_length_compression.
The values returned when return_latent is set are supposed to be subscripted with -2, not -1.

I might just grab the definition from tortoise-tts instead.
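
The two discrepancies listed above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from either repository: the compression value of 1024, the helper names, and the reading of "subscripted with -2" as a trailing slice (`[:-2]` rather than `[:-1]`) are all assumptions made for the example.

```python
# Assumed value for illustration only; the real one is stored on the model
# as self.mel_length_compression.
MEL_LENGTH_COMPRESSION = 1024


def wav_lengths_from_mel_codes(mel_code_counts, compression=MEL_LENGTH_COMPRESSION):
    """Hypothetical helper: scale per-sample MEL code counts up to wav lengths."""
    return [n * compression for n in mel_code_counts]


def strip_trailing(latents, n_trailing):
    """Hypothetical helper: drop the last n_trailing positions of each sequence."""
    return [seq[:-n_trailing] for seq in latents]


# Discrepancy 1: lengths are multiplied by the compression factor.
mel_code_counts = [100, 250]
wav_lengths = wav_lengths_from_mel_codes(mel_code_counts)

# Discrepancy 2: trimming two trailing positions instead of one
# (assuming the subscript in question is a slice).
latents = [[0.1, 0.2, 0.3, 0.4, 0.5]]
trimmed_by_one = strip_trailing(latents, 1)  # the "-1" behaviour
trimmed_by_two = strip_trailing(latents, 2)  # the "-2" behaviour
```

With tensors, the same idea would be an elementwise multiply for the lengths and a slice along the sequence dimension for the latents; the exact dimension and call site should be checked against the tortoise-tts definition.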

FurkanGozukara commented 1 year ago

What are the optimal wav_lengths in the training dataset?

Something like between 8 and 15 seconds?