Open 152334H opened 1 year ago
What are the optimal wav_lengths in the training dataset?
Something between 8 and 15 seconds?
Two things I have discovered so far:
- `wav_lengths` are supposed to be multiplied by `self.mel_length_compression`
- `return_latent` is supposed to be subscripted with `-2`, not `-1`

I might just grab the definition from tortoise-tts instead.