Closed: kudanai closed this issue 3 years ago.
Hi, that is correct: in order to feed the mels to HiFiGAN or MelGAN you need to swap the last two axes. No other changes should be needed, as far as I know.
Have a look at the "vocoding" branch if you want; there I have some (ugly) code to predict with these vocoders.
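A minimal sketch of that axis swap, assuming the predicted mel comes out of TransformerTTS shaped (frames, mel_channels) and the vocoder's generator expects (batch, mel_channels, frames); the generator call itself is only indicated in a comment:

import torch

# Stand-in for a TransformerTTS prediction, assumed shape (frames, mel_channels), e.g. (T, 80).
mel = torch.randn(250, 80)

# HiFiGAN / MelGAN generators expect (batch, mel_channels, frames),
# so add a batch axis and swap the last two axes.
mel = mel.unsqueeze(0)      # (1, 250, 80)
mel = mel.transpose(1, 2)   # (1, 80, 250)

# audio = generator(mel)    # generator: a loaded HiFiGAN/MelGAN model (not shown here)
print(mel.shape)            # torch.Size([1, 80, 250])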
Thank you for the response.
I just tested out your suggestion and it does indeed work with just the axis swap. I'm updating the diff here.
We've had very good results with TransformerTTS + HiFiGAN: Sound Sample
(In retrospect, this issue would make more sense under the new Discussions feature)
diff --git a/meldataset.py b/meldataset.py
index 4502924..b72981b 100644
--- a/meldataset.py
+++ b/meldataset.py
@@ -142,10 +142,14 @@ class MelDataset(torch.utils.data.Dataset):
         else:
             mel = np.load(
                 os.path.join(self.base_mels_path, os.path.splitext(os.path.split(filename)[-1])[0] + '.npy'))
             mel = torch.from_numpy(mel)
 
             if len(mel.shape) < 3:
                 mel = mel.unsqueeze(0)
 
+            if not mel.shape[1] == 80:
+                mel = mel.transpose(1,2)
+
             if self.split:
                 frames_per_seg = math.ceil(self.segment_size / self.hop_size)
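As a quick sanity check before fine-tuning, you can verify which branch of that patch a saved mel will hit; a sketch, with a made-up file path:

import numpy as np
import torch

# Hypothetical path to one of the mels exported from TransformerTTS.
mel = np.load('mels/sample_0001.npy')
mel = torch.from_numpy(mel)
if len(mel.shape) < 3:
    mel = mel.unsqueeze(0)

print('shape before check:', tuple(mel.shape))
# Mirrors the patch: if the channel axis (expected to be 80) is not in
# position 1, the frames/channels axes need to be swapped.
if mel.shape[1] != 80:
    mel = mel.transpose(1, 2)
print('shape fed to HiFiGAN:', tuple(mel.shape))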
Hi, cool, what language is your sample in? Did you use the phonemizer for text conversion? Not knowing the language, my comment is probably invalid, but it seems a little flat. Did you train with the pitch prediction too?
It's "Dhivehi" using the Thaana script. Unfortunately phonemizer support is lacking right now so I patched it to skip the phonemizer and pick up a raw charset from config instead link to fork here.
Turned off stress and breathing. All other settings are default. The flatness probably comes more from the dataset itself, although to a native speaker it isn't bad at all.
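For readers curious what that kind of patch looks like, here is a rough sketch only (not the actual code from the fork) of mapping raw characters from a config-defined charset to IDs instead of phonemizing; the charset here is an inlined placeholder rather than a real config entry:

# Rough sketch -- the linked fork's implementation may differ.
# In practice the character set would be read from the training config.
charset = list("abcdefghijklmnopqrstuvwxyz '.,?!")
char_to_id = {c: i + 1 for i, c in enumerate(charset)}  # 0 reserved for padding/unknown

def text_to_ids(text):
    # Bypass the phonemizer: map each raw character straight to an ID.
    return [char_to_id.get(c, 0) for c in text.lower()]

print(text_to_ids("hello there"))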
@kudanai why did you set the batch size of HiFiGAN to 1?
I'm not entirely sure. Please try the second patch first; it seems to be enough. On the first attempt to fix it I encountered some issues which appeared to be mitigated by setting the batch_size to 1.
@kudanai can this be closed?
The direct output mels from TransformerTTS seem to be incompatible with the input expected by HiFiGAN. I was able to make it work by applying the patch above to HiFiGAN (please ignore the debug prints).
This appears to work; I just wanted to confirm whether this is a correct approach. I also thought it might be helpful to someone having the same issue.
Note: the value 80 is config['mel_channels'].
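To avoid hard-coding the 80, the same check can be written against the configured mel-channel count; a sketch, assuming that value is available wherever the mel is loaded (e.g. taken from the vocoder or TTS config rather than a literal):

import torch

num_mels = 80                          # read from the config instead of hard-coding
mel = torch.randn(1, 200, num_mels)    # stand-in for a loaded mel with swapped axes

# Generalized form of the patch above: compare against the configured
# channel count rather than the literal 80.
if mel.shape[1] != num_mels:
    mel = mel.transpose(1, 2)

print(mel.shape)                       # torch.Size([1, 80, 200])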