Closed (wblgers closed this issue 1 year ago)
It's extracted with ParselMouth and then upsampled to the audio length.
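For readers following along, here is a rough sketch of what "upsampled to audio length" could look like. It is not the repository's code: `upsample_f0` and `hop_length` are hypothetical names, and nearest-neighbour repetition is only one plausible choice (interpolation would be another).

```python
import numpy as np

def upsample_f0(f0_frames, hop_length):
    """Hypothetical helper: repeat each frame-level F0 value hop_length
    times so the pitch track has one value per audio sample."""
    f0_frames = np.asarray(f0_frames, dtype=np.float32)
    return np.repeat(f0_frames, hop_length)

# Three frames with an assumed hop of 4 samples give 12 sample-level values.
frame_f0 = [0.0, 220.0, 221.0]
sample_f0 = upsample_f0(frame_f0, hop_length=4)
print(sample_f0.shape)
```

With this kind of repetition the unvoiced zeros survive as zeros, so whether they are kept or filled in has to be decided before this step.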
Oh, let me clarify my question as below: On line 75 of tools/diffusion/inference.py
if pitches is None:
    pitches = self.pitch_extractor(audio, sr, pad_to=mel_len).float()
pad_to=mel_len means the zero pitches are removed and linearly interpolated. Is the same preprocessing applied during the training of nsf_hifigan?
It depends on whether you enable keep_zeros in the pitch extractor or not...
Oh, I made a mistake. Yes, it depends on keep_zeros when constructing pitch_extractor.
keep_zeros is generally better and thus we enabled it.
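To make the difference concrete, here is a minimal sketch of the two behaviours discussed above. It is an illustration only, not the actual pitch extractor: `fill_unvoiced` is a hypothetical function, and only the flag name `keep_zeros` is taken from this thread.

```python
import numpy as np

def fill_unvoiced(f0, keep_zeros=True):
    """Hypothetical illustration of the two options.

    keep_zeros=True  -> return the raw track, unvoiced frames stay 0.
    keep_zeros=False -> linearly interpolate across unvoiced (0) frames,
                        giving a continuous pitch contour.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    if keep_zeros or not voiced.any():
        return f0.copy()
    idx = np.arange(len(f0))
    # np.interp holds the first/last voiced value at the edges.
    return np.interp(idx, idx[voiced], f0[voiced])

raw = [0.0, 200.0, 0.0, 0.0, 230.0, 0.0]
kept = fill_unvoiced(raw, keep_zeros=True)
continuous = fill_unvoiced(raw, keep_zeros=False)
print(kept)
print(continuous)
```

Whichever variant is used, the vocoder should see the same kind of pitch at inference as it did during training.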
Thanks for your explanation!
Hi,
Thanks for sharing your work. I want to figure out what type of input pitch is used during the training of nsf_hifigan. Is it continuous pitch or the raw pitch extracted from ParselMouth?
Thanks!