Open tobefans opened 2 years ago
There are two major preprocessing steps that the authors used in the original paper:
> To this end, we propose to perturb the information included in input waveform x by using three functions that are 1. formant shifting (fs), 2. pitch randomization (pr), and 3. random frequency shaping using a parametric equalizer (peq)

> The speakers of train-clean-360 were included to the training set only when the total length of speech samples exceeds 15 minutes.
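The speaker-selection rule in the second quote can be sketched as a simple duration filter. This is an illustrative sketch, not code from the repo: the `durations` mapping (speaker ID to per-utterance lengths in seconds) is a hypothetical input that you would build from the train-clean-360 metadata.

```python
def filter_speakers(durations, min_total_sec=15 * 60):
    """Keep only speakers whose total speech length exceeds min_total_sec.

    durations: dict mapping speaker ID -> list of utterance lengths (seconds).
    """
    return {
        spk: utts
        for spk, utts in durations.items()
        if sum(utts) > min_total_sec
    }

# Toy example: speaker "14" has 16 minutes of audio, speaker "19" only 10.
durations = {"14": [480.0, 480.0], "19": [600.0]}
kept = filter_speakers(durations)
```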
For process 2, I haven't done anything like that, so such filtering might help. For process 1, which is where the warning "PraatWarning: There were no voiced segments found." comes from, the problem is more complex. During that process (with my implementation), many different Praat and Parselmouth errors popped up, and I couldn't pin down the exact causes. For example, some wav files that clearly contained human voice still threw "PraatWarning: There were no voiced segments found." during perturbation :( I ignored the warning and trained anyway, but it might help if you remove the audio files that trigger those warnings.
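The suggestion above (skip files that raise the warning) can be sketched with Python's `warnings.catch_warnings`. The `perturb` callable here is a hypothetical stand-in for the real parselmouth-based perturbation pipeline; only the skip logic is shown.

```python
import warnings

def perturb_or_skip(path, perturb):
    """Run the perturbation; return None (i.e. skip the file) if any
    warning mentioning 'no voiced segments' was raised along the way."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        out = perturb(path)
    if any("no voiced segments" in str(w.message).lower() for w in caught):
        return None
    return out

# Dummy stand-in for the real perturbation, for demonstration only:
def fake_perturb(path):
    if path == "silence.wav":
        warnings.warn("There were no voiced segments found.")
    return path + ".perturbed"

result_ok = perturb_or_skip("speech.wav", fake_perturb)    # kept
result_bad = perturb_or_skip("silence.wav", fake_perturb)  # skipped
```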
I ran the code once using VCTK, but the conversion didn't work well. Is any data preprocessing needed, such as VAD? I often see the warning: "PraatWarning: There were no voiced segments found."
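If you want to screen files before training, a minimal energy-based check (not the repo's method; the frame length and threshold are illustrative) can flag clips that contain almost no signal and are therefore likely to trigger the warning:

```python
import math

def voiced_ratio(samples, frame_len=400, threshold=0.01):
    """Fraction of non-overlapping frames whose RMS energy exceeds `threshold`.

    samples: list of float samples in [-1, 1]. A ratio near 0 suggests the
    clip is (near-)silent and may trip 'no voiced segments' in Praat.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    def rms(frame):
        return math.sqrt(sum(x * x for x in frame) / len(frame))
    return sum(rms(f) > threshold for f in frames) / len(frames)

# Toy signals: an all-zero clip vs. one with a loud middle section.
silence = [0.0] * 1600
speech = ([0.0] * 400
          + [0.5 * math.sin(0.1 * i) for i in range(800)]
          + [0.0] * 400)
```

Note this only measures energy, not voicing; a real check would use pitch tracking (e.g. via parselmouth) or a proper VAD.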