Great work! How can I make sure the length of the output waveform matches the number of input frames? I trained and tested on the GRID dataset with the following hyperparameters:
T = 40
overlap = 10
mel_step_size = 160
mel_overlap = 40
img_size = 96
fps = 25
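For reference, here is a minimal sketch of the length bookkeeping these settings imply. It is not code from the repository. It assumes the mel spectrogram runs at 100 frames per second (for example, 16 kHz audio with a hop of 160 samples). It also assumes that overlapping windows are stitched by dropping the shared frames. The function names are hypothetical. With these settings, every video frame maps to 4 mel frames (160 / 40 = 4), and the overlaps use the same ratio (40 / 10 = 4), so the settings agree with each other:

```python
import math

# Hyperparameters from the question.
T = 40               # video frames per window
OVERLAP = 10         # video frames shared by consecutive windows
MEL_STEP_SIZE = 160  # mel frames per window
MEL_OVERLAP = 40     # mel frames shared by consecutive windows
FPS = 25             # video frames per second

# Assumption (not from the repository): 100 mel frames per second,
# e.g. 16 kHz audio with a hop length of 160 samples.
MEL_FPS = 100

# Mel frames generated per video frame (160 / 40 = 4).
MEL_PER_FRAME = MEL_STEP_SIZE // T


def check_ratios() -> None:
    """Verify that the video and mel settings describe the same time span."""
    assert MEL_STEP_SIZE % T == 0, "window sizes are not an integer ratio"
    assert MEL_OVERLAP == OVERLAP * MEL_PER_FRAME, "overlap ratio mismatch"
    assert MEL_FPS == FPS * MEL_PER_FRAME, "mel rate does not match fps"


def num_windows(n_frames: int) -> int:
    """Windows of length T, stride T - OVERLAP, needed to cover n_frames."""
    if n_frames <= T:
        return 1
    return 1 + math.ceil((n_frames - T) / (T - OVERLAP))


def naive_mel_frames(n_frames: int) -> int:
    """Output length if every window is concatenated without removing overlap."""
    return num_windows(n_frames) * MEL_STEP_SIZE


def stitched_mel_frames(n_frames: int) -> int:
    """Output length after dropping shared frames and trimming end padding."""
    n = num_windows(n_frames)
    covered = MEL_STEP_SIZE + (n - 1) * (MEL_STEP_SIZE - MEL_OVERLAP)
    return min(covered, n_frames * MEL_PER_FRAME)


def seconds(mel_frames: int) -> float:
    """Convert a mel-frame count to seconds under the assumed MEL_FPS."""
    return mel_frames / MEL_FPS
```

For a 3-second clip (75 frames at 25 fps), `seconds(stitched_mel_frames(75))` gives 3.0. Concatenating the three windows without removing the overlap gives 480 mel frames, or 4.8 seconds. Under these assumptions, any output longer than `n_frames * 4` mel frames points to a stitching or trimming problem rather than to the ratios themselves.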
In my test results, the ground-truth audio is 3 seconds long, but the generated waveforms are 7 seconds long. How can I solve this problem? Looking forward to your reply!