Hi
I want to measure the quality of synthetic audio with metrics that need reference audio, such as PESQ and MCD.
I followed MelGAN's preprocessing code (librosa.load => trim at 20 dB => mel). I find that some of the trimmed original audio has a different length from the synthetic audio, which makes the PESQ computation fail. Why do the generated waveforms have a different length from the trimmed original waveforms?
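
To make the mismatch concrete, here is a minimal sketch of the length check and the truncation that would let a reference-based metric run. It rests on assumptions that are not confirmed for this repo: that the vocoder emits exactly hop_length samples per mel frame, that the mel has floor(num_samples / hop_length) frames, and that hop_length is 256. The helper names expected_vocoder_length and align_lengths are hypothetical, not part of MelGAN or librosa.

```python
def expected_vocoder_length(num_samples, hop_length=256):
    """Output length if the vocoder emits hop_length samples per mel frame.

    Assumption: the mel has floor(num_samples / hop_length) frames, so the
    generated waveform is rounded down to a multiple of hop_length.
    hop_length=256 is an assumed value; use the one from your config.
    """
    return (num_samples // hop_length) * hop_length


def align_lengths(reference, synthetic):
    """Truncate both signals to their common length so they can be compared."""
    n = min(len(reference), len(synthetic))
    return reference[:n], synthetic[:n]


# A trimmed clip of 22050 samples is not a multiple of 256,
# so under the assumptions above the synthetic clip would be shorter.
trimmed_len = 22050
generated_len = expected_vocoder_length(trimmed_len)
print(trimmed_len - generated_len)  # samples lost to rounding
```

Truncating to the shorter length only drops a fraction of one frame at the end, so it should have little effect on the score, but it does not explain where the difference comes from.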