Closed quizt35 closed 1 month ago
Hello and thanks for the question.
The demo files are all rescaled to [-1, 1] for playback (see the website footnote), which is not how the AEC data was set up for training. A previous GitHub issue noted this as well and rescaled with `d = d / 10`.
If you want to replicate my results fully, I would recommend downloading the data from the AEC challenge and using that.
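To illustrate the mismatch, here is a minimal NumPy sketch; the function names are mine, and the `d = d / 10` factor is the one quoted from the earlier issue:

```python
import numpy as np

def playback_scale(x):
    """Peak-normalize to [-1, 1], as the demo files were exported for playback."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def training_scale(d):
    """Scale the near-end signal the way the earlier issue did (d = d / 10)."""
    return d / 10.0

# The two scalings yield different amplitudes, so a [-1, 1] demo file fed
# directly to the model won't match the input statistics seen in training.
d = np.array([0.5, -2.0, 1.0])          # hypothetical raw samples
print(playback_scale(d))                # [ 0.25 -1.    0.5 ]
print(training_scale(d))                # [ 0.05 -0.2   0.1 ]
```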
Thanks for your reply. After applying the scale factor I get a much more reasonable result, but there are still some minor issues. As shown in the figure below, there are similar impulses in the first few seconds of the speech. I'm wondering whether this is due to the window or to the format of the original speech. I will also follow your suggestion and test on the AEC Challenge datasets.
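One way to check whether those initial impulses come from the file itself rather than the model is to compare the energy of the first few hundred milliseconds with the rest of the signal. A rough NumPy sketch (the function name and window length are my own choices):

```python
import numpy as np

def head_energy_ratio(x, n=1600):
    """Ratio of mean energy in the first n samples (100 ms at 16 kHz)
    to the mean energy of the remainder; values much greater than 1
    suggest a start-of-file artifact (e.g. from mp3 decoding or windowing)."""
    x = np.asarray(x, dtype=float)
    head = np.mean(x[:n] ** 2)
    rest = np.mean(x[n:] ** 2) + 1e-12   # guard against division by zero
    return head / rest

# Synthetic example: a loud click at the start of otherwise quiet audio.
sig = np.full(16000, 0.01)
sig[:10] = 1.0                          # impulse in the first samples
print(head_energy_ratio(sig) > 1.0)     # True
```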
Additionally, should the JAX URL in the 'README - GPU Setup' section be https://storage.googleapis.com/jax-releases/jax_cuda_releases.html?
Hello! Thanks for sharing the pre-trained models and demos. I would like to replicate the demo results using a pretrained model. I used the data from the first row of the double-talk table and converted the mp3 to wav (single channel, 16000 Hz, 16-bit) for convenience. Based on the speech titles downloaded from the demo page, I selected the matching pkl file to process the original speech. However, there is a significant difference between the spectrograms on the demo page and those generated with the pre-trained model. I've checked every step and can't find the reason. Could you help me understand why?
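For reference, the conversion target can be sanity-checked with Python's stdlib `wave` module; a minimal sketch (the filename is made up):

```python
import wave

def check_wav_format(path, rate=16000, channels=1, sampwidth=2):
    """Return True if the file is mono, 16 kHz, 16-bit PCM --
    the format I converted the demo mp3s to."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == rate and
                w.getnchannels() == channels and
                w.getsampwidth() == sampwidth)

# Write a tiny silent clip in that format and verify it round-trips.
with wave.open("demo_check.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)                  # 16-bit
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 160)   # 10 ms of silence

print(check_wav_format("demo_check.wav"))   # True
```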
Model tag: v1.0.1. The code I used is below:
Looking forward to hearing from you, thanks!