Xiaobin-Rong / gtcrn

The official implementation of GTCRN, an ultra-lite speech enhancement model.
MIT License

Query on Sampling Rate Adaptation #41

Open 12oneway opened 3 hours ago

12oneway commented 3 hours ago

Thank you for your open-source contribution; it is truly an exceptional piece of work. I have a question about the audio processing. Your model is designed to handle audio with a 16 kHz sampling rate, but the VCTK-DEMAND test set is recorded at 48 kHz. Did you downsample the VCTK-DEMAND test set to 16 kHz before running your tests? If so, I noticed that in the experimental results comparison, DeepFilterNet was tested on the original 48 kHz VCTK-DEMAND test set. Is the comparison still valid given the difference in sampling rates? Could you please clarify this point?

Xiaobin-Rong commented 2 hours ago

I downsampled the VCTK-DEMAND test set to 16 kHz before the tests. Because the PESQ metric is computed on 16 kHz audio, the comparison is still valid even though DeepFilterNet is tested on 48 kHz audio (its enhanced audio is downsampled to 16 kHz before computing PESQ).
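
For readers following along, the 48 kHz to 16 kHz step described above can be sketched roughly as below. This is only an illustration of integer-ratio downsampling (low-pass filter, then keep every third sample), not the code used for the reported results; the thread does not say which resampler was used. The function names `lowpass_taps` and `downsample`, the filter length, and the cutoff margin are all assumptions made for the example. In practice a library resampler such as `scipy.signal.resample_poly` would likely be the more convenient choice.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter (hypothetical helper).

    `cutoff` is a fraction of the input sampling rate (0 < cutoff < 0.5).
    `num_taps` should be odd so the filter has a well-defined centre tap.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)  # normalise so that DC passes with unit gain
    return [t / gain for t in taps]


def downsample(signal, factor, num_taps=121):
    """Low-pass filter `signal`, then keep every `factor`-th sample.

    The cutoff sits slightly below the new Nyquist frequency (assumed 10%
    margin) so content that would alias is attenuated before decimation.
    """
    taps = lowpass_taps(num_taps, 0.9 * 0.5 / factor)
    half = (num_taps - 1) // 2
    out = []
    for i in range(0, len(signal), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k  # centre the filter to avoid a time shift
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out


# Usage: 0.1 s of a 1 kHz tone at 48 kHz, reduced to 16 kHz (factor 3).
sr_in = 48000
tone_48k = [math.sin(2 * math.pi * 1000 * n / sr_in) for n in range(4800)]
tone_16k = downsample(tone_48k, 3)
```

Both the clean reference and the enhanced output would go through the same resampling step before being passed to a PESQ implementation, which is what makes scores from a 16 kHz model and a 48 kHz model comparable on this metric.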