brentspell / hifi-gan-bwe

Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.
MIT License

I would like to know about the training data #2

Closed: chomeyama closed this issue 2 years ago

chomeyama commented 2 years ago

Hello. Your implementation is very nice, and I would like to use your pre-trained model as a baseline in my research. To make a fair comparison I want to use the same training data, but I cannot find any information about it. If possible, would you please publish the train/test division of the VCTK dataset that you used? Sending me the information by email or some other channel would also be fine. Best regards.

brentspell commented 2 years ago

Hi Reo,

Feel free to use my pretrained model and any code you find here. If you publish, please cite the original paper's authors using the citation at the bottom of README.md.

I used the same train/test division of the VCTK dataset that was described in the original paper. Specifically, for training I used the first 99 speakers (sorted alphabetically by speaker name) in the dataset, and the remaining 10 speakers (p345, p347, p351, p360, p361, p362, p363, p364, p374, and p376) for validation/evaluation. The code that does this split can be found here.
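As an illustration only, the split described above can be sketched in a few lines. This is not the repository's code: the function name `split_speakers` and its `num_train` parameter are hypothetical, and the sketch assumes the speaker IDs have already been collected into a list.

```python
def split_speakers(speaker_ids, num_train=99):
    """Sort speaker IDs alphabetically, then split them into train/validation.

    Hypothetical helper: the first `num_train` speakers go to training and
    all remaining speakers go to validation/evaluation.
    """
    ordered = sorted(speaker_ids)
    return ordered[:num_train], ordered[num_train:]


# Toy example with five speakers and a 3/2 split (not the real 99/10 split).
train, valid = split_speakers(
    ["p376", "p225", "p374", "p226", "p227"], num_train=3
)
print(train)  # ['p225', 'p226', 'p227']
print(valid)  # ['p374', 'p376']
```

With the full list of 109 speakers and the default `num_train=99`, the second list would hold the 10 held-out speakers named above, provided the directory names sort the same way.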

If you have any other questions, let me know.

Sincerely, Brent

chomeyama commented 2 years ago

Thank you for your answer!

I have a few additional questions.

- What version of VCTK did you use?
- If there are "mic1" and "mic2", which did you use?
- Did you apply any pre-processing such as normalization or high-pass filtering?

Best regards, Reo

brentspell commented 2 years ago

I trained the models on VCTK version 0.80, which only has data in the "mic1" configuration. I did no preprocessing on the raw audio, although I did do some gain augmentation during training.

chomeyama commented 2 years ago

I see. Unfortunately, I think VCTK 0.80 is no longer publicly available.

Please see the torchaudio documentation at https://pytorch.org/audio/0.8.0/datasets.html#vctk, which states:

"This dataset is no longer publicly available. Please use VCTK_092"

Also, since some recording errors were corrected in version 0.92 and "mic2" has less low-frequency noise than "mic1", I think it would be better to use the "mic2" recordings of version 0.92. So I will train a HiFi-GAN+ model myself, using your implementation, on the mic2 recordings of VCTK version 0.92.

However, if possible, it would be helpful if you could also publish models trained on the currently available VCTK corpus, as they are easier to use as a baseline.

Anyway, thank you for your answers and for providing such useful code!
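For anyone following the same plan, selecting only the mic2 recordings could start from a filename filter like the sketch below. It assumes that VCTK 0.92 files carry a `_mic1` or `_mic2` suffix before the extension (for example `p225_001_mic2.flac`); that layout is an assumption here and should be checked against the downloaded corpus. The helper name `select_mic` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def select_mic(paths, mic="mic2"):
    """Keep only the files whose stem ends with the given microphone tag.

    Assumes names such as 'p225_001_mic2.flac'; adjust the suffix check
    if the corpus on disk is laid out differently.
    """
    suffix = f"_{mic}"
    return sorted(p for p in map(Path, paths) if p.stem.endswith(suffix))


# Hypothetical file list used only to exercise the filter.
files = [
    "wav48_silence_trimmed/p225/p225_001_mic1.flac",
    "wav48_silence_trimmed/p225/p225_001_mic2.flac",
    "wav48_silence_trimmed/p226/p226_001_mic2.flac",
]
for path in select_mic(files):
    print(path.name)
```

In practice the `files` list would come from something like `Path(root).rglob("*.flac")`, where `root` is the corpus directory.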

manosplitsis commented 1 year ago

Hi. Regarding the DNS Challenge data, there is little information in the readme about the version of the dataset, or the subset of data that needs to be downloaded. I guess this is also because there is no explicit information in the paper for it. When training your models, which version of the DNS Challenge did you use? By looking at the datasets.py code I see that we only need the "noise_fullband" subset of the data, which is still 58 GB. Do we need the whole "noise_fullband" subset to recreate your training process?

brentspell commented 1 year ago

Hi manosplitsis,

You are correct that the original authors did not cite the DNS Challenge dataset version. For my own experiments, I used the dataset from ICASSP 2022 (DNS4). You are also correct that I only use the noise_fullband subset, and I would recommend using the whole thing. While 58GB (~180 hours at 48kHz PCM-16 mono) may sound like a lot of data, in terms of datasets for audio deep learning, this is not very large at all.

That said, you may also want to reach out to the original authors of the paper for more information about their training process, if you are trying to replicate their results. I can only speak to the experiments I have run, and I am not affiliated with the authors.
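The duration estimate quoted above (58 GB at 48 kHz, 16-bit, mono) can be checked with a little arithmetic. The sketch below assumes uncompressed PCM and ignores file-header overhead; the function name `pcm_hours` is hypothetical. The answer depends on whether "GB" is read as binary (2^30 bytes) or decimal (10^9 bytes) gigabytes.

```python
def pcm_hours(num_bytes, sample_rate=48_000, bytes_per_sample=2, channels=1):
    """Return the duration in hours of raw PCM audio of the given size."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return num_bytes / bytes_per_second / 3600


print(round(pcm_hours(58 * 2**30)))   # binary gigabytes: 180
print(round(pcm_hours(58 * 10**9)))   # decimal gigabytes: 168
```

So the figure of about 180 hours matches the binary reading, while the decimal reading gives closer to 168 hours.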