Rikorose / DeepFilterNet

Noise suppression using deep filtering
https://huggingface.co/spaces/hshr/DeepFilterNet2

Clarification on Training Configuration for DeepFilterNet3 (dataset.cfg) #531

Closed WoodieDudy closed 1 month ago

WoodieDudy commented 6 months ago

Hello @Rikorose,

I'm working on reproducing the training results for DeepFilterNet3 and have some questions about how to specify the dataset configuration in the dataset.cfg file. The paper (https://arxiv.org/abs/2305.08227) mentions the use of the DNS4 dataset, along with oversampled PTDB and VCTK, for training.

I found the scripts/download_process_dns4.sh script for downloading DNS4 and, after running it, ended up with the following files:

total 323G
-rw-r--r-- 1 root root   12K Mar 24 09:22 SLR26_TRAIN.hdf5
-rw-r--r-- 1 root root   12K Mar 24 09:22 SLR28_TRAIN.hdf5
-rw-r--r-- 1 root root  482M Mar 23 10:31 VocalSet_48kHz_mono_TRAIN.hdf5
drwxr-xr-x 2 root root  4.0K Mar 22 10:28 clean_fullband
drwxr-xr-x 3 root root  4.0K Mar 24 10:54 datasets_fullband
-rw-r--r-- 1 root root  3.0K Mar 24 09:22 dns4_blob.sh
-rw-r--r-- 1 root root   576 Mar 22 14:09 download.log
-rw-r--r-- 1 root root  752M Mar 21 21:23 emotional_speech_TRAIN.hdf5
-rw-r--r-- 1 root root   27G Mar 21 21:23 french_speech_TRAIN.hdf5
-rw-r--r-- 1 root root  113G Mar 22 15:06 german_speech_TRAIN.hdf5
-rw-r--r-- 1 root root   17G Mar 23 12:24 italian_speech_TRAIN.hdf5
-rw-r--r-- 1 root root  8.4G Mar 24 07:40 noise_fullband_TRAIN.hdf5
-rw-r--r-- 1 root root  783M Mar 24 07:45 noisy_testclips_VALID.hdf5
-rw-r--r-- 1 root root  119G Mar 24 01:06 read_speech_TRAIN.hdf5
-rw-r--r-- 1 root root  4.2G Mar 24 01:34 russian_speech_TRAIN.hdf5
-rw-r--r-- 1 root root   23G Mar 24 04:03 spanish_speech_TRAIN.hdf5
-rw-r--r-- 1 root root   11G Mar 24 05:04 vctk_wav48_silence_trimmed_TRAIN.hdf5

Now I need to fill in dataset.cfg with these downloaded files. The template looks like this:

{
  "train": [
    ["TRAIN_SET_SPEECH.hdf5", 1.0],
    ["TRAIN_SET_NOISE.hdf5", 1.0],
    ["TRAIN_SET_RIR.hdf5", 1.0]
  ],
  "valid": [
    ["VALID_SET_SPEECH.hdf5", 1.0],
    ["VALID_SET_NOISE.hdf5", 1.0],
    ["VALID_SET_RIR.hdf5", 1.0]
  ],
  "test": [
    ["TEST_SET_SPEECH.hdf5", 1.0],
    ["TEST_SET_NOISE.hdf5", 1.0],
    ["TEST_SET_RIR.hdf5", 1.0]
  ]
}
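
To make my question concrete, here is my current guess at how the downloaded files would map into the config. I'm treating SLR26/SLR28 as the RIR sets (they appear to be the OpenSLR room impulse response databases), and the sampling factor of 2.0 for VCTK is just a placeholder I made up to approximate the oversampling mentioned in the paper. Please correct me if this is not how the file is meant to be filled in:

{
  "train": [
    ["read_speech_TRAIN.hdf5", 1.0],
    ["german_speech_TRAIN.hdf5", 1.0],
    ["french_speech_TRAIN.hdf5", 1.0],
    ["spanish_speech_TRAIN.hdf5", 1.0],
    ["italian_speech_TRAIN.hdf5", 1.0],
    ["russian_speech_TRAIN.hdf5", 1.0],
    ["emotional_speech_TRAIN.hdf5", 1.0],
    ["VocalSet_48kHz_mono_TRAIN.hdf5", 1.0],
    ["vctk_wav48_silence_trimmed_TRAIN.hdf5", 2.0],
    ["noise_fullband_TRAIN.hdf5", 1.0],
    ["SLR26_TRAIN.hdf5", 1.0],
    ["SLR28_TRAIN.hdf5", 1.0]
  ],
  "valid": [
    ["noisy_testclips_VALID.hdf5", 1.0]
  ],
  "test": []
}

The "valid" and "test" sections are obviously incomplete here, which is exactly what my questions below are about.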

From my understanding, all files with the "_TRAIN" suffix should go in the "train" section of dataset.cfg. Should SLR26_TRAIN.hdf5 and SLR28_TRAIN.hdf5 be placed in the RIR category? Additionally, there is only one "_VALID" file; is that expected, and should it go in the "noise" section? Where can I find "speech" and "rir" files for the "valid" key in dataset.cfg?

I could not find scripts for downloading PTDB and VCTK, so I downloaded them manually (PTDB, VCTK).

The PTDB dataset includes audio files that are not clean speech (laryngograph signals); should these be excluded, using only the clean MIC recordings? (A sketch of what I'm planning follows the directory tree below.)

.
├── FEMALE
│   ├── ...
│   ...
└── MALE
    ├── LAR (laryngograph signal)
    │   ├── M01
    │   │   ├── lar_M01_si482.wav
    │   │   ├── ...
    │   ├── ...
    ├── MIC (microphone signal)
    │   ├── M01
    │   │   ├── mic_M01_si483.wav
    │   │   ├── ...
    │   ├── ...
    └── REF (reference pitch trajectory)
        ├── M01
        │   ├── ref_M01_si465.f0
        │   ├── ...
        ├── ...
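
For reference, this is roughly how I was planning to build a PTDB speech HDF5, keeping only the MIC recordings and dropping LAR/REF. The prepare_data.py invocation follows the usage shown in the repo README (run from the DeepFilterNet/ directory, if I'm reading it correctly); the list and output file names are just my own choice, so please correct me if the type or sample rate is wrong for this dataset:

# collect only the microphone recordings, skipping the LAR and REF folders
find PTDB -type f -path "*/MIC/*" -name "*.wav" > ptdb_mic_files.txt

# build a 48 kHz clean-speech HDF5 from that file list
python df/scripts/prepare_data.py --sr 48000 speech ptdb_mic_files.txt PTDB_TRAIN.hdf5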

The paper also mentions a VCTK/DEMAND test set, which looks like this:

clean_testset_wav/
clean_trainset_wav/
noisy_testset_wav/
noisy_trainset_wav/
testset_txt/
trainset_txt/

I see that it has both noisy and clean audio. How should these be passed to the prepare_data.py script to produce .hdf5 files for dataset.cfg?
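
To show what I was attempting: something like the following is what I had in mind for the test set. I'm guessing the clean test files become a "speech" HDF5 and the pre-mixed files a "noisy" HDF5 (assuming prepare_data.py accepts a "noisy" type), but I don't know whether this is how the VCTK/DEMAND test set is meant to be wired in:

# clean reference speech of the VCTK/DEMAND test set
find clean_testset_wav -name "*.wav" > vctk_demand_clean_test.txt
python df/scripts/prepare_data.py --sr 48000 speech vctk_demand_clean_test.txt VCTK_DEMAND_CLEAN_TEST.hdf5

# pre-mixed noisy counterpart of the same clips
find noisy_testset_wav -name "*.wav" > vctk_demand_noisy_test.txt
python df/scripts/prepare_data.py --sr 48000 noisy vctk_demand_noisy_test.txt VCTK_DEMAND_NOISY_TEST.hdf5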

Thank you for your guidance.

WoodieDudy commented 4 months ago

I made a guide on how to prepare the dataset and train the model:
https://github.com/WoodieDudy/open-source-stuff/tree/main/DeepFilerNet/prepare_dataset

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.