NOTE: This repository is still under development. Some features may not be fully functional.
MiniSUPERB is a proxy dataset for SUPERB and the SUPERB Challenge. It provides a simplified and accessible way to evaluate self-supervised learning (SSL) speech models.
The following diagram provides an intuitive illustration of how MiniSUPERB accelerates the evaluation process for SSL speech models:
The figure shows how our results approximate the model rankings of the SUPERB Challenge:
For more details, please refer to the original paper.
The project was developed in the following environment:

| Env | Version |
|---|---|
| OS | ubuntu-20.04 |
| Python | 3.10 |
| PyTorch | 1.12.1 |
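A minimal setup matching these versions might look like the following sketch; the environment name and the exact PyTorch build are assumptions, so adjust them to your machine:

```sh
# Hypothetical environment setup; pick the PyTorch build that matches your CUDA version
conda create -n minisuperb python=3.10
conda activate minisuperb
pip install torch==1.12.1
```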
MiniSUPERB supports four downstream tasks: automatic speech recognition (ASR), speaker identification (SID), speech enhancement (SE), and source separation (SS).
The following upstream models are supported:

| Models | Upstream Model Name | Paper |
|---|---|---|
| WavLM | wavlm_base, wavlm_base_plus, wavlm_large | arxiv |
| HuBERT | hubert_base, hubert_large_ll60k | arxiv |
| Wav2Vec 2.0 | wav2vec2, wav2vec2_large_ll60k | arxiv |
| Modified-CPC | modified_cpc | arxiv |
| TERA | tera | arxiv |
| DeCoAR 2.0 | decoar2 | arxiv |
| Filter Bank | fbank, fbank_no_cmvn (used for SID) | - |
Download [librispeech_finetuning.tgz](https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md), along with dev-clean and test-clean from LibriSpeech.
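A sketch of the download steps; the direct URLs are assumptions, so check the libri-light README and openslr.org if they have changed:

```sh
cd DataStorage
# librispeech_finetuning.tgz (link taken from the libri-light data_preparation README)
wget https://dl.fbaipublicfiles.com/librilight/data/librispeech_finetuning.tgz
# dev-clean and test-clean from OpenSLR
wget https://www.openslr.org/resources/12/dev-clean.tar.gz
wget https://www.openslr.org/resources/12/test-clean.tar.gz
# the LibriSpeech tarballs unpack into LibriSpeech/; place librispeech_finetuning inside it too
tar -xzf dev-clean.tar.gz
tar -xzf test-clean.tar.gz
tar -xzf librispeech_finetuning.tgz -C LibriSpeech
```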
Unzip and check the prepared file structure:

```
DataStorage
└── LibriSpeech/
    ├── librispeech_finetuning/
    ├── dev-clean/
    └── test-clean/
```
Download the VoxCeleb1 dataset and unzip it.
```sh
voxceleb1_root="DataStorage/VoxCeleb1/"
mkdir -p $voxceleb1_root/dev
mkdir -p $voxceleb1_root/test

# prepare dev
cd $voxceleb1_root/dev/
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partab
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partac
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partad
# concatenate the four downloaded parts into a single zip archive
cat vox1_dev* > vox1_dev_wav.zip
unzip vox1_dev_wav.zip

# prepare test (we are still inside dev/ and $voxceleb1_root is relative, so go up one level)
cd ../test/
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip
unzip vox1_test_wav.zip
```
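Optionally, remove the downloaded archives afterwards to reclaim disk space (run from the directory containing DataStorage):

```sh
# the zip archives and their parts are no longer needed once extracted
rm DataStorage/VoxCeleb1/dev/vox1_dev_wav_parta* DataStorage/VoxCeleb1/dev/vox1_dev_wav.zip
rm DataStorage/VoxCeleb1/test/vox1_test_wav.zip
```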
Check the prepared file structure:

```
DataStorage
└── VoxCeleb1/
    ├── dev/
    │   └── wav/
    │       └── speaker ID folders
    └── test/
        └── wav/
            └── speaker ID folders
```
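As a quick sanity check, VoxCeleb1 has 1,211 speakers in dev and 40 in test, so the speaker-folder counts should match:

```sh
ls DataStorage/VoxCeleb1/dev/wav | wc -l    # expect 1211
ls DataStorage/VoxCeleb1/test/wav | wc -l   # expect 40
```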
Download the Voicebank-DEMAND dataset prepared by s3prl:

```sh
# unzip inside DataStorage so the files land in DataStorage/noisy-vctk-16k/
cd DataStorage
wget http://140.112.21.28:9000/noisy-vctk-16k.zip
unzip noisy-vctk-16k.zip
```
Check the unzipped Voicebank directory structure:

```
DataStorage
└── noisy-vctk-16k/
    ├── clean_testset_wav_16k/
    ├── clean_trainset_28spk_wav_16k/
    ├── noisy_testset_wav_16k/
    ├── noisy_trainset_28spk_wav_16k/
    ├── testset_txt/
    └── trainset_28spk_txt/
```
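Optionally verify the file counts; the standard Voicebank-DEMAND split has 11,572 utterances in the 28-speaker train set and 824 in the test set:

```sh
ls DataStorage/noisy-vctk-16k/noisy_trainset_28spk_wav_16k | wc -l   # expect 11572
ls DataStorage/noisy-vctk-16k/noisy_testset_wav_16k | wc -l          # expect 824
```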
Simulate the Libri2Mix data for source separation. Only the 16 kHz, min-condition data is needed. Make sure SoX is installed on your machine.
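You can confirm SoX is available before generating the data:

```sh
# prints the installed SoX version if it is on the PATH
sox --version
```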
```sh
# Download the script and simulate the Libri2Mix dataset
git clone https://github.com/s3prl/LibriMix.git
cd LibriMix
./generate_librimix_ss.sh DataStorage
```
Check the generated Libri2Mix directory structure:
```
DataStorage
└── Libri2Mix/
    └── wav16k/
        └── min/
            ├── train-100/
            ├── dev/
            ├── test/
            └── metadata/
```
Start a new downstream training experiment with the following commands:

```sh
cd minisuperb

# To evaluate a model on ASR:
bash asr.sh UpstreamModelName DataStorage

# To evaluate a model on SID:
bash sid.sh UpstreamModelName DataStorage

# SE and SS are still under development
# To evaluate a model on SE:
bash se.sh UpstreamModelName DataStorage

# To evaluate a model on SS:
bash ss.sh UpstreamModelName DataStorage
```
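For example, to evaluate HuBERT Base on SID with data stored under DataStorage:

```sh
# hubert_base is one of the Upstream Model Names from the table above
bash sid.sh hubert_base DataStorage
```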
Install sox on your OS. For Linux:

```sh
conda install -c conda-forge sox
```

Then install MiniSUPERB in editable mode with all extras:

```sh
pip install -e ".[all]"
```
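A quick way to confirm the installation succeeded; this assumes the package is importable as minisuperb, matching the directory name used above:

```sh
# hypothetical smoke test: fails with ImportError if the install did not work
python -c "import minisuperb; print('ok')"
```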
1. Support for custom upstream models
2. Evaluation scripts for Speech Enhancement (SE) and Source Separation (SS)
3. Pipeline to calculate the MiniSUPERB score for custom SSL models
The majority of this project is licensed under the Apache License, version 2.0. However, all files authored by Facebook, Inc. (which carry an explicit copyright statement at the top) are licensed under CC-BY-NC.