NOTE: This repository is still under development. Some features may not be fully functional.
MiniSUPERB is a proxy dataset for SUPERB and the SUPERB Challenge. It provides a simplified and accessible way to evaluate self-supervised learning (SSL) speech models.
The following diagram provides an intuitive illustration of how MiniSUPERB accelerates the evaluation process for SSL speech models:
The figure shows how our results approximate the model rankings of the SUPERB Challenge:
For more details, please refer to the original paper.
The project was developed in the following environment:

| Env | Version |
|---|---|
| OS | ubuntu-20.04 |
| Python | 3.10 |
| PyTorch | 1.12.1 |
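A minimal setup matching these versions might look like the following sketch; the environment name and the exact PyTorch build are assumptions, so adjust them to your machine:

```sh
# Hypothetical environment setup; pick the PyTorch build that matches your CUDA version
conda create -n minisuperb python=3.10
conda activate minisuperb
pip install torch==1.12.1
```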
MiniSUPERB supports four downstream tasks: automatic speech recognition (ASR), speaker identification (SID), speech enhancement (SE), and source separation (SS).
The following upstream models are supported:

| Models | Upstream Model Name | Paper |
|---|---|---|
| WavLM | wavlm_base, wavlm_base_plus, wavlm_large | arxiv |
| HuBERT | hubert_base, hubert_large_ll60k | arxiv |
| Wav2Vec 2.0 | wav2vec2, wav2vec2_large_ll60k | arxiv |
| Modified-CPC | modified_cpc | arxiv |
| TERA | tera | arxiv |
| DeCoAR 2.0 | decoar2 | arxiv |
| Filter Bank | fbank, fbank_no_cmvn (used for SID) | - |
Download [librispeech_finetuning.tgz](https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md), along with dev-clean and test-clean from LibriSpeech.
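A sketch of the download steps; the direct URLs are assumptions, so check the libri-light README and openslr.org if they have changed:

```sh
cd DataStorage
# librispeech_finetuning.tgz (link taken from the libri-light data_preparation README)
wget https://dl.fbaipublicfiles.com/librilight/data/librispeech_finetuning.tgz
# dev-clean and test-clean from OpenSLR
wget https://www.openslr.org/resources/12/dev-clean.tar.gz
wget https://www.openslr.org/resources/12/test-clean.tar.gz
# the LibriSpeech tarballs unpack into LibriSpeech/; place librispeech_finetuning inside it too
tar -xzf dev-clean.tar.gz
tar -xzf test-clean.tar.gz
tar -xzf librispeech_finetuning.tgz -C LibriSpeech
```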
Unzip and check the prepared file structure:

```
DataStorage
└── LibriSpeech/
    ├── librispeech_finetuning/
    ├── dev-clean/
    └── test-clean/
```
Download the VoxCeleb1 dataset and unzip it.
```sh
voxceleb1_root="DataStorage/VoxCeleb1/"
mkdir -p $voxceleb1_root/dev
mkdir -p $voxceleb1_root/test

# prepare dev
cd $voxceleb1_root/dev/
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partab
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partac
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partad
# concatenate the four downloaded parts into a single zip archive
cat vox1_dev* > vox1_dev_wav.zip
unzip vox1_dev_wav.zip

# prepare test (we are still inside dev/ and $voxceleb1_root is relative, so go up one level)
cd ../test/
wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip
unzip vox1_test_wav.zip
```
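Optionally, remove the downloaded archives afterwards to reclaim disk space (run from the directory containing DataStorage):

```sh
# the zip archives and their parts are no longer needed once extracted
rm DataStorage/VoxCeleb1/dev/vox1_dev_wav_parta* DataStorage/VoxCeleb1/dev/vox1_dev_wav.zip
rm DataStorage/VoxCeleb1/test/vox1_test_wav.zip
```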
Check the prepared file structure:

```
DataStorage
└── VoxCeleb1/
    ├── dev/
    │   └── wav/
    │       └── speaker ID folders
    └── test/
        └── wav/
            └── speaker ID folders
```
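As a quick sanity check, VoxCeleb1 has 1,211 speakers in dev and 40 in test, so the speaker-folder counts should match:

```sh
ls DataStorage/VoxCeleb1/dev/wav | wc -l    # expect 1211
ls DataStorage/VoxCeleb1/test/wav | wc -l   # expect 40
```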
Download the Voicebank-DEMAND dataset prepared by s3prl:

```sh
# unzip inside DataStorage so the files land in DataStorage/noisy-vctk-16k/
cd DataStorage
wget http://140.112.21.28:9000/noisy-vctk-16k.zip
unzip noisy-vctk-16k.zip
```
Check the unzipped Voicebank directory structure:

```
DataStorage
└── noisy-vctk-16k/
    ├── clean_testset_wav_16k/
    ├── clean_trainset_28spk_wav_16k/
    ├── noisy_testset_wav_16k/
    ├── noisy_trainset_28spk_wav_16k/
    ├── testset_txt/
    └── trainset_28spk_txt/
```
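Optionally verify the file counts; the standard Voicebank-DEMAND split has 11,572 utterances in the 28-speaker train set and 824 in the test set:

```sh
ls DataStorage/noisy-vctk-16k/noisy_trainset_28spk_wav_16k | wc -l   # expect 11572
ls DataStorage/noisy-vctk-16k/noisy_testset_wav_16k | wc -l          # expect 824
```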
Simulate the Libri2Mix data for source separation. Only the 16 kHz, min-condition data is needed. Make sure SoX is installed on your machine.
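You can confirm SoX is available before generating the data:

```sh
# prints the installed SoX version if it is on the PATH
sox --version
```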
```sh
# Download the script and simulate the Libri2Mix dataset
git clone https://github.com/s3prl/LibriMix.git
cd LibriMix
./generate_librimix_ss.sh DataStorage
```
Check the generated Libri2Mix directory structure:
```
DataStorage
└── Libri2Mix/
    └── wav16k/
        └── min/
            ├── train-100/
            ├── dev/
            ├── test/
            └── metadata/
```
Start a new downstream training experiment with the following commands:

```sh
cd minisuperb

# To evaluate a model on ASR:
bash asr.sh UpstreamModelName DataStorage

# To evaluate a model on SID:
bash sid.sh UpstreamModelName DataStorage

# SE and SS are still under development
# To evaluate a model on SE:
bash se.sh UpstreamModelName DataStorage

# To evaluate a model on SS:
bash ss.sh UpstreamModelName DataStorage
```
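For example, to evaluate HuBERT Base on SID with data stored under DataStorage:

```sh
# hubert_base is one of the Upstream Model Names from the table above
bash sid.sh hubert_base DataStorage
```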
Install sox on your OS. For Linux:

```sh
conda install -c conda-forge sox
```

Then install MiniSUPERB in editable mode with all extras:

```sh
pip install -e ".[all]"
```
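A quick way to confirm the installation succeeded; this assumes the package is importable as minisuperb, matching the directory name used above:

```sh
# hypothetical smoke test: fails with ImportError if the install did not work
python -c "import minisuperb; print('ok')"
```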
1. Support for custom upstream models
2. Evaluation scripts for Speech Enhancement (SE) and Source Separation (SS)
3. Pipeline to calculate the MiniSUPERB score for custom SSL models
The majority of this project is licensed under the Apache License, version 2.0. However, all files authored by Facebook, Inc. (which carry an explicit copyright statement at the top) are licensed under CC-BY-NC.