This GitHub repository contains the models, scripts, and data splits from our paper accepted at Interspeech 2024, which can be found here.
Source code for training the various supervised and self-supervised models can be found under /src.
/egs contains bash scripts to train models on the MyST and CSLU OGI Kids' datasets, as well as scripts to filter these datasets and obtain the train/test splits.
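The tables below list word error rates (WERs) for the released checkpoints on the MyST and OGI test sets, with a Hugging Face link for each model. As a quick start, a fine-tuned checkpoint can be loaded directly from the Hugging Face Hub; the sketch below is a minimal example, and the repository ID `your-org/whisper-small-myst` is a placeholder for whichever model link you pick from the tables.

```python
# Minimal inference sketch using the Hugging Face transformers pipeline.
# NOTE: "your-org/whisper-small-myst" is a placeholder repo ID; substitute
# the actual checkpoint link from the tables below.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-org/whisper-small-myst",  # placeholder checkpoint ID
)

# Transcribe a 16 kHz recording of child speech.
result = asr("example_child_speech.wav")
print(result["text"])
```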
Model | MyST test WER (%) | Hugging Face Link |
---|---|---|
Whisper tiny | 11.6 | model |
Whisper base | 10.4 | model |
Whisper small | 9.3 | model |
Whisper medium | 8.9 | model |
Whisper large | 13.0 | model |
Whisper large-v3 | 9.1 | model |
Canary | 9.2 | model |
Parakeet | 8.5 | model |
Wav2vec2.0 Large | 11.1 | model |
HuBERT Large | 11.3 | model |
WavLM Large | 10.4 | model |
Model | OGI test WER (%) | Hugging Face Link |
---|---|---|
Whisper tiny | 3.0 | model |
Whisper base | 2.3 | model |
Whisper small | 1.8 | model |
Whisper medium | 1.5 | model |
Whisper large | 1.7 | model |
Whisper large-v3 | 1.4 | model |
Canary | 1.5 | model |
Parakeet | 1.8 | model |
Wav2vec2.0 Large | 2.5 | model |
HuBERT Large | 2.5 | model |
WavLM Large | 1.8 | model |
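For reference, a minimal WER-scoring sketch with the `jiwer` package is shown below; the simple lowercase normalization is an assumption and may differ from the exact text normalization used in the paper.

```python
# Hedged WER-scoring sketch using jiwer; the lowercase normalization is an
# assumption and may not match the paper's exact setup.
import jiwer

references = ["the quick brown fox", "hello there"]
hypotheses = ["the quick brown fox", "hello their"]

# Normalize both sides before scoring (assumed normalization).
references = [r.lower() for r in references]
hypotheses = [h.lower() for h in hypotheses]

wer = jiwer.wer(references, hypotheses)
print(f"WER: {100 * wer:.1f}%")  # 1 error over 6 reference words -> 16.7%
```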
Data Augmentation | MyST test WER (%) | Hugging Face Link |
---|---|---|
PP | 8.8 | model |
VTLP | 9.0 | model |
SP | 8.9 | model |
SA | 9.0 | model |
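As one illustration of the augmentation methods above, the sketch below implements speed perturbation with torchaudio's sox effects, assuming SP denotes speed perturbation; the 0.9/1.0/1.1 factors are conventional choices and not necessarily the settings used in the paper.

```python
# Hedged speed-perturbation sketch (assuming SP denotes speed perturbation).
# The 0.9/1.0/1.1 factors are conventional choices, not necessarily the
# settings used in the paper.
import random
import torchaudio

waveform, sample_rate = torchaudio.load("example_child_speech.wav")

# Apply a random speed factor and resample back to the original rate, so
# both tempo and pitch shift together (standard speed perturbation).
factor = random.choice([0.9, 1.0, 1.1])
perturbed, _ = torchaudio.sox_effects.apply_effects_tensor(
    waveform,
    sample_rate,
    [["speed", str(factor)], ["rate", str(sample_rate)]],
)
```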
PEFT Method | MyST test WER (%) | Hugging Face Link |
---|---|---|
Enc | 9.2 | model |
Dec | 9.5 | model |
LoRA | 9.6 | model |
Prompt | 10.4 | model |
Prefix | 10.2 | model |
Adapter | 9.3 | model |
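As an illustration of the LoRA row above, the sketch below wraps a Whisper model with LoRA adapters via the `peft` library; the base checkpoint, rank, alpha, and target modules are illustrative assumptions, not the exact configuration used in the paper.

```python
# Hedged LoRA fine-tuning setup with the peft library; the rank, alpha, and
# target modules are illustrative assumptions, not the paper's configuration.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    r=16,                                 # assumed LoRA rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections in Whisper
    lora_dropout=0.05,
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```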
If you use this code in your research, please cite our paper as follows:
    @inproceedings{fan24b_interspeech,
      title     = {Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models},
      author    = {Ruchao Fan and Natarajan {Balaji Shankar} and Abeer Alwan},
      year      = {2024},
      booktitle = {Interspeech 2024},
      pages     = {5173--5177},
      doi       = {10.21437/Interspeech.2024-1353},
    }