This repository contains code for our ACL 2020 paper, "How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems", which studies the confounding effect of accents on an end-to-end Automatic Speech Recognition (ASR) model, DeepSpeech2, through several probing/analysis techniques.
We use the DeepSpeech2 implementation from deepspeech.pytorch at commit e73ccf6; this was the stable commit used in all our experiments. Some files and the README.md of that folder have been modified.
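If you need to fetch the upstream deepspeech.pytorch code at this commit yourself, standard git commands suffice (whether you need to clone it is an assumption about your setup; the URL is the upstream repository):

git clone https://github.com/SeanNaren/deepspeech.pytorch.git
cd deepspeech.pytorch
git checkout e73ccf6

Then build and start the Docker container: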
sudo docker build -t deepspeech2.docker .
sudo docker run -ti --gpus all -v `pwd`/data:/workspace/data --entrypoint=/bin/bash --net=host --ipc=host deepspeech2.docker
Inside the container, go to /workspace/ and install the other requirements:

pip install -r requirements.txt

The data manifests are .csv files in which each line has the form: path to .wav file, path to .txt file. The .wav file is the speech clip and the .txt file contains the transcript in upper case. For LibriSpeech, use data/librispeech.py in deepspeech.pytorch. For the other datasets, use the files DeepSpeech/data/make_{MCV,timit}_manifest.py provided. The file corresponding to TIMIT works on the original folder structure, whereas for MCV we need to provide a .txt file with entries of the format: file.mp3 : reference text.
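As a concrete illustration, a minimal sketch of writing such a manifest from a directory of paired .wav/.txt files could look like the following; the directory layout and script name are hypothetical, not the repository's actual scripts (see DeepSpeech/data/make_{MCV,timit}_manifest.py for those).

# make_manifest_sketch.py -- hypothetical helper, for illustration only.
import os
import sys

def write_manifest(data_dir, manifest_path):
    """Write one 'path to .wav file,path to .txt file' line per paired utterance."""
    with open(manifest_path, 'w') as out:
        for name in sorted(os.listdir(data_dir)):
            if not name.endswith('.wav'):
                continue
            wav = os.path.join(data_dir, name)
            txt = wav[:-len('.wav')] + '.txt'
            if os.path.exists(txt):  # keep only clips that have a transcript
                out.write('{},{}\n'.format(wav, txt))

if __name__ == '__main__':
    write_manifest(sys.argv[1], sys.argv[2])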
Our modified and additional files are provided in DeepSpeech/, along with our trained model and the Language Model (LM) we used, in DeepSpeech/models.

Section 2.1, Table 1: This was obtained by testing the model using the following command with the appropriate accent manifest:
cd deepspeech.pytorch/
python test.py --model-path ../Deepspeech/models/deepspeech_final.pth --test-manifest {accent manifest}.csv --cuda --decoder beam --alpha 2 --beta 0.4 --beam-width 128 --lm-path ../Deepspeech/models/4-gram.arpa
Section 3.1, Attribution Analysis: Code for all experiments in this section can be found in AttrbutionAnalysis.ipynb.
The main requirements for this notebook are the gradient attributions computed with Deepspeech/test_attr.py and the frame-level alignments, which can be derived from the time-level (in seconds) alignments produced by gentle, along with accent labels and reference transcripts. The paper contains attribution maps for the sentence 'The burning fire had been extinguished.'; the audio files for the various accents can be found in the folder audioFiles.
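A minimal sketch of the time-to-frame conversion is shown below; the 10 ms frame duration and the (label, start, end) tuple format are assumptions and should be adapted to gentle's output and to the frame rate of the representation being probed.

# align_to_frames_sketch.py -- hypothetical helper for turning time-level alignments
# into frame-level labels; frame_dur is an assumed frame duration in seconds.
def frame_labels(segments, num_frames, frame_dur=0.01, blank='sil'):
    """segments: list of (label, start_sec, end_sec); returns one label per frame."""
    labels = [blank] * num_frames
    for label, start, end in segments:
        first = int(round(start / frame_dur))
        last = min(int(round(end / frame_dur)), num_frames)
        for t in range(first, last):
            labels[t] = label
    return labels

# example with a made-up alignment for a short clip
print(frame_labels([('dh', 0.00, 0.05), ('ah', 0.05, 0.12)], num_frames=15))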
Section 3.2, Information Mixing Analysis: Datapoints for the figures showing phone focus and neighbour analysis can be found in Contribution.ipynb. Deepspeech/test_contr.py is used to calculate the gradient contributions given by equation (1) of the paper.
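Equation (1) itself is defined in the paper; purely as a rough illustration of the kind of quantity involved, a per-input-frame gradient norm for one output frame can be computed with autograd as below. The model, shapes, and the choice to sum the output frame are stand-ins, not the actual computation in Deepspeech/test_contr.py.

# grad_contribution_sketch.py -- rough, hypothetical illustration only.
import torch

def contributions_to_frame(model, x, out_frame):
    """Norm of the gradient of one output frame w.r.t. each input frame."""
    x = x.clone().requires_grad_(True)
    hidden = model(x)                          # (1, T_out, H)
    hidden[0, out_frame].sum().backward()      # scalarise the chosen output frame
    return x.grad[0].norm(dim=-1)              # (T_in,): one score per input frame

# usage with a toy stand-in model
toy = torch.nn.Sequential(torch.nn.Linear(13, 32), torch.nn.Tanh())
scores = contributions_to_frame(toy, torch.randn(1, 50, 13), out_frame=10)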
Section 4, Mutual Information Experiments: Data for all experiments involving mutual information can be generated using MI.ipynb. The notebook uses averaged phone representations, which are obtained by taking the frame-level alignments and averaging all consecutive frames corresponding to a particular phone.
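A minimal sketch of this averaging step is given below, assuming reps is a (frames x hidden-size) array of frame representations and labels is the frame-level alignment from above; the names and shapes are assumptions for illustration.

# phone_average_sketch.py -- hypothetical helper for averaging consecutive frames
# that share a phone label.
import numpy as np
from itertools import groupby

def average_phone_reps(reps, labels):
    """Return (phone, averaged representation) for each contiguous phone segment."""
    out, idx = [], 0
    for phone, group in groupby(labels):
        n = len(list(group))
        out.append((phone, reps[idx:idx + n].mean(axis=0)))
        idx += n
    return out

# usage with random stand-in representations
segments = average_phone_reps(np.random.randn(6, 4), ['dh', 'dh', 'ah', 'ah', 'ah', 'sil'])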
Section 5, Classifier-driven Analysis: All code files relevant to the accent probes/classifiers and phone probes/classifiers can be found in the folders AccentProbe/ and PhoneProbes/ respectively. The accent probes are trained on entire representations and the phone probes on frame-level (and averaged) representations.
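The actual probe architectures and training scripts are in those folders; as a minimal stand-in, a linear probe trained on pre-computed representations might look like the sketch below, where X, y, and the logistic-regression choice are all illustrative assumptions.

# probe_sketch.py -- hypothetical linear probe on pre-computed representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.random.randn(200, 64)                   # stand-in representations (N x H)
y = np.random.randint(0, 4, size=200)          # stand-in accent (or phone) labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print('probe accuracy:', probe.score(X_te, y_te))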
If you use this code in your work, please consider citing our paper:
@inproceedings{prasad-jyothi-2020-accents,
title = "How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems",
author = "Prasad, Archiki and
Jyothi, Preethi",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.345",
pages = "3739--3753"}
This project uses code from deepspeech.pytorch.