
Analyzing the Confounding Effect of Accents in End-to-End ASR Models

This repository contains code for our ACL 2020 paper How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems. The paper studies the confounding effect of accents in an end-to-end Automatic Speech Recognition (ASR) model, DeepSpeech2, through several probing/analysis techniques.

Requirements

Instructions

  1. Clone deepspeech.pytorch and check out commit e73ccf6, the stable commit used in all our experiments (see the example commands after this list).
  2. Build the docker image from the Dockerfile provided in this directory, then run it via the bash entrypoint, using the commands below. This Dockerfile should be the same as the one in your deepspeech.pytorch folder; the instructions in that folder's README.md have been modified.
    sudo docker build -t deepspeech2.docker .
    sudo docker run -ti --gpus all -v `pwd`/data:/workspace/data --entrypoint=/bin/bash --net=host --ipc=host deepspeech2.docker
  3. Install all the requirements using pip install -r requirements.txt
  4. Clone this repository inside the docker container into the directory /workspace/ and install the other requirements (see the example command after this list).
  5. Download the Mozilla Common Voice and TIMIT datasets used in the experiments, and optionally the LibriSpeech dataset, which is used only for training.
  6. Preparing manifests: deepspeech.pytorch requires the data to be listed in .csv files called manifests, with two columns: the path to a .wav file and the path to the corresponding .txt file. The .wav file is the speech clip and the .txt file contains the transcript in upper case. For LibriSpeech, use data/librispeech.py in deepspeech.pytorch. For the other datasets, use the provided scripts DeepSpeech/data/make_{MCV,timit}_manifest.py. The TIMIT script works on the original folder structure, whereas for MCV you need to provide a .txt file with entries of the format file.mp3 : reference text (see the example after this list).
  7. The additional and/or modified files can be found in DeepSpeech/, along with our trained model and the Language Model (LM) we used in DeepSpeech/models.
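
For step 1, a minimal sketch of the clone-and-checkout commands, assuming deepspeech.pytorch refers to the SeanNaren/deepspeech.pytorch repository on GitHub:

    # assumes the upstream repository location; adjust if you use a mirror
    git clone https://github.com/SeanNaren/deepspeech.pytorch.git
    cd deepspeech.pytorch
    git checkout e73ccf6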
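
For step 4, an example of cloning this repository into /workspace/ inside the running container, assuming it is hosted at github.com/archiki/ASR-Accent-Analysis:

    cd /workspace/
    git clone https://github.com/archiki/ASR-Accent-Analysis.git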
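
For step 6, illustrative examples of the two formats, with made-up file names. First, two rows of a manifest .csv (wav path, transcript path):

    /workspace/data/train/clip1.wav,/workspace/data/train/clip1.txt
    /workspace/data/train/clip2.wav,/workspace/data/train/clip2.txt

Second, an entry of the .txt file that make_MCV_manifest.py expects:

    clip1.mp3 : THE BIRCH CANOE SLID ON THE SMOOTH PLANKS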

Reproducing Experiment Results

Citation

If you use this code in your work, please consider citing our paper:

@inproceedings{prasad-jyothi-2020-accents,
    title = "How Accents Confound: Probing for Accent Information in End-to-End Speech Recognition Systems",
    author = "Prasad, Archiki  and
      Jyothi, Preethi",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.345",
    pages = "3739--3753"}

Acknowledgements

This project uses code from deepspeech.pytorch.