Multi-Task Kaldi

The collection of scripts in this repository represents a template for training neural networks via Multi-Task Learning in Kaldi. This repo is heavily based on the existing Kaldi multilingual Babel example directory.

multi-task-kaldi offers functionality similar to the multilingual Babel scripts, but with code that is easier to extend. Adding a new language with multi-task-kaldi is as easy as creating a new input_lang dir. Running multiple tasks on the same corpus is not possible in the multilingual Babel setup; in multi-task-kaldi you can do it by creating a new input_task dir. The code here aims to be easily readable and extensible, and makes few assumptions about the kind of data you have and where it's located on disk.

To get started, clone multi-task-kaldi and move it into the egs dir of your local copy of the latest Kaldi branch.

If you're used to typical Kaldi egs, note that all the scripts in utils, local, and steps live in this repo. That is, they do not link back to the wsj example. This was done so the scripts could be customized and made more readable.

Creating the `input_task` dir

In order to run multi-task-kaldi, you need to make a new input_task dir. This is the only place you need to make changes for your new task (or new language).

This directory contains information about the location of your data, lexicon, and language model.

Here is an example of the structure of my input_task directory for the task called my-task.

input_my-task/
├── lexicon_nosil.txt -> /data/my-task/lexicon/lexicon_nosil.txt
├── lexicon.txt -> /data/my-task/lexicon/lexicon.txt
├── task.arpabo -> /data/my-task/lm/task.arpabo
├── test_audio_path -> /data/my-task/audio/test_audio_path
├── train_audio_path -> /data/my-task/audio/train_audio_path
├── transcripts.test -> /data/my-task/audio/transcripts.test
└── transcripts.train -> /data/my-task/audio/transcripts.train

0 directories, 7 files
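
One way to build such a directory is with symbolic links. The sketch below assumes your data follows the /data/my-task layout from the listing above (lexicon/, lm/, audio/). The helper name make_input_dir and its arguments are hypothetical and not part of this repo.

```shell
# Sketch: build input_<task>/ as a set of symlinks into an existing data tree.
# make_input_dir is a hypothetical helper, not a script shipped with the repo.
# Assumes the data root has lexicon/, lm/ and audio/ subdirs as in the listing.
# Usage: make_input_dir <task-name> <data-root> [<dest-parent-dir>]
make_input_dir() (
    task=$1
    data_root=$2                  # e.g. /data/my-task
    dest=${3:-.}/input_$task      # defaults to the current directory

    mkdir -p "$dest" || exit 1
    # Each entry is <link name>:<subdir of the data root holding the target>.
    for entry in \
        lexicon_nosil.txt:lexicon \
        lexicon.txt:lexicon \
        task.arpabo:lm \
        test_audio_path:audio \
        train_audio_path:audio \
        transcripts.test:audio \
        transcripts.train:audio
    do
        name=${entry%%:*}
        subdir=${entry#*:}
        ln -sfn "$data_root/$subdir/$name" "$dest/$name"
    done
)
```

Running make_input_dir my-task /data/my-task from the multi-task-kaldi dir would create input_my-task/ with the seven links shown above. Note that ln -s does not check that the targets exist.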

Most of these files are in standard Kaldi format, and more detailed descriptions of them can be found in the official Kaldi docs.

lexicon_nosil.txt // Standard Kaldi // phonetic dictionary without silence phonemes
lexicon.txt // Standard Kaldi // phonetic dictionary with silence phonemes
task.arpabo // Standard Kaldi // language model in ARPA back-off format
test_audio_path // Custom file! // one-line text file containing absolute path to dir of audio files (e.g. WAV) for testing
train_audio_path // Custom file! // one-line text file containing absolute path to dir of audio files (e.g. WAV) for training
transcripts.test // Custom file! // A typical Kaldi transcript file, but with only the test utterances
transcripts.train // Custom file! // A typical Kaldi transcript file, but with only the train utterances
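
The two *_audio_path files are the least familiar entries, so here is a rough sketch of how one might write them and sanity-check an input dir before training. Both helper names (write_audio_path, check_input_dir) are hypothetical. The check only looks at file presence and the one-line absolute-path convention described above, not at file contents.

```shell
# Sketch: helpers for the custom files in input_<task>/.
# Both function names are hypothetical; they are not part of the repo.

# Write a one-line file holding the absolute path of an audio directory.
# Usage: write_audio_path <audio-dir> <output-file>
write_audio_path() (
    audio_dir=$(cd "$1" && pwd) || exit 1   # resolve to an absolute path
    printf '%s\n' "$audio_dir" > "$2"
)

# Check that an input dir has all seven expected entries and that each
# *_audio_path file is a single newline-terminated line naming an
# absolute path. Dangling symlinks are reported as missing.
# Usage: check_input_dir <input-dir>
check_input_dir() (
    dir=$1
    status=0
    for f in lexicon_nosil.txt lexicon.txt task.arpabo \
             test_audio_path train_audio_path \
             transcripts.test transcripts.train
    do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    for f in test_audio_path train_audio_path; do
        [ -e "$dir/$f" ] || continue
        lines=$(wc -l < "$dir/$f" | tr -d ' ')
        first=$(head -n 1 "$dir/$f")
        case $first in
            /*) ;;
            *) echo "$f: not an absolute path" >&2; status=1 ;;
        esac
        if [ "$lines" -ne 1 ]; then
            echo "$f: expected exactly one line" >&2
            status=1
        fi
    done
    exit "$status"
)
```

check_input_dir returns non-zero if anything is missing, so it can guard a training run, for example: check_input_dir input_my-task && ./run_gmm.sh my-task test001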

Running the scripts

The scripts name files and directories dynamically. You define the name of your input data (i.e. task or language) in the initial input_ dir, and the rest of the generated dirs and files are named accordingly. For instance, if you have input_your-task, then the GMM alignment stage will create data_your-task, plp_your-task and exp_your-task.
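
To make the naming scheme concrete, here is a small sketch that maps an input_ dir name to the three directory names mentioned above. The helper derived_dirs is hypothetical and only illustrates the convention; the repo's scripts do their own naming internally.

```shell
# Sketch: derive the generated directory names from an input_ dir name.
# derived_dirs is a hypothetical helper shown only to illustrate the
# naming scheme; it does not create anything.
derived_dirs() (
    input_dir=${1%/}                 # drop a trailing slash, if any
    input_dir=${input_dir##*/}       # keep only the last path component
    case $input_dir in
        input_?*) ;;
        *) echo "expected a dir named input_<task>, got: $1" >&2; exit 1 ;;
    esac
    task=${input_dir#input_}         # strip the input_ prefix
    for prefix in data plp exp; do
        echo "${prefix}_${task}"
    done
)

derived_dirs input_your-task
# prints:
#   data_your-task
#   plp_your-task
#   exp_your-task
```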

Force Align Training Data (GMM)

$ ./run_gmm.sh your-task test001

your-task should correspond exactly to input_your-task. In multilingual training, this will be input_lang1, input_lang2, etc. In monolingual Multi-Task Learning, this will be input_task1, input_task2, etc.
test001 can be any character string; it is written into the name of the WER file: WER_nnet3_your-corpus_test001.txt
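
With several tasks or languages, the GMM stage has to produce alignments for each of them. The sketch below assumes that means calling run_gmm.sh once per task with the same run label; that reading, the wrapper name run_gmm_all, and the DRY_RUN switch are my assumptions, not part of the repo.

```shell
# Sketch: run the GMM stage once per task, reusing one run label.
# run_gmm_all is a hypothetical wrapper; only ./run_gmm.sh and its two
# arguments come from the README. Set DRY_RUN=1 to print the commands
# instead of executing them.
# Usage: run_gmm_all <run-label> <task> [<task> ...]
run_gmm_all() (
    label=$1
    shift
    for task in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "./run_gmm.sh $task $label"
        elif [ -d "input_$task" ]; then
            ./run_gmm.sh "$task" "$label" || exit 1
        else
            echo "input_$task not found; skipping $task" >&2
        fi
    done
)
```

For example, DRY_RUN=1 followed by run_gmm_all test001 lang1 lang2 prints the two run_gmm.sh commands without starting any training.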

Format data from GMM --> DNN

$ ./utils/setup_multitask.sh to_dir from_dir "your-task1 your-task2 your-task3"

All nnet3 log files and experimental data will be written to to_dir (absolute path). This dir must already exist.
The output dirs from GMM alignment should exist at from_dir (absolute path).
The task names "your-task1 your-task2 your-task3" must correspond to the input dir names input_your-task1, input_your-task2, etc. Do not include the initial input_ prefix here.
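
The three requirements above (to_dir exists, both paths are absolute, task names carry no input_ prefix) are easy to get wrong by hand. Here is a sketch of a pre-flight helper that enforces them and prints the resulting command. prepare_setup_args is a hypothetical name; the helper does not call setup_multitask.sh itself.

```shell
# Sketch: prepare the arguments for utils/setup_multitask.sh.
# prepare_setup_args is a hypothetical helper; it creates to_dir if
# needed, resolves both dirs to absolute paths, and prints the command.
# Usage: prepare_setup_args <to_dir> <from_dir> <task> [<task> ...]
prepare_setup_args() (
    mkdir -p "$1" || exit 1                 # to_dir must exist already
    to_dir=$(cd "$1" && pwd) || exit 1
    from_dir=$(cd "$2" && pwd) || exit 1    # fails if from_dir is missing
    shift 2
    # Task names are passed without the input_ prefix.
    for t in "$@"; do
        case $t in
            input_*) echo "drop the input_ prefix from: $t" >&2; exit 1 ;;
        esac
    done
    tasks=$*                                # space-delimited task list
    echo "./utils/setup_multitask.sh $to_dir $from_dir \"$tasks\""
)
```

The printed line can be inspected and then pasted into the shell, which keeps the helper free of side effects beyond creating to_dir.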

Multi-Task Learning (DNN)

$ ./run_nnet3_multitask.sh "your-task1 your-task2" "gmm-typo1 gmm-typo2" "weight-task1,weight-task2" hidden-dim num-epochs main-dir

The first argument is a space-delimited string of task names (each must correspond to an input dir, e.g. input_your-task1).
The second argument is a space-delimited string of GMM model typologies. Each is either "mono" or "tri", and determines whether monophone or triphone alignments are used for that task.
The third argument is a comma-delimited list of weights, one per task. Each weight should probably be equal to or less than 1.0.
hidden-dim is the number of nodes in your hidden layer.
num-epochs is the number of epochs for each task. The same value applies to all tasks; it is not task-specific.
main-dir is the dir you moved your GMM alignments into. Above we used to_dir.
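
Because the first three arguments are parallel lists with different delimiters, it is easy for them to get out of step. The sketch below assembles them from task:typology:weight triplets and prints the resulting command. The helper nnet3_command is hypothetical, and the values 512, 10 and /path/to/to_dir are placeholders rather than recommended settings.

```shell
# Sketch: assemble the three list arguments of run_nnet3_multitask.sh
# from task:typology:weight triplets, so the lists cannot get out of step.
# nnet3_command is a hypothetical helper; it only prints the command.
# Usage: nnet3_command <hidden-dim> <num-epochs> <main-dir> <task:typo:weight> ...
nnet3_command() (
    hidden_dim=$1 num_epochs=$2 main_dir=$3
    shift 3
    tasks= typos= weights=
    for triplet in "$@"; do
        task=${triplet%%:*}
        rest=${triplet#*:}
        typo=${rest%%:*}
        weight=${rest#*:}
        case $typo in
            mono|tri) ;;
            *) echo "typology must be mono or tri, got: $typo" >&2; exit 1 ;;
        esac
        tasks="${tasks:+$tasks }$task"           # space-delimited
        typos="${typos:+$typos }$typo"           # space-delimited
        weights="${weights:+$weights,}$weight"   # comma-delimited
    done
    echo "./run_nnet3_multitask.sh \"$tasks\" \"$typos\" \"$weights\" $hidden_dim $num_epochs $main_dir"
)

nnet3_command 512 10 /path/to/to_dir your-task1:tri:1.0 your-task2:mono:0.5
# prints:
# ./run_nnet3_multitask.sh "your-task1 your-task2" "tri mono" "1.0,0.5" 512 10 /path/to/to_dir
```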

JRMeyer / multi-task-kaldi
