JRMeyer / multi-task-kaldi

An example directory for running Multi-Task Learning training on Kaldi neural networks. In Kaldi-speak, this is an egs dir for nnet3 training.
Apache License 2.0

Multi-Task Kaldi

The scripts in this repository are a template for training neural networks via Multi-Task Learning in Kaldi. This repo is heavily based on the existing Kaldi multilingual Babel example directory.

multi-task-kaldi offers functionality similar to the multilingual Babel scripts, but with code that is easier to extend. Adding a new language is as easy as creating a new input_lang dir. Running multiple tasks on the same corpus is not possible in the multilingual Babel setup; in multi-task-kaldi you do it by creating a new input_task dir. The code aims to be readable and extensible, and makes few assumptions about the kind of data you have or where it's located on disk.

To get started, clone multi-task-kaldi and move it into the egs dir of your local copy of the latest Kaldi branch.

If you're used to typical Kaldi egs, note that all the scripts in utils/, local/ and steps/ live in this repo. That is, they do not link back to the wsj example. This made it possible to customize the scripts and make them more readable.

Creating the input_task dir

In order to run multi-task-kaldi, you need to make a new input_task dir. This is the only place you need to make changes for your new task (or new language).

This directory tells the scripts where to find your data, lexicon, and language model.

Here is an example of the structure of my input_task directory for the task called my-task.

input_my-task/
├── lexicon_nosil.txt -> /data/my-task/lexicon/lexicon_nosil.txt
├── lexicon.txt -> /data/my-task/lexicon/lexicon.txt
├── task.arpabo -> /data/my-task/lm/task.arpabo
├── test_audio_path -> /data/my-task/audio/test_audio_path
├── train_audio_path -> /data/my-task/audio/train_audio_path
├── transcripts.test -> /data/my-task/audio/transcripts.test
└── transcripts.train -> /data/my-task/audio/transcripts.train

0 directories, 7 files

Most of these files are in standard Kaldi format, and more detailed descriptions of them can be found in the official Kaldi docs.
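
One way to assemble such a directory is to symlink each expected name to wherever the file already lives. The sketch below assumes the data layout from the listing above (lexicon/, lm/ and audio/ under one root). The function name link_input_dir and its two arguments are made up for illustration and are not part of this repo.

```shell
#!/bin/sh
# Hypothetical helper (not part of multi-task-kaldi): create input_<task>/
# in the current directory as a set of symlinks into an existing data tree.
link_input_dir() {
    task="$1"        # task name, e.g. my-task
    data_root="$2"   # assumed layout: $data_root/{lexicon,lm,audio}/...
    input_dir="input_${task}"

    mkdir -p "$input_dir" || return 1

    # Each entry is "<subdir>/<name>"; the link keeps <name> unchanged.
    for rel in \
        lexicon/lexicon_nosil.txt \
        lexicon/lexicon.txt \
        lm/task.arpabo \
        audio/test_audio_path \
        audio/train_audio_path \
        audio/transcripts.test \
        audio/transcripts.train
    do
        # -n stops ln from following an existing link to a directory on re-runs
        ln -sfn "$data_root/$rel" "$input_dir/${rel##*/}" || return 1
    done
}
```

Calling `link_input_dir my-task /data/my-task` from the repo root would reproduce the listing above. Pass an absolute path as the data root: a relative one would be resolved relative to input_my-task/ and leave the links dangling.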

Running the scripts

The scripts name files and directories dynamically. You define the name of your input data (i.e. the task or language) in the initial input_ dir, and the rest of the generated dirs and files are named accordingly. For instance, if you have input_your-task, then the GMM alignment stage will create data_your-task, plp_your-task and exp_your-task.
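
The naming rule can be illustrated in a few lines of shell. This is only a sketch of the convention described above, not code taken from the repo's scripts; the function task_from_input_dir is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical illustration of the naming convention (not the repo's own code):
# the suffix after "input_" is reused for every generated directory.
task_from_input_dir() {
    dir="${1%/}"                     # drop a trailing slash, if any
    base="${dir##*/}"                # drop any leading path
    printf '%s\n' "${base#input_}"   # drop the "input_" prefix
}

task=$(task_from_input_dir "input_your-task")
for prefix in data plp exp; do
    echo "${prefix}_${task}"
done
# prints:
#   data_your-task
#   plp_your-task
#   exp_your-task
```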

Force Align Training Data (GMM)

$ ./run_gmm.sh your-task test001

Format data from GMM --> DNN

$ ./utils/setup_multitask.sh to_dir from_dir "your-task1 your-task2 your-task3"

Multi-Task Learning (DNN)

$ ./run_nnet3_multitask.sh "your-task1 your-task2" "gmm-typo1 gmm-typo2" "weight-task1,weight-task2" hidden-dim num-epochs main-dir
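
The usage line above takes its per-task values as three quoted lists: task names and GMM labels separated by spaces, weights separated by commas. Judging by the placeholders, the lists run in parallel with one entry per task. That reading is an assumption, as is everything in the sketch below except the script name and the argument order. The helpers join_by and multitask_cmd are hypothetical, and the command is printed rather than run.

```shell
#!/bin/sh
# Hypothetical helpers (not part of multi-task-kaldi).

# join_by SEP ITEM...  ->  items joined by SEP
join_by() {
    sep="$1"; shift
    [ "$#" -gt 0 ] || return 0
    out="$1"; shift
    for item in "$@"; do
        out="${out}${sep}${item}"
    done
    printf '%s\n' "$out"
}

# multitask_cmd TASKS TYPOS WEIGHTS HIDDEN_DIM NUM_EPOCHS MAIN_DIR
# TASKS and TYPOS are space-separated, WEIGHTS is comma-separated.
multitask_cmd() {
    [ "$#" -eq 6 ] || { echo "expected 6 arguments" >&2; return 1; }
    n_tasks=$(( $(printf '%s\n' "$1" | wc -w) ))
    n_typos=$(( $(printf '%s\n' "$2" | wc -w) ))
    n_weights=$(( $(printf '%s\n' "$3" | tr ',' ' ' | wc -w) ))
    # Assumption: one GMM label and one weight per task
    if [ "$n_tasks" -ne "$n_typos" ] || [ "$n_tasks" -ne "$n_weights" ]; then
        echo "expected one GMM label and one weight per task" >&2
        return 1
    fi
    printf './run_nnet3_multitask.sh "%s" "%s" "%s" %s %s %s\n' "$@"
}

tasks=$(join_by ' ' your-task1 your-task2)
typos=$(join_by ' ' gmm-typo1 gmm-typo2)
weights=$(join_by ',' weight-task1 weight-task2)
multitask_cmd "$tasks" "$typos" "$weights" hidden-dim num-epochs main-dir
```

With the placeholder values shown, the printed line matches the usage line above. Checking the list lengths up front catches a missing weight or label before a long training run starts.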