castorini / howl

Wake word detection modeling toolkit for Firefox Voice, supporting open datasets like Speech Commands and Common Voice.
Mozilla Public License 2.0

Streamline preprocessing pipeline #17

Closed. daemon closed this issue 4 years ago

daemon commented 4 years ago

Data preprocessing is currently split into multiple steps:

  1. Download the datasets (where?).
  2. Run run.preprocess_dataset.
  3. Write the corresponding *.lab files using run.export_mfa.
  4. Download Montreal Forced Aligner (MFA) and the corresponding CMU phonetic dictionary.
  5. Run MFA (mfa_align) over the speech corpus.
  6. Convert the output TextGrids to our jsonl format (run.attach_mfa_alignment).

We should make this process easier and document it somewhere.
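
As a starting point, the command-driven steps (2, 3, 5, and 6) could be chained by one small driver script. The sketch below is only an illustration under several assumptions: that the `run.*` modules can be launched with `python -m`, that `mfa_align` is on the `PATH`, and that each step's arguments are supplied by the caller, since none are specified here. `Step`, `build_steps`, and `run_pipeline` are hypothetical names, not part of the toolkit. The two download steps are left out because their sources are still undecided.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Step:
    """One pipeline stage: a label plus the command line to execute."""
    name: str
    argv: List[str]


def build_steps(extra_args: Optional[Dict[str, Sequence[str]]] = None) -> List[Step]:
    """Return the command-driven steps in the order listed in this issue.

    extra_args maps a step name to the arguments appended to its command.
    No real flags are assumed here; the caller must provide them.
    """
    extra = extra_args or {}
    # Assumption: the run.* modules are importable and runnable via `python -m`.
    py = [sys.executable, "-m"]
    steps = [
        Step("preprocess", py + ["run.preprocess_dataset"]),
        Step("export_mfa", py + ["run.export_mfa"]),
        # Assumption: the MFA binary is installed and on PATH.
        Step("align", ["mfa_align"]),
        Step("attach", py + ["run.attach_mfa_alignment"]),
    ]
    for step in steps:
        step.argv = step.argv + list(extra.get(step.name, []))
    return steps


def run_pipeline(
    steps: Sequence[Step],
    runner: Callable = subprocess.run,
    dry_run: bool = False,
) -> List[str]:
    """Run each step in order and return the names of the steps processed.

    With dry_run=True the commands are only printed. Otherwise the first
    failing command raises (check=True) and stops the pipeline.
    """
    processed = []
    for step in steps:
        print(f"[{step.name}] {' '.join(step.argv)}")
        if not dry_run:
            runner(step.argv, check=True)
        processed.append(step.name)
    return processed


if __name__ == "__main__":
    # Print the planned commands without executing anything.
    run_pipeline(build_steps(), dry_run=True)
```

Running with `dry_run=True` first prints the planned commands, which would also double as documentation of the step order.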