clarinsi / parlaspeech

Code for bootstrapping ASR datasets from parliamentary recordings and transcripts
Apache License 2.0

ParlaSpeech data preparation procedure

This repository demonstrates the procedure and utilities used to automatically process large amounts of speech data into a corpus suitable for training speech processing models, for example automatic speech recognition (ASR) systems.

The examples used here are based on the corpus of Croatian parliamentary speech, which is distributed at: http://hdl.handle.net/11356/1494

Authors

Citation

The contents of this repository are described in the following paper:

TODO

Description

All the details are described in the tutorial notebook.