alumae / kaldi-offline-transcriber

Offline transcription system for Estonian using Kaldi

Kaldi Offline Transcriber

Updates

2022-05-26

2021-06-15

2018-10-31

2018-09-12

2018-08-31

2018-08-21

2018-08-08

2017-05-29

2017-05-02

2017-02-13

2015-12-29

2015-05-14

2015-03-11

2014-12-17

2014-12-04

2014-10-24

2014-10-23

2014-08-03

Introduction

This is an offline transcription system for Estonian based on Kaldi (https://github.com/kaldi-asr/kaldi).

The system is targeted at users who have no speech research background but who want to transcribe long audio recordings using automatic speech recognition.

Much of the code is based on the training and testing recipes that come with Kaldi.

The system performs:

Transcription is performed in roughly 0.6x realtime on a 10-year-old server, using one CPU. E.g., transcribing a radio interview of length 8:23 takes about 5 minutes.
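The realtime factor quoted above is simply processing time divided by audio duration. A minimal shell sketch of that arithmetic follows. The helper names to_seconds and rtf are hypothetical and are not part of the transcriber.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of kaldi-offline-transcriber.

# Convert a duration such as "8:23" (mm:ss) or "1:02:03" (hh:mm:ss) to seconds.
to_seconds() {
  awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }' <<< "$1"
}

# Realtime factor = processing time / audio duration, rounded to 2 decimals.
rtf() {
  local proc audio
  proc=$(to_seconds "$1")
  audio=$(to_seconds "$2")
  awk -v p="$proc" -v a="$audio" 'BEGIN { printf "%.2f\n", p / a }'
}

rtf 5:00 8:23   # prints 0.60
```

A value below 1 means that transcription runs faster than the audio would play.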

Memory requirements: during most of the work, less than 1 GB of memory is used.

Requirements

Server

A server running Linux is needed. The system is tested on Debian 'testing', but any modern distribution should do.

Memory requirements

Remarks

If you plan to process many recordings in parallel, we recommend turning off hyperthreading in the server BIOS. This halves the number of (virtual) cores, but should make processing faster, provided you don't run more than N processes in parallel, where N is the number of physical cores.
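The following is a minimal sketch of the parallel setup described above. It makes several assumptions: it is run from the transcriber's root directory, lscpu (util-linux) and GNU xargs are available, and make targets for different recordings do not interfere with each other. The function names are hypothetical. If lscpu reports nothing, the sketch falls back to nproc, which counts logical (hyperthreaded) CPUs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of kaldi-offline-transcriber.

# Number of physical cores: each unique (core, socket) pair reported by lscpu
# is one physical core; lscpu's header lines start with '#'.
physical_cores() {
  local n
  n=$(lscpu --parse=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
  if [ "$n" -gt 0 ]; then
    echo "$n"
  else
    nproc   # fallback: logical CPUs, may include hyperthreads
  fi
}

# Print one make target per file in the given directory:
# src-audio/foo.mp3 -> build/output/foo.txt
targets_for() {
  local f name
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    echo "build/output/${name%.*}.txt"
  done
}

# Transcribe all files in a directory, running at most one make per physical core.
# MAKE can be overridden (e.g. MAKE=echo) for a dry run.
transcribe_all() {
  targets_for "$1" | xargs -r -n 1 -P "$(physical_cores)" "${MAKE:-make}"
}
```

Usage, from the transcriber's root directory: transcribe_all src-audio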

It is recommended (but not required) to create a dedicated user account for the transcription work. In the following, we assume the user is speech, with the home directory /home/speech.

Installation

See the Dockerfile for how to install all the required components.

Usage

Put a speech file under src-audio. Many file types (wav, mp3, ogg, mpg, m4a) are supported. E.g.:

cd src-audio
wget http://media.kuku.ee/intervjuu/intervjuu201306211256.mp3
cd ..

To run the transcription pipeline, execute make build/output/<filename>.txt, where <filename> is the name of the audio file in src-audio without the extension. This command runs all the steps needed to generate the transcription file.

For example:

make build/output/intervjuu201306211256.txt
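The mapping from an audio file to its make target can be wrapped in a small shell function. This is a hypothetical helper that is not part of the repository. It assumes it is called from the transcriber's root directory. The MAKE variable only exists so that the command can be swapped out (e.g. for echo) for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kaldi-offline-transcriber.
# Usage: transcribe src-audio/<filename>.<ext> [format]   (format defaults to txt)
transcribe() {
  local src=$1 fmt=${2:-txt} name
  name=$(basename "$src")
  name=${name%.*}                      # drop the extension: foo.mp3 -> foo
  "${MAKE:-make}" "build/output/${name}.${fmt}"
}
```

For example, transcribe src-audio/intervjuu201306211256.mp3 runs the same make command as above.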

Result (if everything goes fine, after about 5 minutes; the audio file is 8:35 long, giving a realtime factor of about 0.6). The example also demonstrates automatic punctuation (not yet publicly available):

# head -5 build/output/intervjuu201306211256.txt

Palgainfoagentuur koostöös CV-Online'i ja teiste partneritega viis kevadel läbi tööandjate ja töötajate palgauuringu. Meil on telefonil nüüd palgainfoagentuuri juht Kadri Seeder. Tervist.
Kui laiapõhjaline see uuring oli, ma saan aru, et ei ole kaasatud ainult Eesti tööandjad ja töötajad.
Jah, me seekord viisime uuringu läbi ka Lätis ja Leedus ja, ja see on täpselt samasuguse metoodikaga, nii et me saame võrrelda Läti ja Leedu andmeid, seda küll veel mitte täna sellepärast et Läti-Leedu tööandjatel ankeete lõpetavad.
Täna vaatasime töötajate tööotsijate uuringus väga põgusalt sisse, et need tulemused tulevad. Juuli käigus
aga kui rääkida tänasest esitlusest, siis tee pöörasid tähelepanu sellele, kui täpsemalt rääkisite sellest, millised on toimunud ja prognoositavad muutused põhipalkades ja nende põhjused, kas saaksite meile ka sellest rääkida.

Note that in the .txt file, all recognized sentences are title-cased and end with a '.'.

The system can also generate a result in other formats:

For example, to create a subtitle file, run

make build/output/intervjuu201306211256.sbv

Note that generating files in different formats adds practically no extra runtime, since all the output files are generated from the same internal representation.
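Because all formats come from the same internal representation, several outputs can be requested in a single make invocation, and the decoding they depend on is shared between them. The following is a minimal sketch. The function name is hypothetical, and the example uses only the txt and sbv formats mentioned in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kaldi-offline-transcriber.
# Usage: all_formats <filename> <format>...   e.g. all_formats intervjuu201306211256 txt sbv
all_formats() {
  local name=$1 fmt
  shift
  local targets=()
  for fmt in "$@"; do
    targets+=("build/output/${name}.${fmt}")
  done
  # One make call with several goals; the shared decoding step is built once.
  "${MAKE:-make}" "${targets[@]}"
}
```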

To remove the intermediate files generated during decoding, run the pseudo-target make .<filename>.clean, e.g.:

make .intervjuu201306211256.clean
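To clean up after several recordings at once, you can run the same pseudo-target for every file in src-audio. The following is a sketch that assumes it runs from the transcriber's root directory. The function name is hypothetical, and MAKE can be overridden for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kaldi-offline-transcriber.
# Runs "make .<filename>.clean" for every audio file in the given directory.
clean_all() {
  local f name
  for f in "${1:-src-audio}"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    "${MAKE:-make}" ".${name%.*}.clean"
  done
}
```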

Alternative usage

Alternatively, one can use the wrapper script speech2text.sh to transcribe audio files. The script is a wrapper around the Makefile-based system and can be called from any directory.

For example, from within some data directory, you can execute:

/home/speech/kaldi-offline-transcriber/speech2text.sh --trs result/test.trs audio/test.ogg

This transcribes the file audio/test.ogg and writes the result in Transcriber XML format to result/test.trs. The script automatically deletes the intermediate files generated during decoding, unless the option --clean false is used.
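The wrapper also lends itself to batch processing. The following is a minimal sketch that transcribes every file in one directory into .trs files in another. It uses the install path /home/speech/kaldi-offline-transcriber from the example above. The function name is hypothetical, and the S2T variable is only there so that the script path can be overridden.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper, not part of kaldi-offline-transcriber.
# Usage: batch_trs <audio-dir> <result-dir>
S2T=${S2T:-/home/speech/kaldi-offline-transcriber/speech2text.sh}

batch_trs() {
  local in_dir=$1 out_dir=$2 f name
  mkdir -p "$out_dir"
  for f in "$in_dir"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    # Same call as above: result in Transcriber XML format; the wrapper
    # deletes intermediate files unless --clean false is given.
    "$S2T" --trs "$out_dir/${name%.*}.trs" "$f"
  done
}
```

Usage, from some data directory: batch_trs audio result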