CoEDL / elpis

🙊 software for creating speech recognition models.
https://elpis.readthedocs.io/en/latest/
Apache License 2.0

Ben kaldi conf #239

Closed benfoley closed 2 years ago

benfoley commented 2 years ago

Updates to Kaldi to output CTM files with confidence values.

I renamed the gmm-decode template dir to gmm-decode-online as these scripts are actually doing online decoding.

While trying the scripts, I noticed that the online decoding process spends about 30 seconds building stats, which the plain gmm-decode method didn't do. So I've added a condition: audio shorter than 10 seconds is transcribed with the earlier (non-online) gmm-decode method, and longer audio uses the online method.
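
A minimal sketch of how such a duration check could look, not the code in this PR. The function names, the returned labels, and the treatment of audio that is exactly 10 seconds long (sent to the online method here) are assumptions for illustration; only the 10-second threshold comes from the description above.

```python
import wave

# Threshold taken from the PR description; everything else is hypothetical.
SHORT_AUDIO_THRESHOLD_S = 10.0


def audio_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_decode_method(duration_s, threshold_s=SHORT_AUDIO_THRESHOLD_S):
    """Pick a decode template name based on audio length.

    Short audio avoids the online method's stats-building overhead.
    Audio of exactly threshold_s goes to the online method (an assumption).
    """
    if duration_s < threshold_s:
        return "gmm-decode"
    return "gmm-decode-online"
```

For example, `choose_decode_method(audio_duration_seconds("clip.wav"))` would select the template for a hypothetical `clip.wav`.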

Also, the CTM scripts now build the Elan output file directly from the CTM data and then create the TextGrid. This is the reverse of how it worked before. The new approach lets the Elan file carry confidence values in a separate child tier under the parent transcription annotations.
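
As a rough illustration of the CTM-to-annotation step, here is a sketch that parses word-level CTM lines and pairs each word with its confidence value. It assumes the common six-field CTM layout (recording id, channel, start, duration, word, confidence) and that lines starting with `;;` are comments; the `CtmWord` type, the function names, and the dictionary keys are hypothetical and are not taken from the scripts in this PR. Writing the actual Elan file is left out.

```python
from typing import Dict, List, NamedTuple


class CtmWord(NamedTuple):
    recording: str
    channel: str
    start: float       # seconds
    duration: float    # seconds
    word: str
    confidence: float


def parse_ctm(text: str) -> List[CtmWord]:
    """Parse CTM text, skipping comments and lines with fewer than six fields."""
    words = []
    for line in text.splitlines():
        if line.lstrip().startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue
        rec, chan, start, dur, word, conf = fields[:6]
        words.append(CtmWord(rec, chan, float(start), float(dur), word, float(conf)))
    return words


def to_annotations(words: List[CtmWord]) -> List[Dict[str, object]]:
    """Convert words to millisecond-aligned annotations.

    Each entry holds the parent transcription value plus the confidence
    string that a child tier could display for the same time span.
    """
    return [
        {
            "start_ms": round(w.start * 1000),
            "end_ms": round((w.start + w.duration) * 1000),
            "value": w.word,
            "confidence": f"{w.confidence:.2f}",
        }
        for w in words
    ]
```

Keeping the confidence next to its parent annotation in one record makes it straightforward to emit both tiers with identical time alignment.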

benfoley commented 2 years ago

Sample output:

Here's a sample CTM file (renamed to .txt for uploading here): ctm_with_conf.txt

This is a screenshot of a generated Elan file showing the transcription and confidence values.

[Screenshot: elan-conf]

benfoley commented 2 years ago

minor changes:

major changes: