Ben kaldi conf

benfoley commented 2 years ago

Updates to Kaldi to output CTM files with confidence values.

I renamed the gmm-decode template dir to gmm-decode-online as these scripts are actually doing online decoding.

While trying the scripts I noticed that the online decoding process spends about 30 seconds building stats, which the plain gmm-decode method didn't do. So I've added a condition that uses the earlier (non-online) gmm-decode method for audio shorter than 10 seconds, and the online method for longer audio.
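A minimal sketch of that length check, assuming the input is a PCM WAV file readable by Python's standard `wave` module. The function names and the returned labels are hypothetical and only illustrate the condition; they are not the actual Elpis code.

```python
import wave

# Threshold taken from the description above: shorter audio skips online decoding.
SHORT_AUDIO_THRESHOLD_S = 10.0


def audio_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_decode_method(duration_s, threshold_s=SHORT_AUDIO_THRESHOLD_S):
    """Pick a decoding recipe label (hypothetical names) from the audio length."""
    if duration_s < threshold_s:
        # Short audio: avoid the stats-building overhead of online decoding.
        return "gmm-decode"
    return "gmm-decode-online"
```

For example, `choose_decode_method(audio_duration_seconds("audio.wav"))` would select the non-online recipe for a 5 second recording.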

Also, the CTM scripts now build the Elan output file directly from the CTM data, and then create the TextGrid. This is the reverse of how it worked before. The new approach lets the Elan file hold confidence values in a separate child tier under the parent transcription annotations.
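To illustrate the parent/child tier idea, here is a rough sketch that parses CTM lines and splits them into two plain Python lists. It assumes the usual Kaldi CTM column order (utterance id, channel, start, duration, word, optional confidence) and one parent annotation per word; the function names and the tuple layout are made up for this example and are not taken from ctm_to_elan.py.

```python
def parse_ctm(lines):
    """Parse CTM lines into dicts with start, end, word and confidence."""
    words = []
    for line in lines:
        fields = line.split()
        # Skip blank, malformed, or comment (";;") lines.
        if len(fields) < 5 or fields[0].startswith(";;"):
            continue
        start = float(fields[2])
        duration = float(fields[3])
        # The confidence column is optional in CTM output.
        confidence = float(fields[5]) if len(fields) > 5 else None
        words.append({
            "start": start,
            "end": start + duration,
            "word": fields[4],
            "confidence": confidence,
        })
    return words


def build_tiers(words):
    """Return (parent, child) annotation lists.

    parent: (start, end, word) tuples for the transcription tier.
    child:  (parent_index, confidence_text) tuples for the confidence tier.
    """
    parent = [(w["start"], w["end"], w["word"]) for w in words]
    child = [
        (i, f"{w['confidence']:.2f}")
        for i, w in enumerate(words)
        if w["confidence"] is not None
    ]
    return parent, child
```

Keeping the confidence values as references to a parent index mirrors how a dependent tier points back at the annotation it describes, so the timing lives in one place only.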

benfoley commented 2 years ago

Sample output:

Here's a sample CTM file (renamed to .txt for uploading here): ctm_with_conf.txt

This is a screenshot of a generated Elan file showing the transcription and confidence values.

benfoley commented 2 years ago

minor changes:

update praatio (and nltk, to get rid of a regex Pattern not found error)
rename the gmm-decode template dir to gmm-decode-online so the name better reflects the approach that is used

major changes:

new file elpis/engines/common/output/ctm_to_elan.py, which is based almost entirely on Nick's ctm_to_textgrid.py file
new file elpis/engines/kaldi/inference/gmm-decode-conf/gmm-decode-conf.sh, which is the original gmm-decode recipe, updated to output CTM with confidence values. I haven't bothered to slice it up into stages. Maybe one day.
new template scripts in gmm-decode-online-conf, where I have updated the current online decoding templates to output CTM with confidence values.
all the transcription stuff assumes a single input audio file. There are remnants of attempts at handling multiple files scattered throughout, but nothing consistent. I've made some of the wav.scp generation code more explicitly single-file (e.g. echo "decode audio.wav" > ./data/infer/split1/1/wav.scp). This might make it easier in the future to identify where things need to change to accommodate multiple input files.
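For reference, the single-file wav.scp step from the last item could look like this in Python. This is only a sketch of the same one-line file that the echo command writes; the function name is hypothetical, and the default utterance id and audio name are copied from the example above.

```python
from pathlib import Path


def write_single_file_wav_scp(scp_path, utterance_id="decode", audio_name="audio.wav"):
    """Write a wav.scp containing exactly one "<utterance-id> <audio>" entry."""
    scp_path = Path(scp_path)
    # Create the target directory if it does not exist yet.
    scp_path.parent.mkdir(parents=True, exist_ok=True)
    # One line only: this makes the single-input-file assumption explicit.
    scp_path.write_text(f"{utterance_id} {audio_name}\n")
    return scp_path
```

Calling `write_single_file_wav_scp("./data/infer/split1/1/wav.scp")` produces the same content as the shell one-liner. Supporting multiple input files would mean writing one such line per utterance id.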

CoEDL / elpis

Ben kaldi conf #239