Closed benfoley closed 2 years ago
Sample output:
Here's a sample CTM file (renamed to .txt for uploading here): ctm_with_conf.txt
This is a screenshot of a generated Elan file showing the transcription and confidence values.
minor changes:
renamed the gmm-decode template dir to gmm-decode-online to make it better reflect the approach that is used
major changes:
new file elpis/engines/common/output/ctm_to_elan.py, which is based almost entirely on Nick's ctm_to_textgrid.py file
new file elpis/engines/kaldi/inference/gmm-decode-conf/gmm-decode-conf.sh, which is the original gmm-decode recipe, updated to output CTM with confidence values. I haven't bothered to slice it up into stages; maybe one day.
new template scripts in gmm-decode-online-conf, where I have updated the current online decoding templates to output CTM with confidence values.
All the transcription code handles a single input audio file. There are remnants of attempts at handling multiple files scattered throughout, but nothing consistent. I've made some of the wav.scp generation code more explicitly single-file (e.g. echo "decode audio.wav" > ./data/infer/split1/1/wav.scp). This might make it easier in the future to identify where things need to change to accommodate multiple input files.
Updates to Kaldi to output CTM files with confidence values.
I renamed the gmm-decode template dir to gmm-decode-online, as these scripts are actually doing online decoding.
While trying the scripts I noticed that the online decoding process spends about 30 seconds building stats, which the plain gmm-decode method didn't do. So I've added a condition to use the earlier gmm-decode (non-online) method for transcribing audio under 10 seconds in length, and to use the online method for longer audio.
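For reviewers, the length condition can be sketched roughly as below. This is only an illustration, not the code in this PR: the function names and the SHORT_AUDIO_SECONDS constant are made up for the example, and it assumes the input is a PCM WAV file readable by Python's standard wave module. The two returned names are the template dirs mentioned above.

```python
import wave

# Threshold taken from the description: audio shorter than this uses the
# non-online recipe. The constant name is hypothetical.
SHORT_AUDIO_SECONDS = 10.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_decode_method(duration_seconds, threshold=SHORT_AUDIO_SECONDS):
    """Pick a template dir name for the given audio duration.

    Short audio avoids the stats-building overhead of online decoding;
    anything at or above the threshold uses the online templates.
    """
    if duration_seconds < threshold:
        return "gmm-decode-conf"
    return "gmm-decode-online-conf"
```

In use, something like choose_decode_method(wav_duration_seconds("audio.wav")) would return the name of the template dir to run.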
Also, the CTM scripts build the Elan output file directly from the CTM data, and then create the TextGrid. This is the reverse of how it worked before. The new approach allows the Elan file to have confidence values as a separate child tier of the parent transcription annotation values.
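To illustrate the parent/child tier idea, here is a rough sketch of turning CTM lines into two annotation lists. It is not the contents of ctm_to_elan.py: the CtmWord type and both function names are invented for this example, and it assumes the usual CTM column order (utterance id, channel, start seconds, duration seconds, word, optional confidence). Writing the actual .eaf file is left out.

```python
from typing import List, NamedTuple, Optional, Tuple


class CtmWord(NamedTuple):
    """One CTM row, with times converted to milliseconds."""
    utterance: str
    channel: str
    start_ms: int
    end_ms: int
    word: str
    confidence: Optional[float]


def parse_ctm_line(line: str) -> CtmWord:
    """Parse 'utt channel start duration word [confidence]'."""
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"not a CTM line: {line!r}")
    utterance, channel, start, duration, word = parts[:5]
    confidence = float(parts[5]) if len(parts) > 5 else None
    start_ms = round(float(start) * 1000)
    end_ms = round((float(start) + float(duration)) * 1000)
    return CtmWord(utterance, channel, start_ms, end_ms, word, confidence)


def ctm_to_tiers(
    lines: List[str],
) -> Tuple[List[Tuple[int, int, str]], List[Tuple[int, str]]]:
    """Build a parent transcription tier and a child confidence tier.

    The parent tier holds (start_ms, end_ms, word). Each child entry holds
    (index of its parent annotation, confidence formatted as text).
    """
    parent_tier = []
    child_tier = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word = parse_ctm_line(line)
        parent_tier.append((word.start_ms, word.end_ms, word.word))
        if word.confidence is not None:
            child_tier.append((len(parent_tier) - 1, f"{word.confidence:.2f}"))
    return parent_tier, child_tier
```

Each child entry points back at its parent by index, which mirrors how the confidence value hangs off the transcription annotation in the Elan file.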