Closed: drokia2 closed this issue 10 years ago.
By speech segmentation I mean that a long audio file is segmented into shorter utterance-like segments (up to 20 seconds) before decoding.
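The Python sketch below is a rough illustration of the 20-second cap only. It splits one long speech span into consecutive, equal-length pieces that are each no longer than `max_len` seconds. The function name `split_into_segments` is hypothetical, and even splitting is an assumption made for illustration. A real segmenter would place boundaries at pauses or speaker changes rather than at fixed positions.

```python
import math
from typing import List, Tuple


def split_into_segments(start: float, end: float, max_len: float = 20.0) -> List[Tuple[float, float]]:
    """Split the span [start, end) (in seconds) into equal consecutive pieces,
    each at most max_len seconds long. Hypothetical helper, illustration only."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if end <= start:
        return []
    # Smallest number of pieces that keeps every piece within the cap.
    n = math.ceil((end - start) / max_len)
    step = (end - start) / n
    segments = []
    for i in range(n):
        seg_start = start + i * step
        # Pin the last piece to `end` so rounding error cannot shorten the span.
        seg_end = end if i == n - 1 else start + (i + 1) * step
        segments.append((seg_start, seg_end))
    return segments


print(split_into_segments(0.0, 45.0))  # → [(0.0, 15.0), (15.0, 30.0), (30.0, 45.0)]
```

Splitting evenly instead of greedily taking 20 s chunks avoids leaving a very short final piece. For example, a 45 s span becomes three 15 s pieces rather than 20 + 20 + 5.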
It doesn't do what you are asking (produce phoneme-aligned recognition output), but that is not difficult to do with the Kaldi executable lattice-align-words.
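Here is a minimal sketch of that route. It assumes you still have the decoding lattices, the acoustic model `final.mdl`, and the standard Kaldi `lang` directory (`phones/word_boundary.int`, `words.txt`, `phones.txt`). All paths are placeholders, and the `--acoustic-scale` value is an assumption. The idea is to word-align the best path with `lattice-align-words` and then run `nbest-to-prons`, which should write one line per word containing its start frame, frame count, word ID and phone IDs. The Python helpers `load_symbols` and `parse_prons` are hypothetical. They only map those integer IDs back to symbols and convert frames to seconds, assuming a 10 ms frame shift. Models with frame subsampling, such as chain models, typically use 30 ms instead.

```python
# Assumed shell pipeline (placeholder paths; run inside a Kaldi environment):
#   lattice-1best --acoustic-scale=0.1 "ark:gunzip -c lat.1.gz |" ark:- | \
#     lattice-align-words lang/phones/word_boundary.int final.mdl ark:- ark:- | \
#     nbest-to-prons final.mdl ark:- prons.txt
# Each line of prons.txt is assumed to look like:
#   <utt-id> <begin-frame> <num-frames> <word-id> <phone-id> ... <phone-id>
from typing import Dict, Iterable, List, NamedTuple


class WordPron(NamedTuple):
    utt: str
    start: float       # seconds
    duration: float    # seconds
    word: str
    phones: List[str]


def load_symbols(lines: Iterable[str]) -> Dict[int, str]:
    """Read a Kaldi symbol table (words.txt / phones.txt: '<symbol> <id>' per line)."""
    table: Dict[int, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            table[int(fields[1])] = fields[0]
    return table


def parse_prons(lines: Iterable[str], words: Dict[int, str], phones: Dict[int, str],
                frame_shift: float = 0.01) -> List[WordPron]:
    """Turn nbest-to-prons output lines into words with timings and phone symbols."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        begin, num_frames, word_id = int(fields[1]), int(fields[2]), int(fields[3])
        result.append(
            WordPron(
                utt=fields[0],
                start=begin * frame_shift,
                duration=num_frames * frame_shift,
                word=words[word_id],
                phones=[phones[int(p)] for p in fields[4:]],
            )
        )
    return result


# Toy symbol tables and one line of (assumed) nbest-to-prons output.
words = load_symbols(["<eps> 0", "HELLO 1"])
phones = load_symbols(["<eps> 0", "HH_B 1", "AH_I 2", "L_I 3", "OW_E 4"])
for w in parse_prons(["utt1 10 42 1 1 2 3 4"], words, phones):
    print(w.utt, f"{w.start:.2f}", f"{w.duration:.2f}", w.word, " ".join(w.phones))
# prints: utt1 0.10 0.42 HELLO HH_B AH_I L_I OW_E
```

To read real files, pass open file handles (e.g. `open("lang/words.txt")`) to `load_symbols` and `parse_prons`. Kaldi's `utils/int2sym.pl` can do the same ID-to-symbol mapping on the command line.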
The README says that kaldi-offline-transcriber does speech segmentation. Does it segment by phones? If so, how can I get it to output phones when I give it an MP3 or other sound file? I dug around for a bit, but it seemed to output only the words of the speech. Thanks!