Closed mike-a-ellis closed 7 years ago
This is already supported. You have to set the word-boundary-file property
(it is commented out in https://github.com/alumae/kaldi-gstreamer-server/blob/master/sample_english_nnet2.yaml). The JSON encoding of the final result will then include each word's start time and length, something like:
{
  "status":0,
  "segment-start":0.0,
  "segment-length":6.12,
  "total-length":6.12,
  "result":{
    "hypotheses":[
      {
        "transcript":"one two three four five six seven eight.",
        "confidence":10000000000.0,
        "likelihood":153.665,
        "word-alignment":[
          {
            "start":1.18,
            "length":0.43,
            "word":"one",
            "confidence":1.0
          },
          {
            "start":1.65,
            "length":0.29,
            "word":"two",
            "confidence":0.989745
          },
          {
            "start":1.97,
            "length":0.4,
            "word":"three",
            "confidence":1.0
          },
          {
            "start":2.37,
            "length":0.53,
            "word":"four",
            "confidence":1.0
          },
          {
            "start":2.9,
            "length":0.39,
            "word":"five",
            "confidence":1.0
          },
          {
            "start":3.29,
            "length":0.4,
            "word":"six",
            "confidence":1.0
          },
          {
            "start":3.69,
            "length":0.43,
            "word":"seven",
            "confidence":1.0
          },
          {
            "start":4.28,
            "length":0.24,
            "word":"eight",
            "confidence":0.991182
          }
        ]
      }
    ],
    "final":true
  },
  "segment":0,
  "id":"e887d790-b321-47ae-ae7b-13276b1b3fcd"
}
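For anyone who wants word order and start/stop times out of a message like this, here is a minimal Python sketch. The function name word_timings and the shortened sample message are just for illustration and are not part of the server. The end time is computed as start + length, and the times are used as the server reports them, without adding segment-start.

```python
import json


def word_timings(message):
    """Return (index, word, start, end) tuples for the best hypothesis.

    `message` is the JSON text of one final result. The end time is
    derived as start + length, rounded to two decimals like the input.
    """
    response = json.loads(message)
    best = response["result"]["hypotheses"][0]
    timings = []
    # "word-alignment" is only present when word-boundary-file is set.
    for index, item in enumerate(best.get("word-alignment", [])):
        start = item["start"]
        end = round(start + item["length"], 2)
        timings.append((index, item["word"], start, end))
    return timings


# Shortened version of the result above, for illustration only.
sample = json.dumps({
    "status": 0,
    "result": {
        "hypotheses": [{
            "transcript": "one two.",
            "word-alignment": [
                {"start": 1.18, "length": 0.43, "word": "one", "confidence": 1.0},
                {"start": 1.65, "length": 0.29, "word": "two", "confidence": 0.989745},
            ],
        }],
        "final": True,
    },
})

for index, word, start, end in word_timings(sample):
    print(index, word, start, end)
```

If the hypothesis has no word-alignment key, the function returns an empty list, which is what you would see when word-boundary-file is not set.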
I am unable to get this to work. I am using /client/dynamic/recognize.
Is there anything I need to do besides uncommenting the "word-boundary-file" property?
The extended results are only available through the websocket-based interface.
Can you offer some advice on how to implement this? Any advice would be appreciated!
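A rough sketch of what a websocket client could look like in Python, in case it helps as a starting point. Assumptions: the master server is running locally and exposes ws://localhost:8888/client/ws/speech, the third-party websocket-client package is installed, the audio file is in a format the server can decode without an explicit content-type, and recv() returns an empty string once the server closes the connection. The function names are made up for this example.

```python
import json

# Assumed endpoint of a locally running master server; adjust host and port.
DEFAULT_URL = "ws://localhost:8888/client/ws/speech"


def collect_final_results(ws, audio_chunks):
    """Send audio over an open websocket and return the final result messages.

    `ws` is any object with send_binary(), send() and recv() methods,
    for example a connection from the websocket-client package.
    """
    for chunk in audio_chunks:
        ws.send_binary(chunk)
    ws.send("EOS")  # tell the server that no more audio follows

    finals = []
    while True:
        message = ws.recv()
        if not message:  # assumed: empty string means the server closed
            break
        response = json.loads(message)
        if response.get("status") != 0:
            raise RuntimeError("server reported status %s" % response.get("status"))
        result = response.get("result")
        # Skip partial hypotheses and messages that carry no result.
        if result and result.get("final"):
            finals.append(response)
    return finals


def read_chunks(path, chunk_size=8000):
    """Yield the file at `path` as a sequence of byte chunks."""
    with open(path, "rb") as audio:
        while True:
            chunk = audio.read(chunk_size)
            if not chunk:
                break
            yield chunk


def recognize_file(path, url=DEFAULT_URL):
    """Connect, stream one audio file, and return its final results."""
    import websocket  # third-party: pip install websocket-client

    ws = websocket.create_connection(url)
    try:
        return collect_final_results(ws, read_chunks(path))
    finally:
        ws.close()
```

Calling recognize_file("test.wav") (a hypothetical file name) should then return one message per recognized segment, each with a word-alignment list when word-boundary-file is set. The client.py script in the repository is the reference to compare against.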
How can this be modified to get utterance time, word order, and word start/stop times?
Something similar in scope to what Dragon might give you:
<?xml version="1.0" encoding="windows-1252"?>
<!DOCTYPE BODY SYSTEM "http://www.nuance.com/naturallyspeaking/dss/dtd/dss-idxv2.dtd">
I understand this has not been implemented and would be interested in doing so, but I need a bit of guidance, such as which classes to research.
I have been able to get debugging working, so I feel like I have a shot. The challenge with Kaldi is that the learning curve is a brick wall, so I am not sure where to focus.