alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

Invalid JSON utterance results #142

Open dweberlj opened 6 years ago

dweberlj commented 6 years ago

Hello,

I have the server installed and running one of your samples using your HTTP API interface (see below). The output from the server looks like invalid JSON data. I must have something configured incorrectly.

Any thoughts would be greatly appreciated.

Thank you

$ curl -T test/data/english_test.wav "http://localhost:8888/client/dynamic/recognize"

$ python kaldigstserver/master_server.py --port=8888
DEBUG 2018-08-02 11:53:40,136 Starting up server
INFO 2018-08-02 11:53:58,416 101 GET /worker/ws/speech (127.0.0.1) 0.58ms
INFO 2018-08-02 11:53:58,416 New worker available <__main__.WorkerSocketHandler object at 0x7f4c4c54a110>
INFO 2018-08-02 11:54:57,842 f07e5081-e669-470e-95f4-bcc22f3e5796: OPEN: user='none', content='none'
INFO 2018-08-02 11:54:57,842 f07e5081-e669-470e-95f4-bcc22f3e5796: Using worker <__main__.HttpChunkedRecognizeHandler object at 0x7f4c4c54ac90>
INFO 2018-08-02 11:54:57,845 f07e5081-e669-470e-95f4-bcc22f3e5796: Handling the end of chunked recognize request
INFO 2018-08-02 11:54:57,845 f07e5081-e669-470e-95f4-bcc22f3e5796: yielding...
INFO 2018-08-02 11:54:57,845 f07e5081-e669-470e-95f4-bcc22f3e5796: Waiting for final result...
INFO 2018-08-02 11:55:01,236 f07e5081-e669-470e-95f4-bcc22f3e5796: Receiving event {u'status': 0, u'segment': 0, u'result': {u'hypotheses': [{u'transcript': u'ONE.'}], u'final': Fa... from worker
INFO 2018-08-02 11:55:01,542 f07e5081-e669-470e-95f4-bcc22f3e5796: Receiving event {u'status': 0, u'segment': 0, u'result': {u'hypotheses': [{u'transcript': u'ONE TWO.'}], u'final'... from worker

alumae commented 6 years ago

You are showing the output from the log, not the actual JSON response from the server. The log prints a truncated JSON response by design.
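To illustrate the point above, here is a minimal sketch of why a log line like the ones quoted cannot be parsed as JSON while the serialized response can. The event dict mirrors the first logged event; the 60-character cut-off and the helper name `is_valid_json` are assumptions made for illustration, not the server's actual logging code.

```python
import json

# Event shaped like the one in the quoted log (values copied from the thread).
event = {
    "status": 0,
    "segment": 0,
    "result": {"hypotheses": [{"transcript": "ONE."}], "final": False},
}

# Hypothetical log formatting: the dict's Python repr, cut to a fixed length.
# The 60-character limit is an assumption, not the server's real setting.
logged = repr(event)[:60] + "..."

# What a client would receive: the same data serialized as JSON.
wire = json.dumps(event)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_valid_json(logged))  # False: single quotes, Python literals, truncated
print(is_valid_json(wire))    # True
```

The repr uses single quotes and Python literals such as `False`, so it would not be valid JSON even without the truncation.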

dweberlj commented 6 years ago

Thanks for clarifying the log output.

When I run the example below (english_test.wav), the server returns the following utterance result. The transcript should be ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT, but the server returns O.

$ curl -T english_test.wav "http://localhost:8888/client/dynamic/recognize"
{"status": 0, "hypotheses": [{"utterance": "O."}], "id": "bc728167-4d3d-4ac2-ba73-7f469e192aef"}
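The response above is well-formed JSON, so the remaining problem is the recognized text rather than the format. A minimal sketch of reading the top hypothesis from such a reply follows; the body string is copied from this thread, the helper name `best_utterance` is hypothetical, and it assumes that status 0 signals success, as in the replies shown here.

```python
import json

# Response body copied from the curl output in this thread.
body = (
    '{"status": 0, "hypotheses": [{"utterance": "O."}], '
    '"id": "bc728167-4d3d-4ac2-ba73-7f469e192aef"}'
)


def best_utterance(response_text):
    """Return the first hypothesis' utterance, or None if there is none.

    Assumes status 0 means success; that is an assumption based on the
    replies quoted in this thread.
    """
    reply = json.loads(response_text)
    if reply.get("status") != 0:
        return None
    hypotheses = reply.get("hypotheses") or []
    return hypotheses[0]["utterance"] if hypotheses else None


print(best_utterance(body))  # O.
```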

alumae commented 6 years ago

Can you post your configuration file?