SEI-TAS / pycloud

Server software to manage virtualized services on a KVM-based discoverable cloudlet (Cloudlet Server component of the KD-Cloudlet project)

About interaction issues between client and server #8

Closed haiyunaixueye closed 6 years ago

haiyunaixueye commented 6 years ago

Using SpeechClient as a test case, I want to confirm whether the data returned by the server to the SpeechClient matches what is shown in the picture below.

[screenshot]

sebastian-echeverria commented 6 years ago

The communication between the Speech Client and the Speech Server inside the Service VM seems to be ok. However, it does not seem to be finding any words in the wav file. This may be because the model is not properly calibrated for whatever speech is in the wav files you are using.

If you go to the speech-android repo, there is a "samples" folder with a wav file, audio.wav. The model currently configured in the speech-server repo is calibrated for the speech in that file. So, if you use that file, you should see a string of words in the "Speech output" section after sending it to the server.

haiyunaixueye commented 6 years ago

Thanks for your suggestion. I used the sample provided in the speech-android repo, and the results are as follows:

[screenshot]

I have some questions:

  1. Must the wav file have a sampling rate of 44100 Hz, 16-bit samples, and a bit rate of 706 kbps (CBR)?

  2. Can the speech server recognize other languages, such as Chinese?

  3. I found that several words in the sample were recognized incorrectly. How high is the recognition rate of this speech library?

Looking forward to your reply.

sebastian-echeverria commented 6 years ago

Yes, those results are what I would expect.

The speech server is based on the Sphinx library (https://cmusphinx.github.io/), a highly configurable speech recognition library. While the current model in the speech-server repo may be limited to the parameters you mention (I am not sure), the library itself certainly is not. The speech-server itself is little more than a TCP server that calls the library with the sample it receives and returns the results over TCP; all the heavy lifting is done by Sphinx.
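The "thin TCP wrapper around the recognizer" pattern described above can be sketched as follows. Note this is an illustration, not the actual speech-server code: the length-prefixed framing and the `recognize_stub` function are assumptions standing in for the real protocol and the call into Sphinx.

```python
import socket
import threading


def recognize_stub(wav_bytes: bytes) -> str:
    # Placeholder for the call into the Sphinx library; the real server
    # would hand the wav sample to the recognizer and return its hypothesis.
    return "hello world"


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the connection."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed early")
        data += chunk
    return data


def handle(conn: socket.socket) -> None:
    with conn:
        # Assumed framing for this sketch: a 4-byte big-endian length
        # header, followed by the raw wav payload.
        size = int.from_bytes(recv_exact(conn, 4), "big")
        wav_bytes = recv_exact(conn, size)
        # Call the recognizer and send the text result back over TCP.
        conn.sendall(recognize_stub(wav_bytes).encode("utf-8"))


def start_server() -> int:
    """Bind an ephemeral port, serve one request in the background,
    and return the port number."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def run() -> None:
        conn, _ = srv.accept()
        handle(conn)
        srv.close()

    threading.Thread(target=run, daemon=True).start()
    return port
```

A client would send the length header plus the wav bytes and read back the recognized string, which mirrors what the Speech Client does against the Service VM.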

Also, the audio sample we use is a random one we found that worked with the current model, but it is probably not the best sample for that model (hence the errors you saw). Sphinx can handle different languages, but again, only with the correct model. I suggest you follow the link above to read up about Sphinx. You could probably contact the authors if needed as well, but they seem to have enough documentation to get you started, in case you want to experiment with it.
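Since the audio parameters in question 1 may matter to the model, a quick way to check what a given wav file actually contains is Python's standard-library `wave` module (the function name here is just for illustration):

```python
import wave


def wav_params(path: str) -> dict:
    """Report the wav properties a recognizer model may care about."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()        # samples per second
        width = w.getsampwidth()       # bytes per sample
        channels = w.getnchannels()
        return {
            "sample_rate_hz": rate,
            "bits_per_sample": width * 8,
            "channels": channels,
            # For uncompressed PCM (CBR): rate * bits * channels / 1000
            "bit_rate_kbps": rate * width * 8 * channels / 1000,
        }
```

For a mono 16-bit file at 44100 Hz this works out to 705.6 kbps, which matches the ~706 kbps CBR figure mentioned in question 1.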

haiyunaixueye commented 6 years ago

Thanks for your answer. I will try to read up about Sphinx.