bishoph / sopare

Real time sound pattern recognition in Python for Raspberry/Banana Pi.

About recognition accuracy and the number of keywords that can be recognized. #100

Open miziworld opened 2 years ago

miziworld commented 2 years ago

Hello, Thank you for developing and opening your project.

  1. First, I want to train and use 5 words, so I recorded each word 10 times with 5 different people. That is, each word has 50 voice samples. Although this is simple speech processing, in general the larger and more diverse the training data, the better the training results, so I expected good recognition results. However, the results were not as good as expected. Accuracy was also very poor when tested by people who were not involved in creating the training dataset. I am tuning the configuration options to find better values, but I would like to know the best way to create a dataset.

  2. It seems to be correct that the result value is not output when the voice is not recognized, but if noise enters, '[]' is continuously output. Can I change this to not print?

  3. As a final question, I want only one word to be output at a time, since I'm aiming for word recognition, not sentences. For example, because the recognition rate is a little low, when I say 'door' the output is sometimes ['door', 'open']. I want only one word to appear, like ['door']. How can I do that?

bishoph commented 2 years ago

Hi. As I have no clue about your training data, environment, noise and other factors, my only advice is to test, adapt and repeat. Keep in mind that this project does not use AI or Deep Learning, which means that bigger datasets don't always lead to better results.

In terms of the output, well, you can write your own plugin and output whatever you want.

miziworld commented 2 years ago

Thank you for your reply. In the original plugins/print/__init__.py, I think readable_results is the printed output, but I don't know where it is defined. Also, as mentioned above, can you help me with how to get rid of the [] that appears when noise comes in?

bishoph commented 2 years ago

Since you said you don't want it printed out, you can delete the existing print plugin, write your own custom output plugin, and do whatever you want with the output. There is a blog post about plugins and there are several examples of working plugins as well, so I'm not sure what else you expect:

https://www.bishoph.org/sopare-architecture-and-plugins/
https://github.com/bishoph/Misc/blob/master/robotic_arm_control.py

If you want to have a certain functionality it's up to you to code it :)
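
For readers who land on this thread, the custom plugin described above could look roughly like the sketch below. It is only an illustration under stated assumptions: it assumes that a plugin is a package under plugins/ whose __init__.py exposes run(readable_results, data, rawbuf), as the stock print plugin appears to do, and that readable_results is a list of recognized word strings. The helper name pick_single_word is hypothetical and not part of SOPARE. Taking the first word is a simplification; it is not necessarily the best match.

```python
# Sketch of a custom output plugin, e.g. plugins/single_word/__init__.py
# (the directory name is a made-up example, not part of the project).


def pick_single_word(readable_results):
    """Return the first non-empty recognized word, or None if there is none."""
    # Drop empty entries so noise-only results such as [] or [''] are ignored.
    words = [word for word in readable_results if word]
    return words[0] if words else None


def run(readable_results, data, rawbuf):
    # Assumed plugin entry point; the signature mirrors the stock print plugin.
    # data and rawbuf are accepted but unused in this sketch.
    word = pick_single_word(readable_results)
    if word is not None:
        # Print a one-element list and stay silent when nothing was recognized.
        print([word])
```

With this in place of the print plugin, a result like ['door', 'open'] would be reduced to a single word and empty results would produce no output at all.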