bishoph / sopare

Real time sound pattern recognition in Python for Raspberry/Banana Pi.

Questions regarding the use of Sopare #77

Open shandrios opened 5 years ago

shandrios commented 5 years ago

Hello again, still a big fan. I've been tinkering around a bit with your program, and I've got some questions.

  1. By default, the frequency range used in the speech analysis is 20-600 Hz, even though human speech goes much higher than that. Does increasing this range have an effect on the accuracy of the analysis? Also, it seems like the built-in FFT plot function's x axis caps out at 2000 Hz by default, even if the frequency range is e.g. 20-5000 Hz. Does the actual analysis still take the entire range into account, and can the axis limits be increased?

  2. I noticed the master branch hasn't had a commit since January 2018. Have there been any significant improvements in the testing branch that would warrant using that over the master branch for my own project?

  3. In my own project, I would like my Raspberry to recognise simple voice commands from a handful of different people. I can't necessarily get word samples from all of them, so I'm wondering what settings in the config file I should tinker with in order to improve the rate of success.

Thanks in advance, have a good day.

bishoph commented 5 years ago

1) Yes, more frequencies normally mean more precision. But the default range already gives decent results, even with ordinary hardware, and it also works across some range. The plot scales automatically and shows the time domain as well as the frequency domain.

2) Testing branch is ahead of master. Mostly smaller bugfixes. You get the full view right here: https://github.com/bishoph/sopare/compare/testing Switching branches is easy so you can give it a try.

3) Test. Adapt. Repeat. Can't really give better advice without details ;)

shandrios commented 5 years ago

My issue is that the plots don't seem to scale past 2000 Hz. In the attached image, you can see that I've set HIGH_FREQ and START_PROGRESSIVE_FACTOR to 5000, but the plots still only show up to 2000/400. Is there something else I need to do to get the full range? [attached image: graphs]

bishoph commented 5 years ago

Check if your hardware limits the input. Other than that it could be that you are using a different configuration while you are using plot. Or it is a bug ;).

shandrios commented 5 years ago

Probably a bug then, because that is the only config file I have, and the Full FFT graph shows frequencies all the way up to 20000. [attached image: fullfft]

On another note, is it possible to use the volume of the analyzed sound in a plugin? I noticed that the plugin's run function takes three parameters, but only readable_results is used in your examples. Can the volume of the sound be gotten from the rawbuf or data parameters?
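The thread does not say what the rawbuf or data parameters actually contain, so the following is only a minimal sketch of how a volume figure could be derived, assuming the buffer is raw signed 16-bit little-endian mono PCM bytes. The helper names `rms_volume` and `rms_to_dbfs` are hypothetical and not part of sopare.

```python
import math
import struct


def rms_volume(pcm_bytes):
    """Return the RMS amplitude of signed 16-bit little-endian PCM bytes.

    ASSUMPTION: the buffer is raw mono PCM; the thread does not confirm
    the real layout of sopare's rawbuf or data parameters.
    """
    count = len(pcm_bytes) // 2  # two bytes per 16-bit sample
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def rms_to_dbfs(rms, full_scale=32768.0):
    """Convert an RMS amplitude to decibels relative to 16-bit full scale."""
    if rms <= 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)


# Synthetic stand-in for a captured buffer: a square wave at amplitude 1000.
fake_buffer = struct.pack("<4h", 1000, -1000, 1000, -1000)
volume = rms_volume(fake_buffer)
print("rms=%.1f dBFS=%.1f" % (volume, rms_to_dbfs(volume)))
```

If the buffer turns out to be a list of chunks or an array of numbers rather than bytes, the unpacking step would need to change, but the RMS calculation itself stays the same.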

kimgenegaby commented 4 years ago

I noticed you spent a lot of time on the frequency domain on YouTube. How can I get the frequency domain from a WAV file?
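This question is not answered in the thread. As a general illustration that is independent of sopare, a WAV file can be read with Python's standard `wave` module and transformed with NumPy's real FFT. The sketch below assumes 16-bit PCM audio; the function names `wav_spectrum` and `make_test_wav` are made up for this example, and the test tone is generated in memory so that no file on disk is needed.

```python
import io
import wave

import numpy as np


def wav_spectrum(wav_file):
    """Return (frequencies in Hz, magnitudes) for a 16-bit PCM WAV file.

    wav_file may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float64)
    if channels > 1:
        # Average the interleaved channels down to mono.
        samples = samples.reshape(-1, channels).mean(axis=1)
    magnitudes = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return freqs, magnitudes


def make_test_wav(freq_hz=440.0, rate=8000, seconds=1.0):
    """Build an in-memory mono WAV containing a single sine tone."""
    t = np.arange(int(rate * seconds)) / rate
    tone = (0.5 * 32767 * np.sin(2 * np.pi * freq_hz * t)).astype("<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(tone.tobytes())
    buf.seek(0)
    return buf


freqs, mags = wav_spectrum(make_test_wav())
peak_hz = freqs[np.argmax(mags)]
print("strongest frequency: %.1f Hz" % peak_hz)
```

For a real recording, `wav_spectrum("recording.wav")` would take a file path instead (the file name here is a placeholder). The highest frequency in the result is half the sample rate, so the recording's sample rate limits what the spectrum can show.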

dumblob commented 4 years ago

The question about frequency range would interest me as well. Any insights?

bishoph commented 4 years ago

What question about frequency range is unanswered?

dumblob commented 4 years ago

The question is how the following is possible:

Probably a bug then, because that is the only config file I have, and the Full FFT graph shows frequencies all the way up to 20000.

(see 3 comments above in https://github.com/bishoph/sopare/issues/77#issuecomment-510472815 )

bishoph commented 4 years ago

Only the full FFT graph shows all frequencies. Single token graphs are, as the word states, tokenized and inherit only parts of the frequencies. Like a single piece of cake doesn't contain all the ingredients of the full cake...
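One way to picture the cake analogy, as a rough sketch only: assume a token is a time slice of the recording, which is an assumption of this example and not a description of sopare's actual tokenizer. The spectrum of the whole signal then contains every tone, while each slice contains only the tones that sounded during that slice. The sample rate, tone frequencies, and the `dominant_freqs` helper are all illustrative.

```python
import numpy as np

RATE = 8000  # samples per second (illustrative value)


def tone(freq_hz, seconds):
    """Return a sine tone of the given frequency and duration."""
    t = np.arange(int(RATE * seconds)) / RATE
    return np.sin(2 * np.pi * freq_hz * t)


def dominant_freqs(samples, threshold=0.8):
    """Return the frequencies (Hz) whose magnitude is near the peak."""
    mags = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / RATE)
    strong = freqs[mags >= threshold * mags.max()]
    return {int(round(f)) for f in strong}


# "Full cake": a 300 Hz tone followed by a 3000 Hz tone.
full = np.concatenate([tone(300, 0.5), tone(3000, 0.5)])

# "Slices": two hypothetical tokens, one per half of the signal.
first_token = full[:RATE // 2]
second_token = full[RATE // 2:]

print("full:", sorted(dominant_freqs(full)))
print("first token:", sorted(dominant_freqs(first_token)))
print("second token:", sorted(dominant_freqs(second_token)))
```

Under this reading, a per-token plot that stops well below the top of the full FFT plot is expected whenever the token itself carries no energy at the higher frequencies.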

dumblob commented 4 years ago

Single token graphs are, as the word states, tokenized and inherit only parts of the frequencies.

That didn't occur to me. Thanks for the clarification and a great project overall!