bishoph / sopare

Real time sound pattern recognition in Python for Raspberry/Banana Pi.

Questions - FAQ / Wiki #56

Open a3020 opened 5 years ago

a3020 commented 5 years ago

@bishoph first of all, thanks for all the time you've dedicated to this project. The project is really impressive.

I have some concrete questions that you or someone else may be able to answer. Maybe they are suitable for a Wiki, but I'm unsure how you feel about that idea.

Questions:

  1. Is it better to use single words or multiple words for accuracy?
  2. Does one get better accuracy if words are completely different (e.g. 'play' + 'stop' instead of 'play' + 'pause')?
  3. How can raw recordings be analyzed?
  4. Which settings are important when using a prefixed word to trigger analysis? ('Alexa ...')
  5. A single word is triggered (false positive) e.g. when knocking on a table, what should one do or look into to prevent this?
  6. Which settings are important to reduce false positives?
  7. How should the 'sorted_best_match' values in debug mode be interpreted?
  8. What is a token?

bishoph commented 5 years ago

Thanks. I'll try to give some short answers:

1) Depends. I personally train each word, but because one normally speaks a single word a bit differently than in a sentence or when two words follow one another, you may want to test this yourself.

2) Not mandatory. My robotic arm control has "right" and "light", and my smart home light control has "an" and "aus" (German for "on" and "off"). SOPARE can differentiate such words quite well.

3) I don't understand the question, sorry.

4) SOPARE has no built-in support for prefix words. Just train normally and write a custom plugin that does something special with certain words/sounds.

5) Better training, a different (better) microphone, different settings.

6) Depends. Start with this blog post: https://www.bishoph.org/sopare-precision-and-accuracy/

7) Depends on the debug purpose ;)

8) A token is a single entity from a sequence that (in the best case) can be encountered again under similar circumstances. Here is a blog post that hopefully sheds some light on this: https://www.bishoph.org/smart-home-and-voice-control-sopare-beta-testing/
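To illustrate answer 4, here is a minimal sketch of how a custom plugin might treat one trained word as a prefix. It is not SOPARE's own mechanism: the `PrefixGate` class, the trigger word "alexa", and the five-second window are all hypothetical, and the `run(readable_results, data, rawbuf)` entry point is assumed to match the plugins shipped with SOPARE, so check it against your installed version.

```python
import time

TRIGGER_WORD = "alexa"        # hypothetical: any word you trained as the prefix
COMMAND_WINDOW_SECONDS = 5.0  # hypothetical: how long the prefix stays "armed"


class PrefixGate(object):
    """Accept a recognized word only if the trigger word came shortly before."""

    def __init__(self, trigger=TRIGGER_WORD, window=COMMAND_WINDOW_SECONDS):
        self.trigger = trigger
        self.window = window
        self.armed_until = 0.0

    def filter(self, readable_results, now=None):
        """Return the words from readable_results that count as commands."""
        now = time.time() if now is None else now
        accepted = []
        for word in readable_results:
            if word == self.trigger:
                # Arm the gate: the next word within the window is a command.
                self.armed_until = now + self.window
            elif now < self.armed_until:
                accepted.append(word)
                # Disarm so that one trigger allows exactly one command.
                self.armed_until = 0.0
        return accepted


_gate = PrefixGate()


def run(readable_results, data, rawbuf):
    # Assumed plugin entry point; data and rawbuf are unused in this sketch.
    for command in _gate.filter(readable_results):
        print("command: %s" % command)
```

With this gate, "play" on its own is ignored, while "alexa" followed by "play" within the window yields one accepted command. Disarming after each command is a design choice that keeps a stray false positive (such as a knock on the table) from being acted on long after the prefix was spoken.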

a3020 commented 5 years ago

I think I found the answer to question 7 in one of your blog posts:

sorted_best_match: [[MIN_CROSS_SIMILARITY, MIN_LEFT_DISTANCE, MIN_RIGHT_DISTANCE, START_POS, LENGTH, u'PREDICTION'] (...)]
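For reading such debug output, the field order above can be mapped onto named fields. This is only a sketch: the `BestMatch` tuple and `label_matches` helper are hypothetical names, and the sample values are invented for illustration, not real SOPARE output.

```python
from collections import namedtuple

# Field order follows the layout quoted above.
BestMatch = namedtuple(
    "BestMatch",
    [
        "min_cross_similarity",
        "min_left_distance",
        "min_right_distance",
        "start_pos",
        "length",
        "prediction",
    ],
)


def label_matches(sorted_best_match):
    """Wrap each six-element entry in a BestMatch for readable access."""
    return [BestMatch(*entry) for entry in sorted_best_match]


# Invented sample values, only to show the shape of the data.
sample = [
    [0.92, 0.31, 0.28, 0, 5, u"play"],
    [0.61, 0.70, 0.66, 5, 4, u"stop"],
]

for match in label_matches(sample):
    print("%s: similarity=%.2f start=%d length=%d" % (
        match.prediction,
        match.min_cross_similarity,
        match.start_pos,
        match.length,
    ))
```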

a3020 commented 5 years ago

Regarding question 3:

How can raw recordings be analyzed?

I'm trying to visualize training data. Is the 'plot' option meant for that? I'm unsure because of #58.

bishoph commented 5 years ago

Yes, this is one option for visualizing the data. You need to add the config option

SIMILARITY_ZERO_CROSSING_RATE

Details:

https://github.com/bishoph/sopare/pull/55