Q: Fehl-Erkennung - Githubissues

Moin... leider wird im Forum nicht der ganze Text angezeit, deshalb hier nochmal.

endlich mal gute Nachrichten … mit diesem MIC Array (http://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/) und der 1-Channel Firmware geht es nun wie erwartet, da jegliche Filterung auf dem Mic durchgeführt wird und die reine Sprache in Sopare ankommt. Ein Problem bleibt… die (für mich) „hohe“ Fehlerkennung – wobei ich weniger Fehler als Jarvis mit Snowboy habe, um ehrlich zu sein, ist Jarvis extrem langsam und extrem fehleranfällig, nur der Einrichtungsassistent ist toll ;). Ich habe 3 Kommandos, Licht, Lampe und Steckdose. Es kommt oft vor, dass Licht die Steckdose schaltet. Oder ein Husten von mir schaltet die Lampe. Nach tagelangem Testen und Videos wiederholt angucken sieht meine Config nun so aus:

#########################################################
# Stream prep and silence configuration options #########
#########################################################

[stream]

# Read chunk size
CHUNK = 512

# Sample rate
SAMPLE_RATE = 16000

# Volume threshold when audio processing starts / silence
THRESHOLD = 500

# Silence time in seconds when analysis is called
MAX_SILENCE_AFTER_START = 1

# Time in seconds after the analysis is forced
MAX_TIME = 2.4

# Start the analysis after reaching LONG_SILENCE
LONG_SILENCE = 20

# Characteristic length
CHUNKS = 2048

#########################################################
# Characteristic configuration options ##################
#########################################################

[characteristic]

# Steps boil down the data into smaller chunks of data.
# Smaller steps mean more precision but require
# normally more learned entries in the dictionary.
# Progressive value is used if you want to pack not
# so relevant frequencies
PROGRESSIVE_FACTOR = 0
START_PROGRESSIVE_FACTOR = 600
MIN_PROGRESSIVE_STEP = 25
MAX_PROGRESSIVE_STEP = 25

# Specifies freq ranges that are kept for further
# analysis. Freq outside of the ranges are set to zero.
# Human language can be found between 20 and 5000.
LOW_FREQ = 40
HIGH_FREQ = 1000

# Make use of Hann window function
HANNING = true

# Range factor for peaks
PEAK_FACTOR = 0.7

#########################################################
# Compare configuration options #########################
#########################################################

[compare]

# Min. number of tokens to identify the beginning of a word
MIN_START_TOKENS = 2

# Min. value for potential beginning of a word
MARGINAL_VALUE = 0.8

# Minimal similarity across all comparison to
# identify a complete word across all tokens
MIN_CROSS_SIMILARITY = 0.97

# Calculation basis or token/word comparison
SIMILARITY_NORM = 0.7
SIMILARITY_HEIGHT = 0
SIMILARITY_DOMINANT_FREQUENCY = 0
SIMILARITY_ZERO_CROSSING_RATE = 0.3

# Number of best matches to consider.
# Value must be > 0
# If not specified or value < 1 value is set to 1
NUMBER_OF_BEST_MATCHES = 15

# Min. distance to keep a word
MIN_LEFT_DISTANCE = 0.3
MIN_RIGHT_DISTANCE = 0.3

# Use given number as results to assembly result
# 0 for all predictions
MAX_WORD_START_RESULTS = 2
MAX_TOP_RESULTS = 5

# Enable or disable strict length check for words
STRICT_LENGTH_CHECK = true
# Value to soften the strict length check a bit to still
# get quite precise results but to be less strict
STRICT_LENGTH_UNDERMINING = 2

# Short term memory retention time in seconds. Zero to disable STM
# 1.2
STM_RETENTION = 1.2

# Fill result percentage
# 0.5 means that half of the values can by empty to still get valid results
# A lower value should theoretically avoid false positives
FILL_RESULT_PERCENTAGE = 0.1

#########################################################
# Misc configuration options ############################
#########################################################

[misc]

# Loglevel (CRITICAL, ERROR, WARNING, INFO, DEBUG)
LOGLEVEL = ERROR

#########################################################
# Experimental configuration options ####################
#########################################################

[experimental]

# Additional FFT analysis and comparison for CHUNKS/2 length
FFT_SHIFT = true

Schalten tut er zu >95% beim ersten mal, und 100% beim Wiederholen des Kommandos, nur leider nicht immer richtig.

Wo kann ich noch dran schrauben? Gibt es eine Möglichkeit den Wortanfang noch genauer zu vergleichen? Wegen dem Problem Gicht, Sicht, Nicht -> schaltet auch das Licht.

Danke

Also: Als erstes fällt die niedrige Sample Rate auf, ich nutzte ein Mic welches mit 48000 Hz arbeitet und damit natürlich auch direkt 3x genauer ist.

Bei den folgenden Werten besteht aus meiner Sicht das größte Optimierungspotential:

MIN_PROGRESSIVE_STEP sowie MAX_PROGRESSIVE_STEP

Je kleiner die Werte, desto weniger Komprimierung und desto genauer. Ich würde mal mit 10, 5 und 2 testen. Nach dem Umstellen muss neu trainiert werden (alte Daten löschen!). Zu kleine Werte gehen allerdings zu Lasten der Erkennung. Also muss sich langsam an die optimalen True/False Positive Werte herantasten. Hängt stark von der Umgebung und der eingesetzten Hardware ab.

Nachfolgend meine Config, welche 24/7 recht erfolgreich und genau arbeitet und Licht steuert:


#########################################################
# Stream prep and silence configuration options #########
#########################################################

[stream]

# Read chunk size
CHUNK = 512

# Sample rate
SAMPLE_RATE = 48000

# Volume threshold when audio processing starts / silence
THRESHOLD = 380

# Silence time in seconds when analysis is called
MAX_SILENCE_AFTER_START = 1.4

# Time in seconds after the analysis is forced
MAX_TIME = 2.4

# Start the analysis after reaching LONG_SILENCE
LONG_SILENCE = 20

# Characteristic length
CHUNKS = 3072

#########################################################
# Characteristic configuration options ##################
#########################################################

[characteristic]

# Steps boil down the data into smaller chunks of data.
# Smaller steps mean more precision but require
# normally more learned entries in the dictionary.
# Progressive value is used if you want to pack not
# so relevant frequencies
PROGRESSIVE_FACTOR = 0
START_PROGRESSIVE_FACTOR = 600
MIN_PROGRESSIVE_STEP = 5
MAX_PROGRESSIVE_STEP = 5

# Specifies freq ranges that are kept for further
# analysis. Freq outside of the ranges are set to zero.
# Human language can be found between 20 and 5000.
LOW_FREQ = 20
HIGH_FREQ = 600

# Make use of Hann window function
HANNING = true

# Range factor for peaks
PEAK_FACTOR = 0.7

#########################################################
# Compare configuration options #########################
#########################################################

[compare]

# Min. number of tokens to identify the beginning of a word
MIN_START_TOKENS = 3

# Min. value for potential beginning of a word
MARGINAL_VALUE = 0.8

# Minimal similarity across all comparison to
# identify a complete word across all tokens
MIN_CROSS_SIMILARITY = 0.85

# Calculation basis or token/word comparison
# 0.6, 0.2, 0, 0.2
SIMILARITY_NORM = 0.5
SIMILARITY_HEIGHT = 0.2
SIMILARITY_DOMINANT_FREQUENCY = 0
SIMILARITY_ZERO_CROSSING_RATE = 0.3

# Number of best matches to consider.
# Value must be > 0
# If not specified or value < 1 value is set to 1
NUMBER_OF_BEST_MATCHES = 2

# Min. distance to keep a word
MIN_LEFT_DISTANCE = 0.9
MIN_RIGHT_DISTANCE = 0.7

# Use given number as results to assembly result
# 0 for all predictions
MAX_WORD_START_RESULTS = 2
MAX_TOP_RESULTS = 3

# Enable or disable strict length check for words
STRICT_LENGTH_CHECK = true
# Value to soften the strict length check a bit to still
# get quite precise results but to be less strict
STRICT_LENGTH_UNDERMINING = 1

# Short term memory retention time in seconds. Zero to disable STM
STM_RETENTION = 1.2

# Fill result percentage
# 0.5 means that half of the values can by empty to still get valid results
# A lower value should theoretically avoid false positives
FILL_RESULT_PERCENTAGE = 0

bishoph / sopare

Q: Fehl-Erkennung #30