chrisvfritz / isabella

A voice-computing assistant built in Ruby.

Use Sphinxtrain to train Isabella for my voice #2

Closed watsonbox closed 9 years ago

watsonbox commented 9 years ago

You say "Use Sphinxtrain to train Isabella for my voice, specifically. It looks like a pain in the butt, so I've been putting it off."

I've created a gem called sphinxtrain-ruby for making this process easier. Thought it might be useful - contributions and feedback welcome.

chrisvfritz commented 9 years ago

:tada: Fantastic! I guess I missed this in my earlier research. Thank you so much for all your work in making sphinx more accessible from Ruby! I'll be taking a look at this soon.

kluzny commented 9 years ago

Making sphinx understand you better is probably a better choice, but it's a pain to maintain, and it may not work for multiple users. Have you considered trying to find a way to create a best guess for what the user is saying?

require 'pocketsphinx-ruby'
require 'damerau-levenshtein'

# Words the assistant is expected to hear.
DICT = %w[test computer open shell top kek].freeze

# Return the dictionary word with the smallest Damerau-Levenshtein
# distance to the recognized word.
def best_match(raw_word)
  DICT.min_by { |word| DamerauLevenshtein.distance(word, raw_word) }
end

Pocketsphinx::LiveSpeechRecognizer.new.recognize do |speech|
  # Skip empty results instead of printing a nil match.
  next if speech.nil? || speech.strip.empty?

  result = speech.split.map { |word| best_match(word) }.join(' ')
  puts "#{speech} most closely resembles: #{result}"
end

chrisvfritz commented 9 years ago

I did indeed find that training for a specific individual either had no effect on recognition for other individuals or made it worse.

Defining the dictionary and grammar should actually give you the best-match behavior you're looking for. The dictionary restricts the set of possible words, and the grammar defines the contexts in which they can appear.
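
To make the grammar idea concrete, here is a minimal sketch that builds a JSGF grammar (the grammar format PocketSphinx reads) from a list of phrases. The phrases, the `jsgf_grammar` helper, and the grammar name `isabella` are assumptions made up for illustration, not Isabella's actual commands, and the sketch only produces the grammar text rather than loading it into a recognizer.

```ruby
# Hypothetical command phrases; replace with the assistant's real ones.
COMMANDS = ['open shell', 'open top', 'test computer'].freeze

# Build a JSGF grammar whose single public rule accepts exactly
# one of the given phrases. (Helper name is an assumption.)
def jsgf_grammar(name, phrases)
  alternatives = phrases.map(&:strip).reject(&:empty?).join(' | ')
  <<~JSGF
    #JSGF V1.0;
    grammar #{name};
    public <command> = #{alternatives};
  JSGF
end

puts jsgf_grammar('isabella', COMMANDS)
```

With a grammar like this, anything outside the listed phrases is not a valid hypothesis, which is how the grammar narrows both the vocabulary and the word order.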

The main problem with the library that I've experienced is actually _over_matching. When in a noisy location, Isabella will assume that ambient voices are meant for her.
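
One possible way to soften overmatching, offered only as a sketch, is to combine the edit-distance idea suggested earlier in this thread with a cutoff, so that words too far from every dictionary entry are dropped rather than forced onto the nearest match. The cutoff value and the `match_or_reject` helper are assumptions, and plain Levenshtein distance is swapped in for the damerau-levenshtein gem so the example has no dependencies. It has not been tested against real ambient speech.

```ruby
# Hypothetical names throughout; plain Levenshtein stands in for the
# damerau-levenshtein gem so the sketch runs without dependencies.
DICT_WORDS = %w[test computer open shell top kek].freeze
MAX_DISTANCE = 2 # assumed cutoff; tune against real recordings

# Classic dynamic-programming edit distance (insert, delete, substitute).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Return the closest dictionary word, or nil when nothing is close enough.
def match_or_reject(raw_word)
  word, distance = DICT_WORDS.map { |w| [w, levenshtein(w, raw_word)] }.min_by(&:last)
  distance <= MAX_DISTANCE ? word : nil
end
```

Calling `match_or_reject` on each recognized word and discarding the `nil` results would let near misses through while ignoring words that resemble nothing in the dictionary.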

chrisvfritz commented 9 years ago

@kluzny Just to be clear, I closed this because using sphinxtrain turned out not to offer any universal improvements. I'm still open to other strategies! If you're able to implement something that does seem to improve word matching, discussion can continue here or in another issue - and I'm also very open to pull requests. :smiley: