cmusphinx / sphinx4

Pure Java speech recognition library
cmusphinx.sourceforge.net

Keyword spotting feature available? #27

Closed: jonathanglasmeyer closed this issue 9 years ago

jonathanglasmeyer commented 9 years ago

Hi, I have been googling for the past few hours to find out whether Sphinx4 currently supports keyword spotting. I found: 1) http://sourceforge.net/p/cmusphinx/code/HEAD/tree/branches/long-audio-aligner/KeyWordSpotting/, which is from 2011; 2) an implementation in pocketsphinx for Android; 3) an announcement in the S4 release schedule (http://cmusphinx.sourceforge.net/wiki/releaseschedule) that keyword spotting is planned for Sphinx4.

Can you tell me if there is any information I am missing right now?

nshmyrev commented 9 years ago

Keyword spotting is not available in sphinx4 yet. The implementation in the long-audio-aligner branch is not true keyword spotting and has pretty poor operating characteristics. We hope to implement it in a coming release, but it is not there yet.

jonathanglasmeyer commented 9 years ago

Ok, thanks for the information.

For my bachelor's thesis, I want to build a system that automatically recognizes speech from university lectures. The goal is to improve recognition accuracy by using context information from lecture slides. The visible result should be a UI which presents lecture videos and lets you search for technical terms and jump to positions in the videos where they occur.

A working example can be seen in this lecture video from superlectures.com. The word error rate for technical terms is very high, however, which was the initial motivation for this project.

I am looking for the best way to do this with Sphinx4. Do you think extending the dictionary would be a feasible approach here, given keyword spotting isn't available yet?
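To make the dictionary question concrete, here is a minimal sketch of how one might list the slide terms that a recognizer dictionary does not cover yet. The class `SlideVocabulary` and the method `missingWords` are hypothetical names for illustration, not part of Sphinx4, and the tokenization assumes plain English text and a lowercase dictionary.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Sphinx4: finds slide words that the
// recognizer dictionary does not contain yet.
class SlideVocabulary {

    /** Returns the slide words missing from the dictionary, in sorted order. */
    static Set<String> missingWords(String slideText, Set<String> dictionaryWords) {
        Set<String> missing = new TreeSet<>();
        // Lowercase, then split on anything that is not a letter or apostrophe.
        // Assumption: the dictionary stores its words in lowercase.
        for (String token : slideText.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!token.isEmpty() && !dictionaryWords.contains(token)) {
                missing.add(token);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> dictionary =
                new HashSet<>(Arrays.asList("the", "of", "a", "model", "hidden"));
        String slide = "Decoding of a hidden Markov model: the Viterbi algorithm";
        // Prints [algorithm, decoding, markov, viterbi]
        System.out.println(missingWords(slide, dictionary));
    }
}
```

Each word reported this way would still need a pronunciation before it could be added to the dictionary, and the language model would need to know about it as well.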

nshmyrev commented 9 years ago

Building language models for domain-specific transcription is described here:

http://cmusphinx.sourceforge.net/wiki/tutoriallmadvanced
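As a rough illustration of the first step in that kind of workflow, the sketch below turns raw slide text into a training corpus with one normalized utterance per line. It assumes the language model toolkit expects lowercase text with utterances delimited by `<s>` and `</s>`; check this against the tutorial before relying on it. `SlideCorpus` and `toUtterances` are hypothetical names, not part of any CMUSphinx tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of any CMUSphinx tool: prepares slide text
// as a plain-text corpus for language model training.
class SlideCorpus {

    /** Turns raw text into normalized utterances wrapped in <s> ... </s>. */
    static List<String> toUtterances(String rawText) {
        List<String> utterances = new ArrayList<>();
        // Treat sentence punctuation and line breaks as utterance boundaries.
        for (String chunk : rawText.split("[.!?\\n]+")) {
            // Keep only letters, apostrophes and spaces, then collapse whitespace.
            String words = chunk.toLowerCase(Locale.ROOT)
                    .replaceAll("[^a-z' ]+", " ")
                    .trim()
                    .replaceAll("\\s+", " ");
            if (!words.isEmpty()) {
                utterances.add("<s> " + words + " </s>");
            }
        }
        return utterances;
    }

    public static void main(String[] args) {
        String slides = "Hidden Markov Models.\nThe Viterbi algorithm!";
        for (String line : toUtterances(slides)) {
            System.out.println(line);
        }
    }
}
```

Numbers, abbreviations and formulas on slides would need their own normalization rules; this sketch simply drops anything that is not a letter or apostrophe.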

There is a good dissertation by Stephen Marquard dedicated to lecture transcription with sphinx4. You might be interested in reading it:

http://pubs.cs.uct.ac.za/archive/00000846/01/MPhil-Dissertation-StephenMarquard.pdf

http://trulymadlywordly.blogspot.de/2011/12/sphinx4-speech-recognition-results-for.html

jotadepicas commented 5 years ago

> Keyword spotting is not available in sphinx4 yet. The implementation in the long-audio-aligner branch is not true keyword spotting and has pretty poor operating characteristics. We hope to implement it in a coming release, but it is not there yet.

I know this is a very old issue (and that sphinx4 is probably deprecated in favor of Kaldi?), but I stumbled upon it, and after using pocketsphinx I wanted to try sphinx4 as well. Could you give some directions on how to implement a keyword spotting mode in the sphinx4 decoder? Or, in other words, what challenges did you face in implementing it that caused it not to be supported? Thanks.

nshmyrev commented 5 years ago

You would be better off implementing a keyword spotter for Kaldi.

timobaumann commented 5 years ago

There's some similar functionality in DialogOS, which uses a phone-loop grammar to weed out garbage words. Take a look at https://github.com/dialogos-project/dialogos/blob/master/plugins/DialogOS_SphinxPlugin/src/main/java/edu/cmu/lti/dialogos/sphinx/client/SphinxLanguageSettings.java if you want to see roughly how it's implemented. I don't know, though, how well it works for actual keyword spotting (rather than the reverse, weeding out a bit of garbage).
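The general idea of a keyword-or-garbage loop can be sketched as a JSGF grammar in which every stretch of speech is matched either by a keyword or by a short filler unit. This is not DialogOS's code. The class `SpottingGrammar`, the rule names, and the filler tokens are assumptions for illustration, and each filler token would need its own dictionary entry (for example, one entry per phone) for a recognizer to use it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not taken from DialogOS or Sphinx4: builds a JSGF
// grammar that accepts any sequence of keywords and filler units.
class SpottingGrammar {

    /** Builds the grammar text from keyword phrases and filler tokens. */
    static String build(List<String> keywords, List<String> fillers) {
        return "#JSGF V1.0;\n"
                + "grammar spotting;\n"
                // One or more items, each either a keyword or a garbage unit.
                + "public <utterance> = ( <keyword> | <garbage> )+;\n"
                + "<keyword> = " + String.join(" | ", keywords) + ";\n"
                // Assumption: every filler token has a dictionary entry.
                + "<garbage> = " + String.join(" | ", fillers) + ";\n";
    }

    public static void main(String[] args) {
        System.out.print(build(
                Arrays.asList("viterbi", "hidden markov model"),
                Arrays.asList("AA", "B", "K")));
    }
}
```

A real spotter would also need some way to trade off keyword paths against garbage paths, which plain JSGF alternatives like these do not express.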

nshmyrev commented 5 years ago

The difficulty is writing an appropriate search manager (the word-pruning one is too complex and slow) and a matching linguist. For best performance it has to be a low-level implementation.