cmusphinx / sphinx4

Pure Java speech recognition library
cmusphinx.sourceforge.net

[FEATURE REQUEST] Macro-like support #45

Closed mainrs closed 8 years ago

mainrs commented 8 years ago

I don't know if the title is a good one but I will try to explain what my goals are. English is not my native language so please forgive any mistakes I make :)

The basic idea is to listen to the user input by using your LiveSpeechRecognizer. The user says commands and the program should then execute them accordingly. A small example for the game I am working on:

| User input    | Game interpretation  |
|---------------|----------------------|
| play card 1   | card 1 gets played   |
| play card 2   | card 2 gets played   |
| select card 2 | card 2 gets selected |
| card 2        | card 2 gets selected |

As you can see, the third and fourth commands are equivalent, but the spoken phrases are not actually the same. Saying select feels more natural than the plain card 2 command. Implementing this behavior is not hard: one just needs to create a simple grammar file and define select as optional:

<select_card> = [select] card <digit>;

sphinx4 will now recognize both commands, card 2 and select card 2. Now we come to the part where the feature request comes in :)

It is kind of hard to compare those two return values and decide whether they are the same command (which basically means they both match the grammar rule above). The program has to compute the differences itself and then decide whether they match. But since sphinx4 has already loaded the grammar and parsed the input, it knows that both sentences match the same rule. My request is a callback mechanism: applications register with sphinx4 through a simple interface and get notified when the user input matches a specific grammar rule. Here is a small (pseudocode-like) example of what I mean:

public interface GrammarCondition {

    /**
     * Gets called when sphinx4 recognizes the user input and can match it to a specific grammar part.
     * @param grammarPartName the name of the grammar part that applies to the input (in this case, "select_card")
     * @param userInput the String that could be parsed from the captured sound
     */
    void conditionFound(String grammarPartName, String userInput);
}

And maybe some registration service:

Conditions.register(String grammarPartName, GrammarCondition condition);
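
To make the idea more concrete, here is a minimal, self-contained sketch of how such a registry could behave. Everything in it is hypothetical: `GrammarCondition`, `Conditions`, and the `dispatch` method are names taken from or invented for this request and are not part of sphinx4. The `dispatch` call stands in for whatever the recognizer would do once it knows which grammar rule matched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback from this request; not an existing sphinx4 type. */
interface GrammarCondition {
    void conditionFound(String grammarPartName, String userInput);
}

/** Hypothetical registry that maps grammar rule names to listeners. */
final class Conditions {
    private static final Map<String, List<GrammarCondition>> LISTENERS = new HashMap<>();

    private Conditions() {}

    /** Registers a listener for one grammar rule, e.g. "select_card". */
    static void register(String grammarPartName, GrammarCondition condition) {
        LISTENERS.computeIfAbsent(grammarPartName, k -> new ArrayList<>()).add(condition);
    }

    /** Assumed entry point: the recognizer would call this with the matched rule name. */
    static void dispatch(String grammarPartName, String userInput) {
        List<GrammarCondition> listeners =
                LISTENERS.getOrDefault(grammarPartName, Collections.emptyList());
        for (GrammarCondition listener : listeners) {
            listener.conditionFound(grammarPartName, userInput);
        }
    }
}

class ConditionsDemo {
    public static void main(String[] args) {
        // Both spoken variants reach the same listener because they share a rule name.
        Conditions.register("select_card", (rule, input) ->
                System.out.println(rule + " matched: " + input));
        Conditions.dispatch("select_card", "card 2");
        Conditions.dispatch("select_card", "select card 2");
    }
}
```

With this shape the game code never compares raw strings; it only reacts to the rule name it registered for.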

I would really, really like to get feedback from you guys! You did a great job on this project. Keep up the good work :) Thank you for reading all of this, Sven

nshmyrev commented 8 years ago

We have recently discussed it here:

https://github.com/cmusphinx/pocketsphinx/issues/13

mbait commented 8 years ago

Hm, but how do you distinguish those two examples yourself? Suppose a friend of yours says "card 2" to you: did he really mean "select card 2" or just "card 2"?


Sincerely, Alexander

mainrs commented 8 years ago

Well, the grammar defines them as exactly the same thing. So whether you say card 2 or select card 2, the result should be the same. Even if they are not the same, letting the implementation of the interface decide whether the input is still in scope would be enough. An application that receives the string card 2 should decide for itself whether it falls within its scope. But the application can't directly tell that both statements MAY mean the same thing. Sure, that depends heavily on the use case and the actual implementation. But for the example above, the statements should be treated equally.
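
For comparison, this is roughly what the application has to do on its own today: re-implement the grammar rule on the recognized string. The sketch below is only an illustration and uses plain Java; `CardCommands` and `parseSelectCard` are made-up names, and it assumes the hypothesis arrives as text like "card 2" with a single digit, as written in this thread.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical application-side helper; duplicates the grammar rule by hand. */
final class CardCommands {
    // Mirrors "[select] card <digit>": an optional "select", then "card", then one digit.
    private static final Pattern SELECT_CARD =
            Pattern.compile("(?:select\\s+)?card\\s+(\\d)");

    private CardCommands() {}

    /** Returns the card number if the hypothesis is a select command, otherwise empty. */
    static Optional<Integer> parseSelectCard(String hypothesis) {
        Matcher m = SELECT_CARD.matcher(hypothesis.trim().toLowerCase(Locale.ROOT));
        // matches() requires the whole string to fit, so "play card 2" is rejected.
        return m.matches() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }
}
```

Both "card 2" and "select card 2" map to the same card number here, but the pattern has to be kept in sync with the grammar file by hand, which is exactly the duplication the callback would avoid.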