dictation-toolbox / dragonfly

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx
GNU Lesser General Public License v3.0

API changes #254

Open cjbassi opened 4 years ago

cjbassi commented 4 years ago

What's the policy on API changes? I've been having a bit of trouble implementing some features due to some issues with the dragonfly API. The two features I'm trying to implement are:

  1. Run a callback when the engine is done setting up and ready to listen in the do_recognition method. For this, I think it would be good to split this function up and add a prepare_recognition method that is supposed to be called before do_recognition.

  2. Re-instantiate the engine with different settings and a different grammar. At least with the Kaldi engine, once you call disconnect, calling do_recognition doesn't work anymore. Also, get_engine will not create a new engine; it just returns the old one. Is it possible to update the engine settings once you've already created it?

Another issue I've noticed is that having to call load() on both the grammar and the engine can be quite confusing, and it can cause problems depending on the order you call them in. I think it would be much nicer if you didn't have to call load on either of them and instead passed the grammar directly to the engine in the prepare_recognition method.

Thanks!

drmfinlay commented 4 years ago

Hi @cjbassi,

Sorry for the late response.

Regarding feature one, adding a prepare_recognition() method/callback sounds like a good idea to me. As for feature two, I was unaware that Kaldi didn't work after calling disconnect() and, presumably, calling connect() again. Perhaps @daanzu could fix that? Maybe there should be another method for updating the engine configuration or for deallocating it so a new one is returned by the get_engine() function?
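The "deallocate so a new one is returned" idea can be sketched with a toy cache. This is only an illustration under stated assumptions: `CachedEngine`, `get_engine_sketch`, `release_engine_sketch` and the `model_dir` option are hypothetical stand-ins written for this example, not dragonfly's actual implementation or API.

```python
# Toy model of a cached engine factory. All names here are hypothetical.
_engines = {}


class CachedEngine:
    """Stand-in for an engine object that remembers its construction options."""

    def __init__(self, name, **options):
        self.name = name
        self.options = options


def get_engine_sketch(name, **options):
    """Return the cached engine for `name`, creating it only on first use."""
    if name not in _engines:
        _engines[name] = CachedEngine(name, **options)
    return _engines[name]


def release_engine_sketch(name):
    """Forget the cached engine so the next lookup builds a fresh one."""
    _engines.pop(name, None)


first = get_engine_sketch("kaldi", model_dir="model_a")
# The second call returns the cached object, so the new options are ignored.
same = get_engine_sketch("kaldi", model_dir="model_b")
release_engine_sketch("kaldi")
# After releasing, the new options take effect on a new object.
fresh = get_engine_sketch("kaldi", model_dir="model_b")
```

In this sketch the caller has to release the engine explicitly before new settings apply, which mirrors the behaviour described above where the second lookup just returns the old engine.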

Regarding the load() methods, calling the engine's load method directly is not the intended use. Passing Grammar objects to the engine in your proposed prepare_recognition() method doesn't make much sense when you consider that grammars can be both loaded and unloaded after do_recognition() is called. Maybe you want to make sure that the grammar is being loaded into the right engine? In that case, you can set the Grammar.engine property or pass the engine constructor argument.

As for the policy on API changes generally, I try to keep things backwards compatible with older versions of dragonfly unless the change is to something internal that isn't used directly by users.

daanzu commented 4 years ago

I will take a look at the Kaldi backend disconnect().

cjbassi commented 4 years ago

> Regarding feature one, adding a prepare_recognition() method/callback sounds like a good idea to me.

If we are going to do this in a backwards compatible way, we probably want to keep do_recognition and also add a separate set of APIs that splits its functionality into two functions. From there, we could change do_recognition so that it just calls these two new functions.
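A minimal sketch of that backwards-compatible split, assuming a hypothetical engine class. `SplitEngine`, `run_recognition_loop` and the `on_ready` callback are names invented for this example and are not part of dragonfly; only the shape of the refactoring is the point.

```python
class SplitEngine:
    """Hypothetical engine showing do_recognition() split into two steps."""

    def __init__(self, on_ready=None):
        self._prepared = False
        self._on_ready = on_ready  # optional "ready to listen" callback
        self.events = []           # records the order of steps for inspection

    def prepare_recognition(self):
        """Do the slow setup once, then tell the caller the engine is ready."""
        if self._prepared:
            return
        self.events.append("prepare")
        self._prepared = True
        if self._on_ready is not None:
            self._on_ready()

    def run_recognition_loop(self):
        """Second half: a real engine would block here and listen."""
        if not self._prepared:
            raise RuntimeError("call prepare_recognition() first")
        self.events.append("loop")

    def do_recognition(self):
        """Existing entry point, kept so that old callers keep working."""
        self.prepare_recognition()
        self.run_recognition_loop()


ready_calls = []

# Old style: a single call does both steps.
old_style = SplitEngine(on_ready=lambda: ready_calls.append("old"))
old_style.do_recognition()

# New style: prepare early, then start listening later.
new_style = SplitEngine(on_ready=lambda: ready_calls.append("new"))
new_style.prepare_recognition()
new_style.do_recognition()
```

Because `prepare_recognition()` returns early when setup has already run, calling it explicitly and then calling `do_recognition()` does not repeat the setup or fire the callback twice.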

Question: Is disconnect() supposed to deallocate the engine?

For the grammar loading, it sounds like I might not be doing things correctly, so I'll have to do some more research and get back to you about that. Thanks for the pointers.

daanzu commented 4 years ago

@cjbassi How is the proposed prepare_recognition different from what Kaldi currently has? It is optional, but can be called prior to do_recognition to minimize the time before accepting recognition.

For disconnect, do you mean that calling it and then calling connect again doesn't work? I don't think do_recognition is supposed to work when not connected.

cjbassi commented 4 years ago

Oh I must have missed prepare_recognition. That looks like exactly what I need, thanks!

Yes, I'm calling disconnect, connect, and then do_recognition but it doesn't seem to be working and it just immediately returns.

daanzu commented 4 years ago

@cjbassi https://github.com/dictation-toolbox/dragonfly/blob/master/dragonfly/examples/kaldi_module_loader_plus.py should be a good example of how to do the basic stuff, including printing a message just before recognition starts.

I will take a look at the disconnect issue.