dictation-toolbox / dragonfly

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx
GNU Lesser General Public License v3.0

Make SR engine text-to-speech functionality more flexible #345

Closed drmfinlay closed 2 years ago

drmfinlay commented 3 years ago

The engine text-to-speech code has been moved into Speaker classes to allow more flexible use of this (limited) functionality. The speak() methods of the affected engine classes have been adjusted accordingly.

@comodoro This is a start on the decoupling of Dragonfly's text-to-speech functionality. We spoke about this on the dragonfly gitter channel a little while back.

I still have to adjust the base engine code to utilise SAPI 5 on Windows, if it is available. I will probably use a design similar to get_engine(). It would be nice to have a Speaker class that works on Linux, but that could be implemented later.

I would also like to add an option for asynchronous playback, similar to what the Windows PlaySound() function flags allow, as well as a Speak action class.
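For illustration only, asynchronous playback could be approximated by running a blocking speak callable on a background thread. This is a minimal sketch and not code from this PR; `speak_async` is a hypothetical helper name, and the `speak` argument stands in for any blocking callable such as a Speaker's speak() method.

```python
import threading


def speak_async(speak, text):
    """Run a blocking speak callable in the background (hypothetical helper).

    Returns the started thread so the caller may join() it if it later
    needs to wait for playback to finish.
    """
    thread = threading.Thread(target=speak, args=(text,))
    # Daemon threads do not keep the interpreter alive, so speech may be
    # cut off if the program exits before playback completes.
    thread.daemon = True
    thread.start()
    return thread
```

The caller gets control back immediately, which is roughly the behaviour the PlaySound() asynchronous flag provides for sound files.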

drmfinlay commented 2 years ago

I have made some progress on this today. The Speaker classes I wrote have been moved back under dragonfly/engines. I have also added implementations for eSpeak and CMU Flite. These invoke the respective command-line programs, which are readily available on Linux.
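As a rough idea of what a command-line based speaker can look like, here is a minimal sketch. It is not the code in this PR: the class name `CommandLineSpeaker` and its methods are hypothetical. It only assumes that `espeak` accepts the text as a positional argument and that `flite` accepts it via `-t`.

```python
import shutil
import subprocess


class CommandLineSpeaker(object):
    """Hypothetical speaker that hands text to a command-line TTS program."""

    def __init__(self, name, executable, text_args=()):
        self.name = name
        self.executable = executable
        self.text_args = tuple(text_args)

    def is_available(self):
        # The speaker is usable only if the program is found on PATH.
        return shutil.which(self.executable) is not None

    def build_command(self, text):
        # Pass the text as a single argument; no shell is involved.
        # Note: text starting with "-" may be read as an option by the program.
        return [self.executable] + list(self.text_args) + [text]

    def speak(self, text):
        # Blocks until the program has finished speaking.
        subprocess.check_call(self.build_command(text))


espeak = CommandLineSpeaker("espeak", "espeak")
flite = CommandLineSpeaker("flite", "flite", text_args=("-t",))
```

With eSpeak installed, `espeak.speak("hello world")` would run `espeak "hello world"` and return once playback ends.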

There is a new get_speaker() function which works in much the same way as get_engine(). The Kaldi, CMU Pocket Sphinx and text-input engine back-ends invoke that function (without the name argument) to retrieve a Speaker instance when engine.speak() is called.

On Windows, the default order is as follows:

  1. SAPI 5
  2. Natlink
  3. eSpeak
  4. CMU Flite
  5. text (stdout)

On other platforms, the first two are unavailable.
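The fallback order above can be sketched as a first-available lookup. This is an illustration under stated assumptions, not the actual get_speaker() implementation: `default_order`, `choose_speaker_name` and the `PROBES` table are hypothetical names, and the availability checks (importing `win32com.client` or `natlink`, looking for `espeak` and `flite` on PATH) are guesses at reasonable probes.

```python
import importlib
import shutil
import sys


def _can_import(module):
    # Treat any failure to import as "not available".
    try:
        importlib.import_module(module)
        return True
    except Exception:
        return False


# Hypothetical availability probes, keyed by speaker name.
PROBES = {
    "sapi5": lambda: _can_import("win32com.client"),
    "natlink": lambda: _can_import("natlink"),
    "espeak": lambda: shutil.which("espeak") is not None,
    "flite": lambda: shutil.which("flite") is not None,
    "text": lambda: True,  # printing to stdout always works
}


def default_order(platform):
    order = ["espeak", "flite", "text"]
    if platform == "win32":
        # SAPI 5 and Natlink are only considered on Windows.
        order = ["sapi5", "natlink"] + order
    return order


def choose_speaker_name(name=None, platform=None, probes=PROBES):
    """Return the requested speaker name, or the first available default."""
    if platform is None:
        platform = sys.platform
    if name is not None:
        if name not in probes:
            raise ValueError("unknown speaker: %r" % (name,))
        if not probes[name]():
            raise RuntimeError("speaker %r is not available" % (name,))
        return name
    for candidate in default_order(platform):
        if probes[candidate]():
            return candidate
    raise RuntimeError("no speaker implementation is available")
```

Calling `choose_speaker_name()` with no name walks the list in order, so a Linux machine with only Flite installed would end up with "flite", and one with neither program would fall back to "text".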

The Natlink and SAPI 5 engine back-ends always use their respective Speaker classes. It doesn't seem to make much sense to change this; get_speaker().speak() may easily be invoked instead of get_engine().speak().

I'll need to update the documentation before merging. I leave the Speak action class as something for an interested user to implement themselves -- it is simple enough.