New Keyword-Spotting interface to interoperate with STTService

After the conversations we had with Kai, I'd like to list here some ideas for the keyword spotting implementation we are working on.

So as a user I want to be able to trigger the STTService (https://github.com/eclipse/smarthome/issues/1021) just by saying a keyword, rather than through an external device like a smartphone or by pushing a button on the appliance.

So over the next few days I'll be working on an interface between ESH and our custom keyword spotting module (being written in C), using JNI to accomplish this feature.

So here are some initial ideas:

1. A new interface, org.eclipse.smarthome.io.multimedia.stt.STTWordspotting, should be introduced.
2. The interface should expose a method that receives an AudioSource, which stays open at all times and captures audio from the microphone.
3. The interface should expose a method that receives the keyword the decoder will search for.
4. The interface should expose a method that receives the STTService to be called to start listening when the keyword is found in the audio fed by the AudioSource. At this point, the STTWordspotting implementation should pause the stream of audio from the AudioSource to the decoder.
5. The interface should expose a callback that the STTService calls when it finishes the recognition handed over to it in item 4. The STTWordspotting implementation then resumes the stream from the AudioSource to our module, restarting the keyword search.
6. The class that implements STTWordspotting should be created and started on ESH boot, so that keyword spotting runs from then on.
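
To make the handover described above concrete, here is a minimal sketch of how the interface and its pause/resume behaviour could look. Everything in it is an assumption for illustration only: the method names (setAudioSource, setKeyword, setSTTService, recognitionFinished, start), the startListening method on STTService, and the AudioSource and STTService stand-ins, which are not the real ESH types. The native decoder is replaced by a plain method call.

```java
// Hypothetical stand-ins for the real ESH types, reduced to what the sketch needs.
interface AudioSource {
}

interface STTService {
    // Assumed entry point: start recognizing speech from the given source.
    void startListening(AudioSource source, STTWordspotting caller);
}

// Sketch of the proposed interface; all method names are assumptions.
interface STTWordspotting {
    void setAudioSource(AudioSource source); // always-open microphone source

    void setKeyword(String keyword);         // keyword the decoder searches for

    void setSTTService(STTService service);  // service to trigger on a match

    void recognitionFinished();              // callback invoked by the STTService
}

// Toy implementation: models only the pause/resume handover, with no real decoder.
class WordspottingSketch implements STTWordspotting {
    private AudioSource source;
    private String keyword;
    private STTService service;
    private boolean spotting = false;

    public void setAudioSource(AudioSource source) {
        this.source = source;
    }

    public void setKeyword(String keyword) {
        this.keyword = keyword;
    }

    public void setSTTService(STTService service) {
        this.service = service;
    }

    // Would be called once at ESH start-up to begin the keyword search.
    public void start() {
        spotting = true;
    }

    public boolean isSpotting() {
        return spotting;
    }

    // Stands in for the native decoder reporting a decoded word.
    public void onDecoderOutput(String word) {
        if (spotting && word.equals(keyword)) {
            spotting = false;                     // pause the stream to the decoder
            service.startListening(source, this); // hand over to the STTService
        }
    }

    // The STTService is done, so the keyword search resumes.
    public void recognitionFinished() {
        spotting = true;
    }
}
```

In this sketch the only state is a single flag, which is enough to show that the decoder and the STTService never consume the AudioSource at the same time.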

So the benefits of this approach are:

- We don't need to worry about ALSA/PulseAudio microphone sharing between two different processes (as we would if keyword spotting ran as a standalone service), since the AudioSource will feed both STTWordspotting and STTService.
- The delegation of responsibilities between the recognition service and the keyword spotting service becomes cleaner and easier, with no need for any IPC mechanism.
- We don't need to manage the lifecycle of yet another process running alongside ESH.

This is just an initial draft, please let me know if you have any thoughts.

Thanks,

Andre

I've created an initial version of this API in our fork:

/**
 * A tagging interface for keyword spotting events.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public interface KSEvent {
}

/**
 * The listener interface for receiving {@link KSEvent} events.
 *
 * A class interested in processing {@link KSEvent} events implements this interface,
 * and its instances are passed to the {@code KSService}'s {@code spot()} method.
 * Such instances are then targeted for various {@link KSEvent} events corresponding
 * to the keyword spotting process.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public interface KSListener {
    /**
     * Invoked when a {@link KSEvent} event occurs during keyword spotting.
     *
     * @param ksEvent The {@link KSEvent} fired by the {@link KSService}
     */
    public void ksEventReceived(KSEvent ksEvent);
}

import java.util.Locale;
import java.util.Set;

/**
 * This is the interface that a keyword spotting service has to implement.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public interface KSService {
    /**
     * Obtain the Locales available from this KSService
     *
     * @return The Locales available from this service
     */
    public Set<Locale> getSupportedLocales();

    /**
     * Obtain the audio formats supported by this KSService
     *
     * @return The audio formats supported by this service
     */
    public Set<AudioFormat> getSupportedFormats();

    /**
     * This method starts the process of keyword spotting
     *
     * The audio data of the passed {@link AudioSource} is passed to the keyword
     * spotting engine. The keyword spotting attempts to spot {@code keyword} as
     * being spoken in the passed {@code Locale}. A spotted keyword is indicated by
     * {@link KSEvent} events fired at the passed {@link KSListener}.
     *
     * The passed {@link AudioSource} must be of a supported {@link AudioFormat}.
     * In other words, an {@link AudioFormat} compatible with one returned from
     * the {@code getSupportedFormats()} method.
     *
     * The passed {@code Locale} must be supported. That is to say it must be
     * a {@code Locale} returned from the {@code getSupportedLocales()} method.
     *
     * The passed {@code keyword} is the keyword which should be spotted.
     *
     * @param ksListener Non-null {@link KSListener} that {@link KSEvent} events target
     * @param audioSource The {@link AudioSource} from which keywords are spotted
     * @param locale The {@code Locale} in which the target keywords are spoken
     * @param keyword The keyword to spot
     * @return A {@link KSServiceHandle} used to abort keyword spotting
     * @throws KSException if any parameter is invalid or a problem occurs
     */
    public KSServiceHandle spot(KSListener ksListener, AudioSource audioSource, Locale locale, String keyword) throws KSException;
}

/**
 * A handle to a {@link KSService}
 *
 * @author Kelly Davis - Initial contribution and API
 */
public interface KSServiceHandle {
    /**
     * Aborts keyword spotting in the associated {@link KSService}
     */
    public void abort();
}

/**
 * A {@link KSEvent} fired when the {@link KSService} encounters an error.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public class KeywordSpottingErrorEvent implements KSEvent {
    /**
     * The message describing the error
     */
    private final String message;

    /**
     * Constructs an instance with the passed {@code message}.
     *
     * @param message The message describing the error
     */
    public KeywordSpottingErrorEvent(String message) {
        this.message = message;
    }

    /**
     * Gets the message describing this error
     *
     * @return The message describing this error
     */
    public String getMessage() {
        return this.message;
    }
}

/**
 * A {@link KSEvent} fired when the {@link KSService} spots a keyword.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public class KeywordSpottingEvent implements KSEvent {
    /**
     * AudioSource from which the keyword was spotted
     */
    private final AudioSource audioSource;

    /**
     * Constructs an instance with the passed {@code audioSource}
     *
     * @param audioSource The AudioSource of the spotted keyword
     */
    public KeywordSpottingEvent(AudioSource audioSource) {
        if (null == audioSource) {
            throw new IllegalArgumentException("The passed audioSource is null");
        }

        this.audioSource = audioSource;
    }

    /**
     * Returns the audioSource of the spotted keyword
     *
     * @return The audioSource of the spotted keyword
     */
    public AudioSource getAudioSource() {
        return this.audioSource;
    }
}
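
As a rough illustration of how a consumer might use this API, here is a sketch of a KSListener that tells the two event classes apart. The HandoverListener class and its log field are hypothetical and not part of the fork. The API types are repeated in reduced form, with AudioSource as an empty stand-in for the real ESH type, only so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Reduced copies of the API types, repeated only so the sketch compiles alone.
interface AudioSource {
}

interface KSEvent {
}

interface KSListener {
    void ksEventReceived(KSEvent ksEvent);
}

class KeywordSpottingEvent implements KSEvent {
    private final AudioSource audioSource;

    public KeywordSpottingEvent(AudioSource audioSource) {
        this.audioSource = audioSource;
    }

    public AudioSource getAudioSource() {
        return audioSource;
    }
}

class KeywordSpottingErrorEvent implements KSEvent {
    private final String message;

    public KeywordSpottingErrorEvent(String message) {
        this.message = message;
    }

    public String getMessage() {
        return message;
    }
}

// Hypothetical listener: records what it would do for each kind of event.
class HandoverListener implements KSListener {
    public final List<String> log = new ArrayList<>();

    @Override
    public void ksEventReceived(KSEvent ksEvent) {
        if (ksEvent instanceof KeywordSpottingEvent) {
            // A real listener would pass getAudioSource() on to the STTService here.
            log.add("spotted");
        } else if (ksEvent instanceof KeywordSpottingErrorEvent) {
            log.add("error: " + ((KeywordSpottingErrorEvent) ksEvent).getMessage());
        } else {
            // Unknown event types are ignored, since KSEvent is only a tagging interface.
            log.add("ignored");
        }
    }
}
```

Because KSEvent carries no methods, the listener has to dispatch with instanceof; that keeps the interface open for new event types at the cost of a type check in every listener.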

eclipse-archived/smarthome issue #1162