andrenatal opened this issue 8 years ago
I've created an initial version of this API in our fork:
```java
/**
 * A tagging interface for keyword spotting events.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public interface KSEvent {
}
```
```java
/**
 * The listener interface for receiving {@link KSEvent} events.
 *
 * A class interested in processing {@link KSEvent} events implements this interface,
 * and its instances are passed to the {@code KSService}'s {@code spot()} method.
 * Such instances are then targeted for various {@link KSEvent} events corresponding
 * to the keyword spotting process.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public interface KSListener {
    /**
     * Invoked when a {@link KSEvent} event occurs during keyword spotting.
     *
     * @param ksEvent The {@link KSEvent} fired by the {@link KSService}
     */
    public void ksEventReceived(KSEvent ksEvent);
}
```
```java
import java.util.Locale;
import java.util.Set;

// AudioFormat, AudioSource, KSException, KSListener and KSServiceHandle are
// types from the fork; their packages are not shown here.

/**
 * This is the interface that a keyword spotting service has to implement.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public interface KSService {
    /**
     * Obtain the Locales available from this KSService
     *
     * @return The Locales available from this service
     */
    public Set<Locale> getSupportedLocales();

    /**
     * Obtain the audio formats supported by this KSService
     *
     * @return The audio formats supported by this service
     */
    public Set<AudioFormat> getSupportedFormats();

    /**
     * This method starts the process of keyword spotting.
     *
     * The audio data of the passed {@link AudioSource} is passed to the keyword
     * spotting engine. The engine attempts to spot {@code keyword} as spoken in
     * the passed {@code Locale}. A spotted keyword is signaled by firing
     * {@link KSEvent} events targeting the passed {@link KSListener}.
     *
     * The passed {@link AudioSource} must be of a supported {@link AudioFormat},
     * in other words an {@link AudioFormat} compatible with one returned from
     * the {@code getSupportedFormats()} method.
     *
     * The passed {@code Locale} must be supported. That is to say, it must be
     * a {@code Locale} returned from the {@code getSupportedLocales()} method.
     *
     * The passed {@code keyword} is the keyword which should be spotted.
     *
     * @param ksListener Non-null {@link KSListener} that {@link KSEvent} events target
     * @param audioSource The {@link AudioSource} from which keywords are spotted
     * @param locale The {@code Locale} in which the target keywords are spoken
     * @param keyword The keyword to spot
     * @return A {@link KSServiceHandle} used to abort keyword spotting
     * @throws KSException if any parameter is invalid or a problem occurs
     */
    public KSServiceHandle spot(KSListener ksListener, AudioSource audioSource, Locale locale, String keyword) throws KSException;
}
```
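The preconditions documented on `spot()` can be checked up front, before any audio is read. The sketch below is a hypothetical helper, not part of the fork's API. `AudioFormat`, `AudioSource`, and `KSException` are stand-in stubs because the issue does not show them, and `AudioSource.getFormat()` and `AudioFormat.isCompatible()` are assumed accessors.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins: the issue does not show these types.
interface KSEvent {}

interface KSListener {
    void ksEventReceived(KSEvent ksEvent);
}

final class AudioFormat {
    private final String encoding;

    AudioFormat(String encoding) {
        this.encoding = encoding;
    }

    // Assumed notion of compatibility for this sketch: same encoding.
    boolean isCompatible(AudioFormat other) {
        return other != null && encoding.equals(other.encoding);
    }
}

interface AudioSource {
    AudioFormat getFormat(); // assumed accessor
}

class KSException extends Exception {
    KSException(String message) {
        super(message);
    }
}

final class SpotPreconditions {
    private SpotPreconditions() {}

    // Rejects arguments that violate the contract documented on KSService.spot().
    static void check(KSListener ksListener, AudioSource audioSource, Locale locale, String keyword,
                      Set<Locale> supportedLocales, Set<AudioFormat> supportedFormats) throws KSException {
        if (ksListener == null) {
            throw new KSException("ksListener must be non-null");
        }
        if (audioSource == null) {
            throw new KSException("audioSource must be non-null");
        }
        if (locale == null || !supportedLocales.contains(locale)) {
            throw new KSException("Unsupported locale: " + locale);
        }
        AudioFormat format = audioSource.getFormat();
        // The source's format must be compatible with at least one supported format.
        if (supportedFormats.stream().noneMatch(f -> f.isCompatible(format))) {
            throw new KSException("Unsupported audio format");
        }
        if (keyword == null || keyword.trim().isEmpty()) {
            throw new KSException("keyword must be non-empty");
        }
    }
}
```

An implementation of `spot()` could call such a check first, passing its own `getSupportedLocales()` and `getSupportedFormats()`, so that invalid input surfaces as a `KSException` rather than as a later `KeywordSpottingErrorEvent`.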
```java
/**
 * A handle to a {@link KSService}
 *
 * @author Kelly Davis - Initial contribution and API
 */
public interface KSServiceHandle {
    /**
     * Aborts keyword spotting in the associated {@link KSService}
     */
    public void abort();
}
```
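Returning a `KSServiceHandle` from `spot()` implies that spotting runs asynchronously. One possible implementation, assuming the service runs its spotting loop on a dedicated worker thread (an assumption; the issue does not say how the fork does it), is a handle that interrupts that thread. The class `ThreadKSServiceHandle` is hypothetical.

```java
interface KSServiceHandle {
    void abort();
}

// Hypothetical handle for a service that runs its spotting loop on a worker thread.
// abort() interrupts the worker; the loop is expected to exit once interrupted.
final class ThreadKSServiceHandle implements KSServiceHandle {
    private final Thread worker;
    private boolean aborted; // guarded by this

    ThreadKSServiceHandle(Runnable spottingLoop) {
        this.worker = new Thread(spottingLoop, "keyword-spotting");
        this.worker.setDaemon(true);
    }

    // Starts the loop unless abort() was already called. Call at most once.
    synchronized void start() {
        if (!aborted) {
            worker.start();
        }
    }

    @Override
    public synchronized void abort() {
        // Idempotent: only the first call interrupts the worker.
        if (!aborted) {
            aborted = true;
            worker.interrupt();
        }
    }

    // Waits up to the given time for the worker to stop; true if it is no longer running.
    boolean awaitTermination(long millis) {
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

Making `start()` and `abort()` synchronized on the same lock means an `abort()` that arrives before the loop has started simply prevents it from starting, instead of interrupting a thread that is not yet alive (which would have no effect).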
```java
/**
 * A {@link KSEvent} fired when the {@link KSService} encounters an error.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public class KeywordSpottingErrorEvent implements KSEvent {
    /**
     * The message describing the error
     */
    private final String message;

    /**
     * Constructs an instance with the passed {@code message}.
     *
     * @param message The message describing the error
     */
    public KeywordSpottingErrorEvent(String message) {
        this.message = message;
    }

    /**
     * Gets the message describing this error
     *
     * @return The message describing this error
     */
    public String getMessage() {
        return this.message;
    }
}
```
```java
/**
 * A {@link KSEvent} fired when the {@link KSService} spots a keyword.
 *
 * @author Kelly Davis - Initial contribution and API
 */
public class KeywordSpottingEvent implements KSEvent {
    /**
     * AudioSource from which the keyword was spotted
     */
    private final AudioSource audioSource;

    /**
     * Constructs an instance with the passed {@code audioSource}
     *
     * @param audioSource The AudioSource of the spotted keyword
     */
    public KeywordSpottingEvent(AudioSource audioSource) {
        if (null == audioSource) {
            throw new IllegalArgumentException("The passed audioSource is null");
        }
        this.audioSource = audioSource;
    }

    /**
     * Returns the audioSource of the spotted keyword
     *
     * @return The audioSource of the spotted keyword
     */
    public AudioSource getAudioSource() {
        return this.audioSource;
    }
}
```
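Since `KSEvent` is only a tagging interface, a `KSListener` has to dispatch on the concrete event type. A minimal sketch follows; the listener class `LoggingKSListener` is hypothetical, and the event classes are trimmed re-declarations with an empty stub `AudioSource` so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal re-declarations so the sketch compiles on its own; AudioSource is a stub.
interface AudioSource {}

interface KSEvent {}

interface KSListener {
    void ksEventReceived(KSEvent ksEvent);
}

final class KeywordSpottingEvent implements KSEvent {
    private final AudioSource audioSource;

    KeywordSpottingEvent(AudioSource audioSource) {
        if (audioSource == null) {
            throw new IllegalArgumentException("The passed audioSource is null");
        }
        this.audioSource = audioSource;
    }

    AudioSource getAudioSource() {
        return audioSource;
    }
}

final class KeywordSpottingErrorEvent implements KSEvent {
    private final String message;

    KeywordSpottingErrorEvent(String message) {
        this.message = message;
    }

    String getMessage() {
        return message;
    }
}

// Hypothetical listener that records spotted keywords and errors.
final class LoggingKSListener implements KSListener {
    final List<AudioSource> spottedFrom = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    @Override
    public void ksEventReceived(KSEvent ksEvent) {
        if (ksEvent instanceof KeywordSpottingEvent) {
            // A keyword was spotted; this is where an STTService could be started.
            spottedFrom.add(((KeywordSpottingEvent) ksEvent).getAudioSource());
        } else if (ksEvent instanceof KeywordSpottingErrorEvent) {
            errors.add(((KeywordSpottingErrorEvent) ksEvent).getMessage());
        }
        // Any other KSEvent type is ignored.
    }
}
```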
After the conversations we had with Kai, I'd like to list here some ideas for the keyword spotting implementation we are working on.
As a user, I want to be able to trigger the `STTService` (https://github.com/eclipse/smarthome/issues/1021) just by saying a keyword, rather than through an external device like a smartphone or by pushing a button on the appliance. Over the next few days I'll be working on an interface between ESH and our custom keyword spotting module (being written in C), using JNI to accomplish this feature.
So here are some initial ideas:

1. `org.eclipse.smarthome.io.multimedia.stt.STTWordspotting` should be introduced.
2. An `AudioSource` that will always be open, capturing audio from the microphone.
3. An `STTService` that will be called to start listening when the keyword is found in the audio fed by the `AudioSource`.
4. At this point, the `STTWordspotting` implementation should pause the stream of audio from the `AudioSource` to the decoder.
5. The `STTService` notifies the `STTWordspotting` implementation when it finishes the recognition handed over to it in item 4. The `STTWordspotting` implementation then resumes the stream from the `AudioSource` to our module, restarting the search for the keyword.
6. `STTWordspotting` should be created and started on ESH boot, so that keyword spotting runs from then on.

So the benefits of this approach are:

- The same `AudioSource` will feed both `STTWordspotting` and `STTService`.
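The flow above (pause the keyword spotter once the keyword is found, feed the audio to the `STTService`, resume spotting when recognition finishes) can be sketched as a small router with two modes. This is a hypothetical illustration, not the proposed `STTWordspotting` API: the class `WordspottingRouter`, its methods, and the `Predicate<byte[]>` standing in for the JNI keyword spotter are all assumptions, and a list buffer stands in for the `STTService` input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical router for audio chunks read from a single, always-open AudioSource.
final class WordspottingRouter {
    enum Mode { SPOTTING, RECOGNIZING }

    private final Predicate<byte[]> keywordDetector;       // stands in for the JNI keyword spotter
    private final List<byte[]> sttBuffer = new ArrayList<>(); // stands in for the STTService input
    private Mode mode = Mode.SPOTTING;

    WordspottingRouter(Predicate<byte[]> keywordDetector) {
        this.keywordDetector = keywordDetector;
    }

    synchronized Mode mode() {
        return mode;
    }

    // Returns a copy of the audio handed to STT so far.
    synchronized List<byte[]> sttInput() {
        return new ArrayList<>(sttBuffer);
    }

    // Called for every chunk read from the AudioSource.
    synchronized void onAudio(byte[] chunk) {
        if (mode == Mode.SPOTTING) {
            if (keywordDetector.test(chunk)) {
                // Keyword found: stop feeding the spotter and route audio to STT.
                mode = Mode.RECOGNIZING;
            }
        } else {
            sttBuffer.add(chunk);
        }
    }

    // Called when the STTService reports that recognition has finished.
    synchronized void onRecognitionFinished() {
        sttBuffer.clear();
        mode = Mode.SPOTTING; // resume the keyword search on the same AudioSource
    }
}
```

Because both modes consume chunks from the one `AudioSource`, the spotter and the STT service never compete for the microphone; the router only decides which of them receives each chunk.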
This is just an initial draft; please let me know if you have any thoughts.
Thanks,
Andre