eclipse-archived / smarthome

Eclipse SmartHome™ project
https://www.eclipse.org/smarthome/
Eclipse Public License 2.0

A human language interpreter binding interface #1028

Closed tilmankamp closed 8 years ago

tilmankamp commented 8 years ago

In a workshop with Kai we discussed the need for some kind of Text-To-Action binding interface. The intent is to use it in conjunction with speech to text (new) and text to speech bindings.

Here is some first idea draft:

  1. Adding a new interface org.eclipse.smarthome.io.commands.HumanLanguageInterpreter that allows execution of human language commands.
  2. The interface provides a getter for retrieving the supported grammar in some EBNF form - see STTService proposal.
  3. The interface provides a function that takes a human language command as string and returns a human language response as string. It will interpret the command and execute the resulting actions accordingly - e.g. sending commands to items.

A spoken command (e.g. "turn ceiling light on") could be captured by a STTService and passed into the HumanLanguageInterpreter that would send the according command to the item named "ceiling light". It then could return a human language string saying "ceiling light on", which will be passed into a TTSService binding to be finally sent to some loudspeaker.
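
To make the proposal concrete, here is a minimal sketch of what such an interface could look like (package, method and parameter names are only illustrative assumptions derived from the three points above, not an agreed API):

package org.eclipse.smarthome.io.commands;

public interface HumanLanguageInterpreter {

    /**
     * @return the supported grammar in some EBNF form (see STTService proposal)
     */
    String getGrammar();

    /**
     * Interprets the given human language command, executes the resulting actions
     * (e.g. sends commands to items) and returns a human language response.
     *
     * @param text a command like "turn ceiling light on"
     * @return a response like "ceiling light on"
     */
    String interpret(String text);
}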

hkuhn42 commented 8 years ago

As already stated in #584, I worked on something similar for some time and would like to contribute to this. I already did some research regarding different ways to implement voice command interfaces (speech to text to action to text to speech) and made an extremely crude prototype (I was sidetracked by implementing an audio API, among other things) - see ICommandInterpreter in my sylvani repo. Regarding a getter for grammar support, I think it would be best if this could be optional, because for certain services that do voice-to-intent instead of mere voice-to-text (as supported by Amazon Alexa, Microsoft Oxford or Nuance Mix) it could be quite difficult to offer a grammar. However, adding this kind of service may offer good results.

kaikreuzer commented 8 years ago

@hkuhn42 Note that this is about "text to intent", so no "voice" is directly involved in here - so I am actually not clear on whether a "voice-to-intent" service (do you have an example for such a service?) would at all fit in here.

@tilmankamp: Regarding the "grammar getter" (2): Is this something the service must provide or is this information that the runtime would have to provide to the service? Do you have an example how that looks? I am not sure if this is really the same "grammar" as on the STTService (where I rather thought that a "vocabulary" is provided)?

hkuhn42 commented 8 years ago

@kaikreuzer: there are services like Nuance Mix which do not just convert language to text but also interpret that text and deliver a JSON representation (which is also a kind of text) of the intent as a result (please see Mix). I was planning on also supporting interpreters for that kind of service in my project. However, you are right in that this does not exactly match the requirements.

@tilmankamp: In general I think that it might prove quite difficult to specify a vocabulary inside the API for certain kinds of implementations. To quote some other examples, I was experimenting with using OpenNLP and, alternatively, Lucene to build indices or models of the items and channels inside openHAB and then match the text input (be it from a voice-to-text service, a chatbot or a simple textarea) against these. For neither approach would a vocabulary be easy to specify by the service.

kaikreuzer commented 8 years ago

deliver a json representation (which is also a kind of text)

No, this is already the "intent", so also the output of the text-to-intent service suggested here.

please see Mix.

Nice website with cool explanations! So what could be done is trying to have a similar JSON structure for "intents", then the "intent processor" (which we have not really defined yet anywhere, maybe we should?) could be shared among such services (i.e. the voice2intent and the text2intent ones). Is there any (de-facto) standard for such intent JSON structures?

tavalin commented 8 years ago

The two services I've used to generate intent data (they can handle voice or text to generate the intent) both have different, but similar, formats. Having had a quick look at Mix, they are using what looks like yet another different JSON format.

The way I could see this working is that ESH picks some sort of standard JSON structure (maybe an established structure or maybe its own) for intents, and we generate (or translate, if we are using third party services like Mix, wit.ai or api.ai etc.) this standard JSON and pass it on to the "intent processor".

This should hopefully simplify the job of the "intent processor" if it works on a standard input.

hkuhn42 commented 8 years ago

To sum up my understanding: the human language interpreter would consist of two services: a text-to-intent interpreter which converts natural language to intents (and is locale-dependent) and an intent-to-action interpreter which "converts" the intent to actions in ESH (and is locale-neutral). Also, we define a JSON structure for the intents. Both services will also support returning the "result" of the action.

@kaikreuzer should we separate this into a second issue? @tavalin could you give samples of the two notations you have worked with to date? It would really be interesting to have a look at them.
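
To make the split being discussed more tangible, here is a rough, purely illustrative sketch; the Intent structure and all type and method names are assumptions, nothing here has been agreed on:

import java.util.Locale;
import java.util.Map;

/** Illustrative locale-neutral intent structure - roughly what a standard intent JSON could map to. */
class Intent {
    String name;                  // e.g. "on_off"
    Map<String, String> entities; // e.g. {"device": "light", "state": "on"}
    double confidence;            // how sure the interpreter is (0..1), optional
}

/** Locale-dependent part: natural language text -> intent. */
interface TextToIntentInterpreter {
    Intent interpret(String text, Locale locale);
}

/** Locale-neutral part: intent -> actions in ESH; returns the textual "result" of the action. */
interface IntentToActionInterpreter {
    String execute(Intent intent);
}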

tavalin commented 8 years ago

Here are some examples from my English language agents that I've experimented with.

wit.ai:

{
  "msg_id" : "df145aa9-de46-4208-86ac-6a0ac552fa80",
  "_text" : "turn on the light",
  "outcomes" : [ {
    "_text" : "turn on the light",
    "confidence" : 0.52,
    "intent" : "on_off",
    "entities" : {
      "state" : [ {
        "type" : "value",
        "value" : "on"
      } ],
      "device" : [ {
        "type" : "value",
        "value" : "light"
      } ]
    }
  } ]
}

api.ai:

{
  "id": "771c5e53-3939-44c0-ac80-89d18ce48e98",
  "timestamp": "2016-02-13T22:19:40.149Z",
  "result": {
    "source": "agent",
    "resolvedQuery": "turn on the light",
    "action": "on_off",
    "actionIncomplete": true,
    "parameters": {
      "device": "light",
      "room": "",
      "state": "on"
    },
    "contexts": [
      {
        "name": "device_room_on_off_dialog_params_room",
        "parameters": {
          "state": "on",
          "device": "light",
          "room": ""
        },
        "lifespan": 1
      },
      {
        "name": "device_room_on_off_dialog_context",
        "parameters": {
          "state": "on",
          "device": "light",
          "room": ""
        },
        "lifespan": 2
      }
    ],
    "metadata": {
      "intentId": "e0e4c588-9bb3-430e-93c9-ed3634f905d7",
      "intentName": "device_room_on_off"
    },
    "fulfillment": {
      "speech": "Which room?"
    }
  },
  "status": {
    "code": 200,
    "errorType": "success"
  }
}

Things common to both: the original input text, an intent/action identifier and a set of extracted entities/parameters (device, state, etc.).

As you can see, apart from that there are quite some differences in the structure, depending on the capabilities and direction of the agent. For example, wit.ai returns a confidence rating, so you know how confident the engine is that it has successfully extracted the correct intent from the input. api.ai allows you to define text responses that can be displayed or sent to a TTS engine.

kdavis-mozilla commented 8 years ago

@tilmankamp I wonder about requirement 3

"The interface provides a function that takes a human language command as string and returns a human language response as string. It will interpret the command and execute the resulting actions accordingly - e.g. sending commands to items."

Couldn't part of "executing the resulting actions" be sending a command to the TTS synthesizer to say "The temperature is 28C", for example? In other words, why does the method have to return a string?

kdavis-mozilla commented 8 years ago

@hkuhn42 In looking at ICommandInterpreter

public interface ICommandInterpreter {
    /**
     * Handle a textual command (like turn the head light on) and respond with a textual response 
     * 
     * @param command the command to handle
     * @return a textual response
     */
    public String handleCommand(String command);

}

I have a few comments.

kdavis-mozilla commented 8 years ago

@tavalin @hkuhn42 I would really shy away from creating any interlingua such as the ones used by wit.ai or api.ai, as it's a real investment in time: interlingua design, interlingua parsing, and it implies heavyweight implementations, as each implementation must understand the interlingua plus natural language.

Such an interlingua implies that NLP tools to do text tokenization, sentence splitting, morphological analysis, suffix treatment, named entity detection and so on will all have to be in any implementation, making all implementations extremely heavyweight. Not to mention the fact that this would imply similar tools being available in any targeted language, which shuts out many smaller languages.

To this end I think keeping the "text-to-intent interpreter", which converts natural language to intents (and is locale-dependent), and the "intent-to-action interpreter", which "converts" the intent to actions, behind a single interface is a good idea.

It allows one much flexibility, in that one is able to make lightweight implementations - simply some regex parsing of text - but also heavyweight implementations that include as many Stanford NLP tools as one likes.
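
To illustrate the lightweight end of that spectrum, a regex-only interpreter could look roughly like the following sketch (ItemCommandSender is a made-up stand-in for whatever mechanism actually sends commands to items):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Crude sketch: matches phrases like "turn the ceiling light on" with a single regular expression. */
class RegexTextToAction {

    private static final Pattern ON_OFF =
            Pattern.compile("turn (?:the )?(.+?) (on|off)", Pattern.CASE_INSENSITIVE);

    private final ItemCommandSender sender; // hypothetical helper that sends ON/OFF to a named item

    RegexTextToAction(ItemCommandSender sender) {
        this.sender = sender;
    }

    String interpret(String text) {
        Matcher m = ON_OFF.matcher(text.trim());
        if (m.matches()) {
            sender.send(m.group(1), m.group(2).toUpperCase());
            return m.group(1) + " " + m.group(2);
        }
        return "Sorry, I did not understand that.";
    }

    interface ItemCommandSender {
        void send(String itemName, String command);
    }
}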

hkuhn42 commented 8 years ago

@kdavis-mozilla: The interface and its early prototype implementation are meant to process textual input ("turn the light on"), execute the identified action and respond in textual form ("ok", "there was an error", "the light is already on", ...); it was originally meant to work in conjunction with a voice interpreter and synthesizer. To give an example, this is what my early prototype currently does: it captures the sentence "turn the light on" (or off) with a UI (HTML/JavaScript or command line), sends the audio data via the openHAB UI plugin to the Microsoft Oxford recognition webservice, routes the recognized text to the command interpreter, which parses the text, looks for (Hue) lights in ESH/openHAB, sends a turn-on event if it finds at least one and responds with a text which is in turn sent to the Microsoft Oxford synthesizer. The audio output of the webservice is then played by the UI.

The comment regarding structured text originated in the fact that I also had a look at Nuance Mix. Overall I feel that a simple sentence-in, sentence-out interface would be much easier to use and maintain in the beginning, even if this would mean not being able to easily use advanced services such as Mix.

tilmankamp commented 8 years ago

@kdavis-mozilla: Having a string return value for a human language response in requirement 3 was just for symmetry reasons. However: somehow the desired sink has to be passed into the routine, so an additional argument would be required - like a target binding id. @kdavis-mozilla @hkuhn42 @tavalin: I like the idea of splitting it into text-to-intent and intent-to-action. But how do we deal with the/a response? Imagine the complexity in case we really want to do it right (which also includes i18n of the responses): audio source -> STT -> text-to-intent (n languages) -> intent-to-action -> i18n response (n languages) -> TTS -> audio sink

kdavis-mozilla commented 8 years ago

@tilmankamp Good point, we need to specify the sink somehow.

kdavis-mozilla commented 8 years ago

@tilmankamp @hkuhn42 @tavalin The text-to-intent and intent-to-action split is overkill. It implies an increase in complexity that isn't justified by any utility we can currently gain from it.

Speaking as one who spent years building just such a text-to-intent system using UIMA, it is not a small undertaking and involves layers and layers of NLP tools that in this case are simply not needed. Not to mention that the use of such NLP tools would imply we use only languages with well-supported NLP ecosystems, i.e. English, French, German, and maybe one or two more.

tavalin commented 8 years ago

@kdavis-mozilla are you saying that we should be doing text-to-action directly or that text-to-intent and then intent-to-action should be part of the same interface?

kdavis-mozilla commented 8 years ago

@tavalin I think that text-to-action should be done directly.

tavalin commented 8 years ago

@kdavis-mozilla a couple of queries/concerns...

Will this mean we need to issue commands according to a rigid grammar rather than natural language expressions? I guess what I'm getting at is: will users need to be conscious of the way they speak for commands to be understood and actioned?

Would this handle multiple commands in one sentence? e.g. "open the blinds and turn the light(s) off"

Can we easily cope with multi language support doing it this way?

kdavis-mozilla commented 8 years ago

@tavalin You can issue commands as complicated as you want. (This includes multiple commands in one sentence.) However, you will also have to have an implementation of the "text-to-action" interface that is sufficiently sophisticated to understand your text commands.

I don't think the conversation here has gotten detailed enough to specify if multiple languages are/are not supported by the "text-to-action" interface. However, I would hope that whatever "text-to-action" interface comes out of this discussion supports multiple languages.

tilmankamp commented 8 years ago

@kdavis-mozilla @tavalin: Yes, we really should support multiple languages throughout all involved components. How about a global system configuration property? It could populate its supported values from the Add-On repository. Add-Ons that don't support the selected language will either default to English or fail/complain in the log.

hkuhn42 commented 8 years ago

I don't think the conversation here has gotten detailed enough to specify if multiple languages are/are not supported by the "text-to-action" interface

I think there is no need to discuss whether multiple languages are supported, only how to implement it :)

In essence, Siri, Google Now, Cortana and Alexa have created the expectation that computers can understand natural language (I know that this is about text interpretation, but chances are the text originates from STT or a chatbot-like infrastructure).
If we want to aim for supporting these non-technical users, we need support for commands in multiple natural languages (not necessarily in one sentence, but at least somehow configurable). Multi-command support would be nice, but I think that's something people can accept if it is not possible.

@kdavis-mozilla

The text-to-intent and intent-to-action split is overkill

You are right, let's go for TTA (text-to-action).

tavalin commented 8 years ago

So it sounds like the proposed end to end solution is as follows: Audio source -> STT -> Text-to-action (multiple languages) -> Text response (multiple languages) -> TTS -> Audio sink

As this issue focuses on the text-to-action service, have we any ideas for how to implement that?

kdavis-mozilla commented 8 years ago

@tavalin I think your summary is accurate.

As to implementation, I've some ideas. Here are some obvious first cuts:

  1. A Map keying a limited set of canonical phrases against the possible actions
  2. The above Map preceded by a rule engine that rephrases non-canonical phrases to canonical ones
  3. The above Map preceded by a k-means clustering algorithm trained to rephrase non-canonical phrases to canonical ones
  4. The above Map preceded by an RNN trained to rephrase non-canonical phrases to canonical ones
  5. The above Map preceded by a BRNN trained to rephrase non-canonical phrases to canonical ones
  6. ...

There are many many possible ways to do this. The only limitation is imagination.
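
The first of those cuts is small enough to sketch directly; everything below (phrases, actions, class name) is just made up for illustration:

import java.util.HashMap;
import java.util.Map;

/** Sketch of the simplest cut: canonical phrases keyed against actions. */
class CanonicalPhraseInterpreter {

    private final Map<String, Runnable> actions = new HashMap<>();

    CanonicalPhraseInterpreter() {
        // Example wiring - a real implementation would send commands to items instead of printing.
        actions.put("turn the ceiling light on", () -> System.out.println("CeilingLight -> ON"));
        actions.put("turn the ceiling light off", () -> System.out.println("CeilingLight -> OFF"));
    }

    boolean interpret(String phrase) {
        Runnable action = actions.get(phrase.toLowerCase().trim());
        if (action == null) {
            return false; // non-canonical phrase - a rephrasing stage would kick in here
        }
        action.run();
        return true;
    }
}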

hkuhn42 commented 8 years ago

Another approach I was thinking about was to use a full-text or NLP engine to build a custom dynamic index/model for the active ESH setup (using the available Things and Channels). It is probably not scientific, but my idea was to first try and match the target item (e.g. the omnipresent light) and use this as a base to find out what the user wants by checking what is possible.

Talking about the interface, I would definitely add a ... public Locale getSupportedLocales() ... method. If OK with everyone, I would update my existing interface accordingly in the evening, to use as a base for further discussion.

kdavis-mozilla commented 8 years ago

@hkuhn42 Adding

public Locale getSupportedLocales()

sounds good to me.

tavalin commented 8 years ago

@hkuhn42 my first experiment for this part also used that approach. I used Solr to build an index of my items and tried to match the phrase. It was very simple and worked OK to a point but I found it reporting false positives when I asked "turn on bedroom fan" (which didn't exist) and it found a hit against a group called "bedroom".
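
For illustration, here is a heavily simplified, library-free stand-in for that kind of matching (plain token overlap against item labels instead of a Solr index; the item names are made up) - it exhibits exactly the same false-positive behaviour:

import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Naive stand-in for a full-text index: scores item labels by token overlap with the command. */
class NaiveItemMatcher {

    String bestMatch(String command, List<String> itemLabels) {
        List<String> queryTokens = Arrays.asList(command.toLowerCase(Locale.ROOT).split("\\s+"));
        String best = null;
        int bestScore = 0;
        for (String label : itemLabels) {
            int score = 0;
            for (String token : label.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (queryTokens.contains(token)) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        // bestMatch("turn on bedroom fan", Arrays.asList("Bedroom", "Living Room Light"))
        // happily returns "Bedroom" even though no fan exists.
        return best;
    }
}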

tavalin commented 8 years ago

Another thought: if we are going down the route of a more natural speech/text conversation, then we need to consider contextual information that may accompany the command, in order to provide enough information to determine the correct action. E.g.:

user: "Turn on the living room lights"
response: "OK, turning the living room lights on" (living room lights turn on)
user: "OK, turn them off"
response: "OK, turning the living room lights off" (living room lights turn off)

Contextual information would probably also be necessary for a machine learning component.

tilmankamp commented 8 years ago

I think adding public Locale[] getSupportedLocales() is the best way.

kdavis-mozilla commented 8 years ago

@tilmankamp Yeah :-) a typo

tilmankamp commented 8 years ago

@tavalin : Is there anything (implementation wise) speaking against keeping context as member variables of the intent-interpreter Add-On?

kdavis-mozilla commented 8 years ago

@tavalin Abstractly, this could all be handled by the text-to-action implementation holding state.

However, before anyone requests or attempts to add this, I would note that anaphora resolution, the technical term for what you are requesting, is an open unsolved problem in natural language processing[1].

tilmankamp commented 8 years ago

@kdavis-mozilla: Does this still hold true if we always limit the context to just the last used/instructed things within the house?

kdavis-mozilla commented 8 years ago

@tilmankamp We can always create some partial solution. But whatever partial solution we create will only work part of the time.

Unfortunately, managing user's expectations with partial solutions is extremely hard to do, as the users then have to have a mental model of our solution and its limitations. Teaching them a complicated mental model will, I fear, turn them away from our solution.

For example, say the text-to-action implementation has a member which is the last IoT device it manipulated. You can then say things such as

Turn the bathroom ceiling light off.

followed by sentences like

Turn it back on.

All's good.

But now assume we have two mics: one in the bedroom and one in the living room. From the bathroom, your wife goes to bed and switches off the light in the bedroom

Turn the bedroom light off.

while you go to the living room and stay up a bit later to see the end of a movie you're watching. You're in the living room and the last light you used was in the living room. So you say

Turn it back on.

But as there's only a single smarthome instance running, and only a single text-to-action implementation on that instance, the smarthome thinks "it" refers to the light in the bedroom, not the light in the living room.

Your wife wouldn't be too happy.

kaikreuzer commented 8 years ago

Sorry for joining back in so late on the discussion, but I would like to challenge the "the text-to-intent and intent-to-action split is overkill" decision a bit: only doing text-to-action has now actually turned into a text-to-text-response service which, as a side effect, also does "some other stuff" like sending commands to items etc. So it is up to the service, what the intents actually are, i.e. what the user can do with it.

So far we are only talking about switching lights on and off. But what about scheduling a sequence of actions for tomorrow morning? Or creating a rule through voice ("Please always turn on the kitchen light, when I enter the door")? I think in the long run we might come up with many intents that are actually complex to "perform" - so asking the services to include all this logic (how do I create a rule in all its glory?) might mean a duplication of this code throughout all such services. "Externalizing" this logic into a single intent-to-action service, which is shared by all text-to-intent services, would imho be cleaner.

Nonetheless, I see that it is also complex to come up with a good definition for all such intents. So if in practice we will anyhow only have very few implementations of such a service, we can "bake it in" for simplicity reasons. Anyhow, I wonder what kind of intents you think we should go for?

Is there anything (implementation wise) speaking against keeping context as member variables of the intent-interpreter Add-On?

I think we definitely need contextual information. Not only purely internal to the service, but even the possibility to feed it in through the interface. So @kdavis-mozilla could simply feed in "living room" as context if the microphone in the living room is used. The same will be important when using microphones in mobile phones - through BLE beacons, you can determine the location of the person who issues the voice command, so "turn on the light" can be processed according to the supplied location information. This is probably not so important for the start, but we should not miss out on it. It might also have some relation to https://github.com/eclipse/smarthome/issues/582.

more natural speech/text conversation

Talking about "conversations" is another clear indication that we need a context. The system might actually lack some contextual information and have to ask back "in which room?" - so conversations probably even have to be handled as individual sessions with their own contexts (not sharing a single context as in @kdavis-mozilla's example).
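
Such externally supplied context could be passed in as an extra parameter of the interpreting method - a hypothetical variant, not part of any interface proposed so far:

import java.util.Locale;
import java.util.Map;

/** Hypothetical variant with caller-supplied context (e.g. the room the microphone is located in). */
interface ContextAwareInterpreter {

    /**
     * @param command the human language command
     * @param locale  the locale of the command
     * @param context caller-supplied hints, e.g. {"location": "living room", "session": "abc123"}
     */
    void interpret(String command, Locale locale, Map<String, String> context);
}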

kdavis-mozilla commented 8 years ago

@kaikreuzer A few comments...

So it is up to the service, what the intents actually are, i.e. what the user can do with it.

This is the case regardless and is orthogonal to any considerations of the "text-to-intent and intent-to-action split". If a service ignores or doesn't understand an intent, the user can't access that functionality through text, be that intent turning a light on and off or creating rules through speech.

Yes, "externalizing" this logic into a single intent-to-action service, which is shared by all text-to-intent services would be cleaner. However, I think this is a case where the perfect is the enemy of the good.

I think using context, this mic is in the living room, is important. But, I think we should walk before we run. I want to try and get a functional service up and running, then add bells and whistles.

I guess I've seen far too many conversational interfaces try to do everything before they can do anything, and then end in failure. So, I would rather have us do a small number of things well than everything badly.

I think a useful data point in this respect is HomeKit. Apple's HomeKit:

- doesn't have anaphora resolution
- doesn't have mic location/context information
- doesn't allow arbitrary variation on how you can say "Turn on the floor lamp"
- doesn't have a dialog manager
- doesn't do speaker recognition
- ...

Even with all of Apple's resources, they are trying to get the simple things right before shooting for the moon.

kaikreuzer commented 8 years ago

What an imperfect Apple solution 8-) Yeah, it is not just voice recognition but home automation in general where Apple has proven it to be a very complex matter, by not coming up with the user-friendly solution everybody had expected from them...

But I fully understand and support all your points above, so I fall quiet again now :-)

hkuhn42 commented 8 years ago

Having seen how much effort it took to get even a basic elend to end prototype up and running, I do agree completely. We should not forget all the good ideas, but I also think optimization should be done after we get something up and running, not before.

hkuhn42 commented 8 years ago

@kaikreuzer you are right regarding Apple! But please do not be too quiet :)

kaikreuzer commented 8 years ago

elend to end prototype

That's a good one (if you are German) :laughing:

hkuhn42 commented 8 years ago

elend to end prototype

That's a good one (if you are German) :laughing:

And funny enough, a co-production of me and also Apple :laughing:

tilmankamp commented 8 years ago

Coming from what @tavalin said in regard to the steps that a voice command would take (Audio source -> STT -> Text-to-action (multiple languages) -> Text response (multiple languages) -> TTS -> Audio sink), I want to make some further simplifications:

How about having an inbound counterpart to the current global say command - like interpret? This would also be the name of the interpreting method of the Text-to-action interface. A STT Add-On will use it to pass text into the/a current Text-to-action Add-On.

Furthermore I would also join Text-to-action and Text-response into one Add-On. It is just more practical to put localized response texts next to their localized input parsers and grammars.

Finally, I think that no return value or "into-some-other-machine-sinking" of response texts is needed if just the global say is used whenever something has to be said (sounds strange - yes).

So here is an updated version of the proposal:

  1. Adding a new interface org.eclipse.smarthome.io.commands.TextToAction that allows execution of human language commands.
  2. The interface provides a getter for retrieving the supported grammar in some EBNF form - see STTService proposal.
  3. The interface provides a function that takes a human language command as string. It will interpret the command and execute the resulting actions accordingly - e.g. sending commands to items. If there should be a textual response to the user, it will use the global say command for that.
  4. It supports a getter for retrieving supported languages.
    interface TextToAction {
        Set<Locale> getSupportedLocales();
        String getGrammar();
        void interpret(String text);
    }

A spoken command (e.g. "turn ceiling light on") could be captured by an audio input binding that is connected to an STTService implementation. It will translate the given audio data to its textual representation and call interpret("turn ceiling light on");. A/The current TextToAction Add-On will match one of its supported phrases to the given text and execute the appropriate action - setting the state of CeilingLight to ON. Finally it will call say("ceiling light on");. This will cause the/a current TTSService to send the resulting audio data to a connected audio sink.
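
A trivial implementation of that flow might look like the sketch below; the say consumer stands in for the global say command, and actually posting the ON/OFF command to the CeilingLight item is reduced to a comment:

import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.function.Consumer;

/** Minimal sketch of a TextToAction implementation that understands exactly two English phrases. */
class CeilingLightTextToAction implements TextToAction {

    private final Consumer<String> say; // stand-in for the global say command

    CeilingLightTextToAction(Consumer<String> say) {
        this.say = say;
    }

    @Override
    public Set<Locale> getSupportedLocales() {
        return Collections.singleton(Locale.ENGLISH);
    }

    @Override
    public String getGrammar() {
        return "command = \"turn ceiling light \" ( \"on\" | \"off\" ) ;"; // toy EBNF
    }

    @Override
    public void interpret(String text) {
        String t = text.trim().toLowerCase(Locale.ENGLISH);
        if (t.equals("turn ceiling light on") || t.equals("turn ceiling light off")) {
            // a real implementation would post ON or OFF to the CeilingLight item here
            say.accept("ceiling light " + (t.endsWith(" on") ? "on" : "off"));
        } else {
            say.accept("Sorry, I did not understand that.");
        }
    }
}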

hkuhn42 commented 8 years ago

@tilmankamp Not having a text response would reduce the interface to voice output. Scenarios like a chatbot or a "smartclient" which does the text-to-voice and voice-to-text itself would then no longer be possible. Also, the whole service would not be usable without a TTSService.

Taking into account the other discussion thread #1021, moving the response handling into a listener could make sense:

interface CommandInterpreter {
    public void interpret(String command, Locale locale);
    public Set<Locale> getSupportedLocales();
    void registerCommandInterpreterListener(CommandInterpreterListener interpreterListener);
    void removeCommandInterpreterListener(CommandInterpreterListener interpreterListener);
}

public interface CommandInterpreterListener {
    public void interpreted(CommandInterpreter commandInterpreter, String response);
}

kdavis-mozilla commented 8 years ago

@hkuhn42 @tilmankamp I guess you should take a look at the comment from Kai and my follow-up for #1021. For the async case this interface has changed a bit.

tilmankamp commented 8 years ago

@hkuhn42 : I totally agree with you on the given scenarios and your design. I just wanted to be as close to OpenHAB conventions as possible. If a subscriber model is the way to go, I will do it like this. One question: How is the wiring between subscriber and service supposed to be configured? Just by scripting?

kaikreuzer commented 8 years ago

One question: How is the wiring between subscriber and service supposed to be configured? Just by scripting?

I think this will depend on where and how it is used. You could allow specific wirings through configuration (or parameters when initiating it), but for other services we also have a "default" value which refers to the service that should be used, if nothing else is defined.

tilmankamp commented 8 years ago

@kaikreuzer @hkuhn42 @kdavis-mozilla @tavalin Ok - here is the interface I will implement now. It's the last version of @hkuhn42 - I like the name CommandInterpreter and will follow the subscriber model. I also added a structured result to the interpreted callback.

public enum CommandInterpreterResult {
    OK, INCOMPLETE_PHRASE, UNABLE_TO_EXECUTE, UNSUPPORTED_PHRASE
}

public interface CommandInterpreter {
    void interpret(String command, Locale locale);
    Set<Locale> getSupportedLocales();
    void registerCommandInterpreterListener(CommandInterpreterListener interpreterListener);
    void removeCommandInterpreterListener(CommandInterpreterListener interpreterListener);
}

public interface CommandInterpreterListener {
    void interpreted(CommandInterpreter commandInterpreter, CommandInterpreterResult result, String response); 
}

Thanks for all the input!

kdavis-mozilla commented 8 years ago

@tilmankamp This interface will not work. It doesn't specify a grammar.

tilmankamp commented 8 years ago

Ah - just forgot it - thanks for the hint! Here it is:

public enum CommandInterpreterResult {
    OK, INCOMPLETE_PHRASE, UNABLE_TO_EXECUTE, UNSUPPORTED_PHRASE
}

public interface CommandInterpreter {
    void interpret(String command, Locale locale);
    Set<Locale> getSupportedLocales();
    String getGrammar();
    void registerCommandInterpreterListener(CommandInterpreterListener interpreterListener);
    void removeCommandInterpreterListener(CommandInterpreterListener interpreterListener);
}

public interface CommandInterpreterListener {
    void interpreted(CommandInterpreter commandInterpreter, CommandInterpreterResult result, String response); 
}

kdavis-mozilla commented 8 years ago

@tilmankamp Sorry to be nitpicky, but the grammar is Locale-specific.

tilmankamp commented 8 years ago

Makes sense - I also put it into the callback - maybe someone needs it...

public enum CommandInterpreterResult {
    OK, INCOMPLETE_PHRASE, UNABLE_TO_EXECUTE, UNSUPPORTED_PHRASE
}

public interface CommandInterpreter {
    void interpret(String command, Locale locale);
    Set<Locale> getSupportedLocales();
    String getGrammar(Locale locale);
    void registerCommandInterpreterListener(CommandInterpreterListener interpreterListener);
    void removeCommandInterpreterListener(CommandInterpreterListener interpreterListener);
}

public interface CommandInterpreterListener {
    void interpreted(CommandInterpreter commandInterpreter, CommandInterpreterResult result, Locale locale, String response); 
}
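
To round this off, here is a small usage sketch from the caller's perspective (e.g. an STT add-on or a chatbot) against the interfaces above; how the CommandInterpreter instance is obtained - configuration, OSGi wiring, etc. - is left open, as discussed earlier:

import java.util.Locale;

class SttBridgeExample {

    void wire(CommandInterpreter interpreter) {
        // React to interpretation results, e.g. hand the response to a TTSService or a chat UI.
        CommandInterpreterListener listener = (source, result, locale, response) -> {
            if (result == CommandInterpreterResult.OK) {
                System.out.println("Response: " + response);
            } else {
                System.out.println("Could not execute command: " + result);
            }
        };
        interpreter.registerCommandInterpreterListener(listener);

        // Text coming from an STT service (or typed by the user) is simply passed on.
        interpreter.interpret("turn ceiling light on", Locale.ENGLISH);
    }
}
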
kdavis-mozilla commented 8 years ago

@tilmankamp Did you consider the threading issues brought up by Kai?