I think I have an idea for this but I'd like to get some input.
For starters, let's take the model of what happens with humans: short-term and long-term memory.
Humans can generally hold about 5-6 items in their short-term memory at a time.
We'll probably need to build a context dict; this will hold a small piece of info about what was said:
{
    "skill": "wolfram_alpha",
    # insert joke here about how your wife remembers everything ...
    "scope": "public/private",  # allows the developer of the skill
                                # to mark this information public or private.
                                # If it's public, any other skill can read it;
                                # if it's private, no other skill can read it.
    "ttl": None,  # time to live; if it's None ... never forget
    "data": {  # this is a dict that the skill sets, so the skill author
               # controls what is remembered.
        "utterance": "Mycroft, who is Lebron James?",
        "response": {
            "said": "..."
        },
        "pronoun": {
            "he": "Lebron James"
        }
    },
    "timestamp": time.time()
}
I'm not sure about the best structure for this one. The way Python handles objects, the same object can exist in all of the dicts at once.
# skill_context would contain both the public & private entries.
# Later on we can build an interface that uses a database, e.g. the long-term
# memory.
skill_context = {
    "wolfram_alpha": [{...}, {...}],
    "random_skill_name": [{...}]
}

# This would be a list that you'd .append() items to.
public_context = [{...}, {...}]
Here's what it'd look like when it's all put together ... I kinda think in code.
from mycroft_context import MycroftContext

mycroft_context = MycroftContext()


class MycroftSkill(object):
    def __init__(self, name, emitter=None):
        ...
        self.context = mycroft_context.get_context(self.name)


class MySkill(MycroftSkill):
    def handle_intent_example(self, message):
        last_ten_items = self.context.recall()
        utterance = message.metadata['utterance']
        # TODO language_parser()
        # language_parser would separate the sentence into
        # nouns, pronouns, verbs, adverbs, etc.
        # That's the hard part ... figuring out who "HE" is based
        # on the response.
        # That's going to require a LOT of work because we'll have to have
        # a whole dictionary.
        # I recommend putting the language parser in an issue of its own,
        # or ... fake it, regex it, and add a list of pronouns to choose
        # from. The system wouldn't know gender; you could ask "who is she"
        # and it'd still tell you.
        self.context.store({
            "utterance": utterance,
            "sentence_components": language_parser(utterance)
        })
mycroft_context.py
from collections import defaultdict

from mycroft.skill_context import MycroftSkillContext


class MycroftContext(object):
    """
    Holds all the contexts, public & private.
    """
    skill_context = defaultdict(list)
    public_context = []
    contexts = {}

    def __init__(self):
        pass

    def get_context(self, mycroft_skill_name):
        context = self.contexts.get(mycroft_skill_name)
        if context:
            return context
        context = MycroftSkillContext(mycroft_skill_name,
                                      self.skill_context[mycroft_skill_name],
                                      self.public_context)
        self.contexts[mycroft_skill_name] = context
        return context

    def forget(self):
        # garbage collection
        for skill_name, context in self.contexts.items():
            context.forget()
        return self
mycroft.skill_context.py
import time


class MycroftSkillContext(object):
    """
    MycroftSkillContext is used to isolate what the skill is allowed to "know".
    """

    def __init__(self, mycroft_skill_name, skill_context, public_context):
        self.name = mycroft_skill_name
        self.skill_context = skill_context
        self.public_context = public_context

    def store(self, data, ttl=None, scope="private"):
        # TODO perhaps also accept "+1 day" style values for expiry.
        data['ttl'] = ttl
        data['timestamp'] = time.time()
        self.skill_context.append(data)
        if scope != "private":
            self.public_context.append(data)
        return self

    def recall(self, limit=10):
        if limit == -1:
            result = self.skill_context + self.public_context
        else:
            result = self.skill_context[-limit:] + \
                self.public_context[-limit:]
        # TODO sort by timestamp, perhaps make it an iterator.
        return result

    def forget(self):
        # garbage collection
        remove_items = []
        now = time.time()
        for item in self.skill_context:
            ttl = item.get("ttl")
            if ttl is None:
                continue
            timestamp = item.get("timestamp")
            if now >= timestamp + ttl:
                remove_items.append(item)
        for item in remove_items:
            try:
                self.skill_context.remove(item)
            except ValueError:
                pass
            try:
                self.public_context.remove(item)
            except ValueError:
                pass
        return self
Thanks, @the7erm and @ethanaward. I've also been giving this some thought. In my grand designs, contextual conversation (which I'll refer to as CC going forward) breaks down into 3 components:
Hey mycroft, what's the weather like in Seattle tomorrow? <new session, no context>
How about the 5 day forecast? <implicit Location:Seattle from context>
The other side is to inform intent determination, which is a bit trickier. Extending the example above:
Hey mycroft, what's the weather like in seattle tomorrow? <no context>
What about the 5-day forecast? <implicit Location:Seattle from context>
What about in Chicago? <implicit WeatherQuery from context>
The final query above is significantly more ambiguous without any context. We could be looking for a movie, a plane ticket, or a song from the self-titled album, Chicago.
Based on the tight coupling of 3 with intent determination I see the ContextManager and integration components living in Adapt. Mycroft-core would then have an instance of the intent engine and the context manager, and glue them all together. Mycroft currently has a concept of session that I think will play nicely with the concept of a conversation (based on time since last active utterance).
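To make that split concrete, here's a minimal sketch of the glue layer described above, assuming Adapt's IntentDeterminationEngine (which exists today) plus some not-yet-written context store; the conversation_context dict and the fallback rule are illustrative assumptions, not the planned API.

from adapt.engine import IntentDeterminationEngine

engine = IntentDeterminationEngine()
conversation_context = {}  # stand-in for whatever context manager Adapt ends up providing


def handle_utterance(utterance):
    for intent in engine.determine_intent(utterance):
        if intent.get("confidence", 0) <= 0:
            continue
        # "What about the 5 day forecast?" parses with no Location, so fall
        # back to the Location remembered from earlier in the conversation.
        if "Location" not in intent and "Location" in conversation_context:
            intent["Location"] = conversation_context["Location"]
        # Anything tagged in this utterance becomes context for the next one.
        for key, value in intent.items():
            if key not in ("intent_type", "confidence", "target"):
                conversation_context[key] = value
        return intent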
My current plan for working on the Adapt components has me starting in July. I'd love to say that we can build a straw-man version of this and get by until then, but since we're talking about it being a part of skills (and the skills SDK), we'll be breaking a lot of third parties if we don't get it right the first time. If we implement something haphazardly first, that may be the version we're stuck with for a very long time.
If we implement something haphazardly first, that may be the version we're stuck with for a very long time.
I agree with you there.
Hey mycroft, what's the weather like in seattle tomorrow? <no context>
What about the 5-day forecast? <implicit Location:Seattle from context>
What about in Chicago? <implicit WeatherQuery from context>
It seems like issues #84 & #85 are very similar.
The way mycroft is set up - as near as I can tell with my limited experience - there is no state/session/context. It just does command -> response, and then you have to say "Hey mycroft" again.
So we're going to start leaving the session open for _____ seconds to create the CC dialog.
So perhaps the flow would go something like this:
Hey mycroft <voc/skill trigger>
Then on top of that you have to deal with the non-linear conversation:
Hey mycroft, what's the current temperature?
.....
What are my appointments for today?
.....
What's the 5 day forecast?
All technically inside the same session, but not the same skill tree.
I think we're going to have to have a list of all the skills that were recently accessed, iterate through those, and have mycroft figure out if the voice input is for that particular skill.
So the flow would go something like:
skill_history
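The flow above trails off, but a minimal sketch of the idea might look like the following: recently used skills sit in a skill_history list, and each one (newest first) gets a chance to claim the utterance while its session window is still open. SESSION_TIMEOUT, will_handle() and the skill names are all hypothetical, just to illustrate the dispatch.

import time

SESSION_TIMEOUT = 120  # the "_____ seconds" above; value picked arbitrarily

# Most recently active first: (skill_name, last_active_timestamp)
skill_history = [("weather", time.time()), ("calendar", time.time() - 30)]


def route_utterance(utterance, skills):
    """Ask recently active skills, newest first, whether the utterance is theirs."""
    now = time.time()
    for skill_name, last_active in skill_history:
        if now - last_active > SESSION_TIMEOUT:
            continue  # that skill's session has expired
        skill = skills[skill_name]
        # will_handle() is a hypothetical hook where the skill checks the
        # utterance against its own context and says "that's for me".
        if skill.will_handle(utterance):
            return skill.handle(utterance)
    # Nobody claimed it; fall back to normal intent determination.
    return None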
So, your statement of the lack of state is certainly a goal, but only from the perspective of a single skill. In reality, there are already 2 places where state resides:
1) The recognizer loop
The recognizer loop is basically a state machine that is constantly listening for audio and transitioning between:
- Record/send audio
- Wakeword only, call-and-response mode
- Going to sleep
- Waking up
It's pretty reasonable to expect any client to maintain some state.
2) The IntentSkill
The IntentSkill is kind of a special beast. If you check the skill loading code, you'll notice that IntentSkill is hardcoded to come up first; it has to already exist to accept vocab/intent registration messages from other skills. That state is managed with an instance of IntentDeterminationEngine, and I'd imagine an instance of ContextManager living there as well. This goes in a different direction from your suggestion of intent determination reaching out to each skill for its current context, which is not necessarily practical in a world where skills may be running in lots of disparate processes (with only async communication available).
In general, I'm not against state in Mycroft, but we do need to manage it extremely carefully. Low latency (sub 10ms) is a hard requirement, and that limits us from leaving the device (assuming last mile is residential, as opposed to mycroft running in the cloud with a nearby context service).
Moving on to your comments about always saying "hey mycroft": I don't think that needs to be time based for a session. I would expect skills to give hints programmatically, by emitting a speak message and following up with an "expects_response" message, which could trigger audio recording without wakeword. I think this makes for a lower false positive rate when someone asks mycroft a question and then speaks to someone else in the room.
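Inside a skill handler, that hint might look something like the sketch below; the message names and the emitter call shape are assumptions about how the hint could be wired up over the messagebus, not an existing mycroft-core API.

from mycroft.messagebus.message import Message
from mycroft.skills.core import MycroftSkill


class TicketSkill(MycroftSkill):
    def handle_buy_tickets(self, message):
        # Speak the prompt, then hint that a reply is expected so the
        # recognizer loop can start recording without waiting for the wakeword.
        # "recognizer_loop:expect_response" is a made-up message type here.
        self.emitter.emit(Message("speak", {"utterance": "How many tickets?"}))
        self.emitter.emit(Message("recognizer_loop:expect_response", {}))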
My intentions/goals may not have been clearly conveyed before when I mentioned not partitioning context by skill/tree. If the context is partitioned that way, you lose the ability to have a conversation that shares context and meanders across skills.
Hey mycroft, play taylor swift on pandora
<music plays>
Hey mycroft, when is she in town next?
<music softens, mycroft speaks over>TaySway will be in your area this Saturday night.
<music resumes>
hey mycroft, what's the weather going to be like?
<music softens, mycroft speaks over> It will be 67 degrees with a 9pm sunset this saturday.
<music resumes>
hey mycroft, buy the tickets.
<music softens, mycroft speaks over> How many tickets?
<music pauses>
Just one.
<mycroft makes computer beeping noises, because he can> Tickets purchased. You can pick them up at will call. Would you like me to add the event to your calendar?
Sure, but list it as "important business meeting"
<mycroft makes more beeping noises, because he still can> Done.
<music resumes>
When you look at something like the conversation above as an end goal, conversational context becomes much more fluid, and partitioning the context based on skill/tree starts to get in the way. A centralized context store provides quite a bit more flexibility here, without much more complexity (implementation or computational).
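As a toy illustration of that flexibility (all names here are hypothetical), a single shared store lets the ticket skill pick up an entity the music skill remembered, with no skill/tree boundary in between:

# Hypothetical centralized context store shared by every skill.
shared_context = {}  # tag -> most recent value


def remember(tag, value):
    shared_context[tag] = value


def recall(tag):
    return shared_context.get(tag)


# The pandora skill remembers what it's playing ...
remember("Artist", "Taylor Swift")

# ... so the ticket skill can resolve "when is she in town next?"
artist = recall("Artist")  # -> "Taylor Swift"
print("Searching tour dates for", artist)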
Upon reading your change, I don't understand what it does or how it's supposed to be used. Can you provide some documentation? Also, the tests you provided don't explain what they're doing or why.
Lastly, as I stated above, I believe that the context manager belongs in Adapt, as it will be tightly coupled with the IntentDeterminationEngine's usage of context.
Can you provide some documentation?
Sorry, I've added some comments.
Lastly, as I stated above, I believe that the context manager belongs in Adapt, as it will be tightly coupled with the IntentDeterminationEngine's usage of context.
OK, I'll take a look at the Adapt source code.
Thanks for your time.
While other related issues have been opened on this and similar subjects, it appears that no one has actually commented on this critical issue in ten months. (See, for example, issues #554 and #553.)
Since Mycroft lacks state, only the very simplest skills can be built. While truly complex conversation parsing is doubtless a long way off, it would appear that some simple enhancements to core or to Adapt would allow skills that require context to be (laboriously) created.
Consider a simple RSS news reader. If the user starts with "Get current space news" and the skill reads the top five headlines, logically the user will want to say something like "Read more" or "Read story 2". With the current system, the best a programmer can do is support an intent like "Read space news story two." or "Fetch more stories from space news." -- which I admit works, but is clumsy and not at all what the vision of the usability of the product requires.
Failing complex grammar-aware language parsers, much of the capability could be provided by the simple expedient of allowing the __init__.py of a skill to register and de-register intents. This would allow skill builders to build conversational skills while the code takes a baby step forward towards the kind of complexity described above.
Something along the lines of #553 or #554 would allow a skills writer to handle the news-reader above, or a grocery list editor, or many other obvious skills that Mycroft should have, but which are difficult or impossible right now.
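As a sketch of what that could look like for the news reader (enable_intent() and disable_intent() are the proposed calls, not something mycroft-core offers today; the vocab names are made up):

from adapt.intent import IntentBuilder
from mycroft.skills.core import MycroftSkill


class SpaceNewsSkill(MycroftSkill):
    def initialize(self):
        self.register_intent(IntentBuilder("GetNews")
                             .require("SpaceNewsKeyword").build(),
                             self.handle_get_news)
        # "Read more" only makes sense mid-conversation, so it starts disabled.
        self.register_intent(IntentBuilder("ReadMore")
                             .require("ReadMoreKeyword").build(),
                             self.handle_read_more)
        self.disable_intent("ReadMore")   # proposed call

    def handle_get_news(self, message):
        self.speak("Here are the top five space headlines ...")
        self.enable_intent("ReadMore")    # proposed call

    def handle_read_more(self, message):
        self.speak("Reading the next story ...")
        self.disable_intent("ReadMore")   # back to the default state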
I urge the staff and the community to work towards adding ANY ONE of these first steps into the distribution version, even if at a future date a better way to accomplish this stuff shows up. Two trite phrases apply: perfect is the enemy of good enough, and the 80/20 rule is always true.
Support for context as a concept was added to Adapt in November. That change also incorporates a reference in-memory context manager. The next step would be to update mycroft-core to manage context. A PR on the Adapt repo suggests that @penrods may be working on this with folks at Jaguar.
I've been experimenting with adapt's context manager and the results are better than I'd imagined.
I've limited the context manager found in Adapt to just remember a couple of utterances, added a time limitation (2 minutes), and I'm aggressively adding all tags to the context in the intent service. It sort of works: if I ask for the time in London followed by "What's the weather like there?", I get the weather for London. Similarly, "tell me a joke" followed by "tell me another" works nicely.
The problem with adding all tags like this is that false positives are very likely. Some sort of limitation is needed, maybe adding a couple of contextKeywords, like location, time and date, and adding the possibility for skills to add to the list of contextual keywords.
I can see how to use this to create a sort of conversation tree, injecting context to be able to trigger a couple of new intents.
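Roughly, the experiment looks like this: every tag from a successful parse gets pushed into a small, time-limited store that later utterances can fall back on. The ContextStore class and its method names are illustrative, not Adapt's actual interface.

import time
from collections import deque


class ContextStore(object):
    """Keeps the tags from the last couple of utterances for a limited time."""

    def __init__(self, max_frames=2, timeout=120):  # remember 2 utterances, 2 minutes
        self.timeout = timeout
        self.frames = deque(maxlen=max_frames)  # newest last

    def inject(self, tags):
        # tags: e.g. {"Location": "London"} pulled from the parsed utterance
        self.frames.append((time.time(), tags))

    def lookup(self, tag_name):
        for timestamp, tags in reversed(self.frames):
            if time.time() - timestamp < self.timeout and tag_name in tags:
                return tags[tag_name]
        return None


store = ContextStore()
store.inject({"Location": "London", "TimeKeyword": "time"})
# Later: "What's the weather like there?" carries no Location of its own,
# so the intent service falls back to the remembered one.
print(store.lookup("Location"))  # -> London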
Awesomeness forslund!
Yay! Somebody gets it! :100:
'cause Ake rules!
He da boss!!
This issue appears to have been resolved by #934 and could be closed. If anything remains to be addressed, it can be filed as new issues.
Mycroft should be able to know when someone is talking about a previous query, e.g.
User: Mycroft, who is Lebron James?
...
User: Mycroft, how tall is he?
Mycroft: 6 feet 8 inches.