Closed JarbasAI closed 7 years ago
Drunk_Crawl idea (so called because it just stumbles around looking for familiar things until the target is reached)
crawl up - this will answer questions of the sort "is CenterNode a TargetNode?"
- start at CenterNode and build a concept_tree with all parent nodes (and parents of parents...) up to N layers/hops (depth configurable)
- while not in TargetNode
- check if CurrentNode has synonyms; if yes, prefer the synonym and the smaller-gen parents of the synonym as next node
- check if CurrentNode is an antonym of any previous node; if yes, go back and choose another
- choose a random node coming out from CurrentNode, prefer higher gens
- if there is no next node, go back and try the next guess
- if end of tree, return False
- return True
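The steps above could be sketched roughly as below. This is only an illustration, not the implemented crawler: the name `drunk_crawl`, the `{node: {parent: generation}}` layout, the `synonyms`/`antonyms` dicts and the `max_depth` parameter are all assumptions, and "prefer higher gens" is read here as "closer parents tend to be tried first" via a weighted random order. Building the concept_tree up front is folded into the depth limit.

```python
import random

def drunk_crawl(start, target, parents, synonyms=None, antonyms=None,
                max_depth=5, rng=None):
    """Answer "is start a target?" by stumbling up the parent links.

    parents:  {node: {parent: generation}}, generation 1 = closest (assumed)
    synonyms: {node: [synonym, ...]}
    antonyms: {node: [antonym, ...]}
    """
    synonyms = synonyms or {}
    antonyms = antonyms or {}
    rng = rng or random.Random()
    visited = set()

    def candidates(node):
        gens = parents.get(node, {})
        # Random order, weighted so closer generations tend to come first.
        ups = sorted(gens, key=lambda p: rng.random() * gens[p])
        # Synonyms are tried before any parent.
        return list(synonyms.get(node, [])) + ups

    def walk(node, path):
        if len(path) > max_depth:
            return False  # beyond the configured number of hops
        if any(node in antonyms.get(prev, ()) for prev in path):
            return False  # antonym of an earlier node: wrong path, go back
        if node == target:
            return True
        if node in visited:
            return False
        visited.add(node)
        # If every candidate fails we return False and the caller tries
        # its next guess; failing at the start node means end of tree.
        return any(walk(nxt, path + [node]) for nxt in candidates(node))

    return walk(start, [])

# Hypothetical test graph, loosely based on the joana example.
parents = {
    "joana": {"human": 1, "female": 1},
    "female": {"human": 1},
    "human": {"mammal": 1, "animal": 2},
    "mammal": {"animal": 1},
    "animal": {"alive": 1},
}
print(drunk_crawl("joana", "mammal", parents))  # True
print(drunk_crawl("joana", "frog", parents))    # False
```

The order of the stumble is random, but because every dead end backtracks to the next guess, the True/False answer on a small graph like this does not depend on the order.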
addendum: Learning Crawl
in some cases we will reach an already visited node, example:
human has child -> human female
joana has parents -> human, human female
joana is human
human is ['mammal', 'animal']
animal is ['alive']
mammal is animal <- stronger relation to animal
joana is female
female is ['human'] <- stronger relation to human
revisiting a node should make it preferable to choose a not-yet-visited node from that revisited node next, instead of simply ignoring the connection and moving on to the next node
an example of crawling; for each step I will use the format current_node:[parents] - number of visits
human:[mammal, animal, ape, hominid, living being] - 1
animal:[living being] - 1
living being:[] - 1
mammal:[animal, living being] - 1
animal:[living being] - 2 <- living being already checked, bypass but increase count - living being = 2
living being (from mammal) <- already checked, bypass but increase count - living being = 3
ape:[mammal, animal, living being, omnivore] - 1 <- all nodes already checked, bypass but increase count
we crawled: animal -> 3 times, living being -> 4 times, mammal -> 2 times, ape -> 1 time
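A minimal sketch of the visit counting, under one simplifying assumption: a node's count goes up every time a link reaches it, but its parents are only expanded the first time. The function name `count_visits` and the plain `{node: [parents]}` dict are made up for illustration. With this rule the totals do not depend on the order parents are tried in, and they come out the same as the tallies above (animal 3, living being 4, mammal 2, ape 1).

```python
from collections import Counter

def count_visits(start, parents):
    """Crawl up from `start`, counting how often each node is reached."""
    visits = Counter()
    expanded = set()

    def reach(node):
        visits[node] += 1
        if node in expanded:
            return  # already checked: bypass, but the count still went up
        expanded.add(node)
        for parent in parents.get(node, []):
            reach(parent)

    reach(start)
    return visits

parents = {
    "human": ["mammal", "animal", "ape", "hominid", "living being"],
    "mammal": ["animal", "living being"],
    "animal": ["living being"],
    "ape": ["mammal", "animal", "living being", "omnivore"],
    "living being": [],
}
visits = count_visits("human", parents)
print(visits["animal"], visits["living being"], visits["mammal"], visits["ape"])
# prints: 3 4 2 1
```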
this information is useful for questions of the kind "talk about humans"; using the above crawl we would get:
if during the crawl we check for a threshold of 3, living beings and animals would be above the threshold
- check children of living beings and animals and update the visit counter
- living beings: oxygen breathing organism, animals (+1)
- animals: herbivore, carnivore, omnivore
instead of asking, since we are just crawling without user interaction, we could check whether any of the children is the parent of a visited node that we haven't checked yet. We didn't visit omnivore from ape yet in this crawl, and it is a child of animal, so we should prefer this node for crawling next
this should be good for learning
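The "prefer omnivore next" step could look something like the sketch below. Everything here is hypothetical (the name `next_to_crawl`, the dict layouts, treating "above threshold" as `>=`), and it leaves out the counter update for the children; it only shows the selection rule: take the nodes at or above the threshold, look at their children, and keep the uncrawled ones that are a parent of something we already crawled.

```python
def next_to_crawl(visits, children, parents, crawled, threshold=3):
    """Uncrawled children of frequently visited nodes that are also
    parents of an already crawled node."""
    hot = [node for node, count in visits.items() if count >= threshold]
    picks = []
    for node in hot:
        for child in children.get(node, []):
            if child in crawled or child in picks:
                continue
            if any(child in parents.get(seen, []) for seen in crawled):
                picks.append(child)
    return picks

# Data taken from the crawl example; the dict layout is an assumption.
visits = {"animal": 3, "living being": 4, "mammal": 2, "ape": 1}
children = {
    "living being": ["oxygen breathing organism", "animal"],
    "animal": ["herbivore", "carnivore", "omnivore"],
}
parents = {"ape": ["mammal", "animal", "living being", "omnivore"]}
crawled = {"human", "animal", "living being", "mammal", "ape"}

print(next_to_crawl(visits, children, parents, crawled))  # ['omnivore']
```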
Very well explained. Yes, the "visit count" is a good idea, as it will prevent LILACS from always returning the same standard answer.
Question: Are humans apes? First time (with 'animal' as the only shared parent node to both) Mycroft responds: A: Humans and apes are animals, but a human is not an ape.
Question: Are humans omnivores? LILACS researches humans as omnivores and adds the omnivore parent node. Mycroft responds: A: Yes, humans are omnivores.
Question: Are apes omnivores? LILACS researches apes and adds omnivore as a parent node. Mycroft responds: A: Yes, apes are omnivores.
Now... Same question as before: Are humans apes? LILACS finds the ape and human nodes but now crawls the more proximal parent nodes with a lower visit count and Mycroft responds: A: Yes, humans and apes are omnivores, but a human is not an ape
So in essence, the answer becomes more accurate over time. This kind of machine learning will make Mycroft provide more accurate responses as the LILACS system learns.
We could also use the concept of "Supernodes" (nodes with more than X children) to create 'areas' that will help speed up the crawl when searching for a concept.
The human brain works this way: we have different areas of our brain that deal with different types of information.
So if a search deals with 'human' and 'ape' then we can deduce that we are dealing with two nodes in the 'omnivore' supernode.
Here's an example diagram to explain:
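In code, the supernode idea might be sketched like this. The helper names (`find_supernodes`, `ancestors`, `shared_supernodes`), the sample graph and the value X = 3 are assumptions for illustration only: a supernode is any node with more than X children, and two concepts are in the same 'area' when they share a supernode ancestor.

```python
from collections import defaultdict

def find_supernodes(parents, min_children):
    """Nodes with more than `min_children` direct children."""
    children = defaultdict(set)
    for child, ups in parents.items():
        for parent in ups:
            children[parent].add(child)
    return {node for node, kids in children.items() if len(kids) > min_children}

def ancestors(node, parents):
    """All parents, parents of parents, and so on."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def shared_supernodes(a, b, parents, min_children):
    supers = find_supernodes(parents, min_children)
    return ancestors(a, parents) & ancestors(b, parents) & supers

# Hypothetical graph, not taken from the LILACS data.
parents = {
    "human": ["omnivore", "mammal"],
    "ape": ["omnivore", "mammal"],
    "bear": ["omnivore"],
    "pig": ["omnivore"],
    "cow": ["herbivore", "mammal"],
}
print(shared_supernodes("human", "ape", parents, min_children=3))  # {'omnivore'}
```

A crawl for 'human' and 'ape' could then start inside the shared area instead of wandering the whole graph.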
this generic function is now implemented; other crawling strategies and improvements should be opened as new issues
test case outputs
2017-04-07 19:37:40,230 - CLIClient - INFO - Speak: answer to is joana a frog is False
2017-04-07 19:37:40,230 - CLIClient - INFO - Speak: answer to is joana a animal is True
2017-04-07 19:37:40,231 - CLIClient - INFO - Speak: answer to is joana a mammal is True
2017-04-07 19:37:40,240 - CLIClient - INFO - Speak: answer to is joana alive is True
with following crawl logs
2017-04-07 19:53:57,564 - Skills - INFO - start node: joana
2017-04-07 19:53:57,566 - Skills - INFO - target node: mammal
2017-04-07 19:53:57,566 - Skills - INFO - next: human
2017-04-07 19:53:57,566 - Skills - INFO - choosing next node
2017-04-07 19:53:57,571 - Skills - INFO - crawled nodes: ['joana', 'human']
2017-04-07 19:53:57,571 - Skills - INFO - uncrawled nodes: ['female', 'mammal', 'animal']
2017-04-07 19:53:57,571 - Skills - INFO - next: animal
2017-04-07 19:53:57,571 - Skills - INFO - choosing next node
2017-04-07 19:53:57,571 - Skills - INFO - crawled nodes: ['joana', 'human', 'animal']
2017-04-07 19:53:57,572 - Skills - INFO - uncrawled nodes: ['female', 'mammal', 'alive']
2017-04-07 19:53:57,572 - Skills - INFO - next: alive
2017-04-07 19:53:57,572 - Skills - INFO - choosing next node
2017-04-07 19:53:57,572 - Skills - INFO - crawled nodes: ['joana', 'human', 'animal', 'alive']
2017-04-07 19:53:57,573 - Skills - INFO - uncrawled nodes: ['female', 'mammal']
2017-04-07 19:53:57,573 - Skills - INFO - next: mammal
2017-04-07 19:53:57,573 - Skills - INFO - choosing next node
2017-04-07 19:53:57,573 - Skills - INFO - crawled nodes: ['joana', 'human', 'animal', 'alive', 'mammal']
2017-04-07 19:53:57,573 - Skills - INFO - uncrawled nodes: ['female']
A mechanism is needed to give relational information about nodes. The following data is available for each node:
parent: mammal is parent of cow, therefore cow is a mammal
child: dolly is child of cow, therefore dolly is a cow; a cow may or may not be dolly
generation: cow is more related to mammal than to living being
data: "cows are mad!"
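As a rough picture of the per-node data listed above, here is a small sketch. The field names, the `add_parent` and `closest_parents` helpers and the generation numbers are assumptions and do not reflect the actual ConceptNode class.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    name: str
    parents: dict = field(default_factory=dict)   # parent name -> generation
    children: dict = field(default_factory=dict)  # child name -> generation
    data: list = field(default_factory=list)      # free-form facts

    def add_parent(self, parent, gen=1):
        self.parents[parent] = gen

    def closest_parents(self):
        """Parents ordered from most to least related (smallest gen first)."""
        return sorted(self.parents, key=self.parents.get)

cow = ConceptNode("cow", data=["cows are mad!"])
cow.add_parent("living being", gen=3)  # generation values are made up
cow.add_parent("mammal", gen=1)
cow.children["dolly"] = 1

print(cow.closest_parents())  # ['mammal', 'living being']
```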
Navigating these connections so the knowledge engine can identify the relevant nodes will need several tools
The following considerations should be tracked to direct crawling and extract meaning from it:
gens: if "cow" is a "mammal", search "mammal" connections before "living being" connections
antonyms: if "dolly the cow" is a child of dead and I reached ConceptNode alive, this path is wrong
synonyms: if "trump" is a synonym of "current US president" and "current US president" isn't in the search path, add "current US president" and its connections to the search path
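Each of the three considerations can be written as a small helper. The sketch below is hypothetical (function names, dict layouts and generation numbers are assumptions) and treats the checks in isolation rather than as part of a full crawler.

```python
def order_by_gen(parents):
    """gens: closest parents first. parents is {parent: generation}."""
    return sorted(parents, key=parents.get)

def contradicts(reached, known_parents, antonyms):
    """antonyms: True if `reached` is an antonym of something the start
    node is already known to be (checked in both directions)."""
    return any(reached in antonyms.get(p, ()) or p in antonyms.get(reached, ())
               for p in known_parents)

def expand_synonyms(search_path, synonyms):
    """synonyms: append any synonym that is missing from the search path."""
    expanded = list(search_path)
    for node in search_path:
        for syn in synonyms.get(node, ()):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

print(order_by_gen({"living being": 3, "mammal": 1}))
# ['mammal', 'living being']
print(contradicts("alive", ["cow", "dead"], {"dead": ["alive"]}))
# True
print(expand_synonyms(["trump"], {"trump": ["current US president"]}))
# ['trump', 'current US president']
```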
Question: How to minimize number of hops, during crawling itself?
ConceptCrawler will be the base class responsible for:
This data should then be ready for other applications to consume and deduce meaning from