Closed JarbasAI closed 7 years ago
now that we have a few backends, I think nodes will be populated like this
keeping in mind #14, all nodes discussed here are of type "informational"
dbpedia backend:
Node:
name: node name
parents: [ dbpedia types]
childs:
synonims:
antonims:
cousins: [ dbpedia_related_subjects, dbpedia_see_also]
Attributes:
abstract: dbpedia abstract
links: [ dbpedia link ]
pics: [ dbpedia pic ]
external_links: [ suggested links from dbpedia ]
wikipedia backend:
Node:
name: node name
parents: []
childs:
synonims:
antonims:
cousins: []
Attributes:
summary: wikipedia_summary
links: [ wikipedia link ]
pics: [ wikipedia pic ]
infobox: {wikipedia infobox}
wikidata backend:
# parse properties for connections
Node:
name: node name
parents: [ what_field, instance_field ]
childs:
synonims:
antonims:
cousins:
Attributes:
description: wikidata description_field
data: {wikidata_dict}
props: [wikidata_properties]
wikihow backend:
doesn't populate nodes, answers "how" questions
user backend:
# confirms data, provides all fields
# can add any arbitrary field to attribs
# the following fields are always checked for, depending on crawl strategy; other fields are info to be returned to the user
Node:
name:
type: "informational" #all discussed nodes so far are informational
parents: # is an instance of
childs: # can have the following instances
synonims: # is the same as
antonims: # is opposite to
cousins: # related subjects
Attributes:
spawns: [] <- what comes from this?
spawned_by: [] <- where does this come from?
consumes: [] <- what does this need/spend ?
consumed_by: [] <- what consumes this?
parts : [ ] <- what smaller nodes can this be divided into?
part_off: [ ] <- what can be made out of this?
I would refactor a node like this:
Node:
name:
type: "informational" <- all discussed nodes so far are informational
Connections:
synonims: [] <- is the same as
antonims: [] <- can never be related to
parents: [] <- is an instance of
childs: [] <- can have the following instances
cousins: [] <- somewhat related subjects
spawns: [] <- what comes from this?
spawned_by: [] <- where does this come from?
consumes: [] <- what does this need/spend ?
consumed_by: [] <- what consumes this?
parts : [ ] <- what smaller nodes can this be divided into?
part_off: [ ] <- what can be made out of this?
Data:
description: wikidata description_field
abstract: dbpedia abstract
summary: wikipedia_summary
pics: [ wikipedia pic, dbpedia pic ]
infobox: {wikipedia infobox}
wikidata: {wikidata_dict}
props: [wikidata_properties] <- if we can parse this appropriately we can make connections
links: [ wikipedia link, dbpedia link ]
external_links: [ suggested links from dbpedia ]
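A minimal Python sketch of the refactored node above, just to make the split between Connections and Data concrete. The class name `Node`, the methods `connect` and `merge_data`, and the merge rule (lists are extended, other values overwritten) are assumptions for illustration, not existing code in this project; the field names are copied from the listing as written.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Connection kinds copied from the proposed layout (spelling kept as in the issue).
CONNECTION_KINDS = (
    "synonims", "antonims", "parents", "childs", "cousins",
    "spawns", "spawned_by", "consumes", "consumed_by", "parts", "part_off",
)


@dataclass
class Node:
    """Hypothetical node: typed, with named connections and free-form data."""
    name: str
    type: str = "informational"
    connections: Dict[str, List[str]] = field(
        default_factory=lambda: {kind: [] for kind in CONNECTION_KINDS})
    data: Dict[str, Any] = field(default_factory=dict)

    def connect(self, kind: str, target: str) -> None:
        """Add a connection of a known kind, skipping duplicates."""
        if kind not in self.connections:
            raise KeyError(f"unknown connection kind: {kind}")
        if target not in self.connections[kind]:
            self.connections[kind].append(target)

    def merge_data(self, backend_fields: Dict[str, Any]) -> None:
        """Merge one backend's fields (assumed rule: extend lists, overwrite the rest)."""
        for key, value in backend_fields.items():
            if isinstance(value, list):
                existing = self.data.setdefault(key, [])
                for item in value:
                    if item not in existing:
                        existing.append(item)
            else:
                self.data[key] = value


cow = Node("cow")
cow.connect("parents", "animal")
cow.merge_data({"abstract": "dbpedia abstract", "pics": ["dbpedia pic"]})
cow.merge_data({"summary": "wikipedia_summary", "pics": ["wikipedia pic"]})
```

With this rule, fields that several backends provide (pics, links) accumulate in one list, while single-valued fields (abstract, summary, description) stay separate per backend because they use different keys.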
every node has a section of arbitrary data, and the current architecture allows "crawlers" to easily answer questions of the type "is this an example of that?". it also makes it possible to ask "how related is this to that?": by measuring node distance we get a confidence
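A rough sketch of how a crawler could answer those two kinds of questions. Everything here is an assumption for illustration: the toy `GRAPH`, the function names, and especially the confidence formula `1 / (1 + distance)`, which is just one simple way of turning node distance into a score.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy graph: node name -> {connection kind -> [node names]}
GRAPH: Dict[str, Dict[str, List[str]]] = {
    "cow": {"parents": ["mammal"], "cousins": ["milk"]},
    "mammal": {"parents": ["animal"], "childs": ["cow"]},
    "animal": {"childs": ["mammal"]},
    "milk": {"cousins": ["cow"]},
}


def is_example_of(graph, node: str, candidate: str) -> bool:
    """"is this an example of that?": follow "parents" links upward."""
    seen, queue = {node}, deque([node])
    while queue:
        current = queue.popleft()
        for parent in graph.get(current, {}).get("parents", []):
            if parent == candidate:
                return True
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def node_distance(graph, start: str, goal: str) -> Optional[int]:
    """Fewest hops between two nodes over any connection kind (breadth-first)."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        current, hops = queue.popleft()
        for neighbours in graph.get(current, {}).values():
            for nxt in neighbours:
                if nxt == goal:
                    return hops + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return None  # not reachable


def relatedness(graph, a: str, b: str) -> float:
    """"how related is this to that?": assumed score 1 / (1 + distance)."""
    distance = node_distance(graph, a, b)
    return 0.0 if distance is None else 1.0 / (1 + distance)
```

In this toy graph `is_example_of(GRAPH, "cow", "animal")` is true, while milk and cow are only one hop apart through cousins, so they score as related without milk being an example of a cow.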
but when actually consuming data there will be more kinds of questions:
this is not info that is easy to get from just connections. how would you relate milk to cow? they are related, but it doesn't make sense for milk to be a child of cow (it's not an example of a cow)
the solution could be NodeTypes: each type would have some data by definition, and nodes could be of several types, e.g. john would be of the types animal and human
what types should be available? what questions are relevant to each type?
in the data of an animal node we could have:
{
name: cow
creates: milk
needs: oxygen
eats: food
provides: meat and clothes
cousins: related nodes with no directly defined relationship, crawlers can check here where to "jump" depending on the question
}
mycroft can now answer "where does milk come from?" by checking the type of milk ("animal product"), then crawling nodes of the type animal and checking their "provides" data field, which would return all animals that produce milk
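A small sketch of that crawl, under stated assumptions. The `NODES` dict, the "goat" node, the `SOURCE_TYPE` lookup table and the function name are all made up for illustration. Note that the example animal data lists milk under "creates" while the text mentions "provides", so this sketch checks both fields.

```python
from typing import Dict, List, Sequence

# Hypothetical nodes; a node may carry several types, as suggested for "john".
NODES: Dict[str, dict] = {
    "cow": {"types": ["animal"],
            "data": {"creates": ["milk"], "needs": ["oxygen"],
                     "eats": ["food"], "provides": ["meat", "clothes"]}},
    "goat": {"types": ["animal"], "data": {"creates": ["milk"]}},
    "john": {"types": ["animal", "human"], "data": {"needs": ["oxygen"]}},
    "milk": {"types": ["animal product"], "data": {}},
}

# Assumed lookup: type of the thing asked about -> node type worth crawling.
SOURCE_TYPE = {"animal product": "animal"}


def where_does_it_come_from(nodes: Dict[str, dict], thing: str,
                            fields: Sequence[str] = ("creates", "provides")) -> List[str]:
    """Return names of nodes whose data says they create/provide `thing`."""
    sources: List[str] = []
    for thing_type in nodes.get(thing, {}).get("types", []):
        wanted = SOURCE_TYPE.get(thing_type)
        if wanted is None:
            continue  # no convention for this type yet
        for name, node in nodes.items():
            if wanted not in node["types"]:
                continue
            produced = any(thing in node["data"].get(f, []) for f in fields)
            if produced and name not in sources:
                sources.append(name)
    return sources


milk_sources = where_does_it_come_from(NODES, "milk")
```

The crawl only works if every animal node uses the same field names, which is exactly why a shared convention per node type matters.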
what types should come by default? how do we decide on a convention that makes it easy for both humans and machines?
the current approach is the following, node types will be discussed somewhere else with a different purpose #14