ElliotTheRobot opened 7 years ago
From the saving/reading point of view, I think every single node could be a separate file. During a crawl we load each node as and when it is needed, so we never have to keep all nodes in memory at any point.
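To make that concrete, here is a minimal sketch of per-node files with on-demand loading, assuming each node is a small JSON dict with a "connections" list; the `NodeStore` class and the field names are illustrative assumptions, not an existing Mycroft API.

```python
# Sketch: one JSON file per node, read from disk only when the crawler
# actually visits it. Nothing is preloaded.
import json
import os


class NodeStore:
    def __init__(self, root_dir):
        self.root_dir = root_dir
        self._cache = {}  # nodes loaded so far during this crawl

    def load(self, name):
        """Read a single node's JSON file only when it is first requested."""
        if name not in self._cache:
            path = os.path.join(self.root_dir, name + ".json")
            with open(path) as f:
                self._cache[name] = json.load(f)
        return self._cache[name]

    def crawl(self, start, max_depth=3):
        """Walk outward from a start node, loading neighbours as needed."""
        seen, frontier = {start}, [(start, 0)]
        while frontier:
            name, depth = frontier.pop()
            node = self.load(name)
            yield name, node
            if depth < max_depth:
                for neighbour in node.get("connections", []):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        frontier.append((neighbour, depth + 1))
```

With this layout, memory use grows with the size of the crawl rather than the size of the whole knowledge base.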
The concept of super nodes makes more sense when acquiring knowledge. A super node would be a node with many connections, representing an area we already know a lot about. If we define a super node as any node with more connections than the average, we can choose any non-super node as a candidate to populate with new information, so Mycroft would end up with roughly the same amount of knowledge about every subject.
Users can direct what Mycroft will learn simply by introducing an unknown concept, which would then be chosen as the "least super" node, or they can actively teach Mycroft the node connections.
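Something like the sketch below could implement that heuristic, reusing the node layout assumed above; the cutoff (strictly more connections than the average) and the function names are my assumptions, not part of the proposal.

```python
# Sketch: classify super nodes by connection count and pick the
# "least super" node as the next candidate to populate with new facts.

def is_super(node, avg_connections):
    """A super node has more connections than the average node."""
    return len(node.get("connections", [])) > avg_connections


def next_learning_candidate(nodes):
    """Return the non-super node with the fewest connections."""
    avg = sum(len(n.get("connections", [])) for n in nodes.values()) / len(nodes)
    candidates = {name: n for name, n in nodes.items() if not is_super(n, avg)}
    return min(candidates, key=lambda name: len(candidates[name].get("connections", [])))


if __name__ == "__main__":
    nodes = {
        "music":  {"connections": ["jazz", "guitar", "piano"]},
        "jazz":   {"connections": ["music"]},
        "quokka": {"connections": []},
    }
    print(next_learning_candidate(nodes))  # -> "quokka"
```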
Another possibly interesting concept to introduce would be MotherNodes: a node with many children and no parents, representing a top-level category. If we add a "node_type" parameter to each node that points at its closest MotherNode, it becomes useful for bulk distribution of knowledge.
When consuming data we could then load all nodes of the same type into memory at once instead of loading them one by one, or at least during crawling we would know that if we reach a node of a different type we are too far out and should change the crawl route.
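A rough sketch of how MotherNodes could be detected and used for bulk loading, assuming each node carries a "children" list; note that when categories overlap, this simple pass tags a node with whichever MotherNode reaches it first, which only approximates "closest".

```python
# Sketch: find top-level category nodes and tag their descendants with a
# node_type, then bulk-load every node of one type before a themed crawl.
from collections import deque


def find_mother_nodes(nodes):
    """MotherNodes have children but never appear as anyone's child."""
    all_children = {c for n in nodes.values() for c in n.get("children", [])}
    return [name for name, n in nodes.items()
            if n.get("children") and name not in all_children]


def tag_node_types(nodes):
    """Label every reachable node with a MotherNode ancestor as its node_type."""
    for mother in find_mother_nodes(nodes):
        nodes[mother]["node_type"] = mother
        queue = deque([mother])
        while queue:
            name = queue.popleft()
            for child in nodes[name].get("children", []):
                if "node_type" not in nodes[child]:
                    nodes[child]["node_type"] = mother
                    queue.append(child)


def load_by_type(nodes, node_type):
    """Bulk-load every node under one category."""
    return {name: n for name, n in nodes.items() if n.get("node_type") == node_type}
```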
Both of these concepts, SuperNode and MotherNode, are relevant when consuming and creating the data and when extracting meaning from it.
In order to speed up the node crawling process, we can introduce "supernodes" saved inside their own JSON file.
A Supernode and its child nodes down to 3-4 generations can be saved per JSON file. These Supernode JSON files will be quick to read (faster disk I/O) and crawl.
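One possible file layout for this, building on the same in-memory node dict as the earlier sketches; the three-generation cutoff and the "supernode"/"nodes" keys are arbitrary choices for illustration, not a fixed format.

```python
# Sketch: gather a supernode plus everything within a few generations of it
# and write the whole neighbourhood into a single JSON file.
import json


def collect_subtree(nodes, root, generations=3):
    """Gather the root node and its descendants up to `generations` deep."""
    subtree, frontier = {root: nodes[root]}, [root]
    for _ in range(generations):
        next_frontier = []
        for name in frontier:
            for child in nodes[name].get("children", []):
                if child in nodes and child not in subtree:
                    subtree[child] = nodes[child]
                    next_frontier.append(child)
        frontier = next_frontier
    return subtree


def save_supernode_file(nodes, root, path, generations=3):
    """Write one supernode neighbourhood as a single JSON document."""
    with open(path, "w") as f:
        json.dump({"supernode": root,
                   "nodes": collect_subtree(nodes, root, generations)}, f, indent=2)
```

Since a crawl of that neighbourhood then needs only one disk read, the supernode file acts as a ready-made cache for the areas we know most about.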
The JSON files can also be distributed to other instances of Mycroft, which could have the effect of "bulk learning", as long as node conflicts are handled well.
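A sketch of how such a file could be imported on another instance; the conflict policy shown here (take the union of the connection lists) is just one option and is not specified by the proposal.

```python
# Sketch: merge a distributed supernode file into a local node dict.
# New nodes are added; conflicting nodes get their connections merged.
import json


def import_supernode_file(local_nodes, path):
    with open(path) as f:
        incoming = json.load(f)["nodes"]
    for name, node in incoming.items():
        if name not in local_nodes:
            local_nodes[name] = node            # brand new knowledge
        else:                                   # node conflict: merge connections
            merged = set(local_nodes[name].get("connections", []))
            merged.update(node.get("connections", []))
            local_nodes[name]["connections"] = sorted(merged)
    return local_nodes
```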