JeffreyBenjaminBrown / hode

rslt, take five-ish
GNU General Public License v3.0

Computational metadata like "transitive" #14

Open JeffreyBenjaminBrown opened 3 years ago

JeffreyBenjaminBrown commented 3 years ago

Tom (what's your GitHub handle?) emailed:

Transitivity and general computation

The transitivity property seems like the first toe in the ocean of general computation, and seems to be a great example of why at least some computation is really valuable. Turning a nice restricted language into a general programming language gets really tough really fast, so that's not what I'm recommending, but I'd love to know if you have any thoughts on expanding these "property generators."

One possibility might be to keep computation out of the knowledge base, possibly even removing the #transitive relation, and instead embed more of it in the query language. I could see "/find a ~< c" (where "~" is a made-up query operator) desugaring to some Prolog-style search function that chains "a #< b" and "b #< c" to reach the answer.
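As a rough illustration of that desugaring idea (not Hode's query language or implementation), here is a minimal Python sketch. It assumes the knowledge base stores only the explicit "#<" pairs, and that a hypothetical query-side function `transitive_find` does the chaining at search time. All names are made up for the example.

```python
def transitive_find(edges, start, goal):
    """Return True if `goal` can be reached from `start` by chaining edges.

    `edges` is an iterable of (left, right) pairs, each standing for one
    explicit "left #< right" relationship. Hypothetical helper, not Hode's API.
    """
    # Index the explicit relationships by their left member.
    successors = {}
    for left, right in edges:
        successors.setdefault(left, set()).add(right)

    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for nxt in successors.get(node, ()):
            if nxt == goal:
                return True
            if nxt not in seen:  # the `seen` set keeps cyclic data from looping forever
                seen.add(nxt)
                frontier.append(nxt)
    return False


# Only "a #< b" and "b #< c" are stored; "a ~< c" is computed by the query.
less_than = [("a", "b"), ("b", "c")]
print(transitive_find(less_than, "a", "c"))
print(transitive_find(less_than, "c", "a"))
```

Under this design the stored data stays purely declarative, and whether a relation is searched transitively is a choice made per query rather than a property of the relation.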

JeffreyBenjaminBrown commented 3 years ago

It's not completely clear what to call properties like transitivity. I'd call them "metadata" but that doesn't really distinguish them from ordinary data in Hode -- an ordinary Hode relationship like "#maybe (I #like mushrooms)" is metadata about "I #like mushrooms". For now I'll go with search metadata, because they're different in that they change how searches should be run.

Yes, I believe there are other relationships that, like transitivity, deserve first-class status as data. The most important might be the synonym relationship. It would be nice if all users used the same set of synonyms, but they don't. Some people, for instance, might use "turtles" and "all turtles" interchangeably, while others use "turtles" and "some turtles" that way.

Implication would be a killer relationship between relationships. Reflexivity is a special case, and useful: "If 'a' has type 't' and 'tr' is a reflexive template, then 'a #tr a'." Symmetry is another special case: If Jake uses the < symbol a lot, and Jill uses >, it'd be nice to be able to encode that "a < b" => "b > a".
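To make the "a < b" => "b > a" case concrete, here is a small Python sketch of one way such a rule could be applied. It assumes relationships are plain (left, relation, right) triples and that the rule is stored as a mapping from a relation to its converse. The function name and data layout are hypothetical, not how Hode represents anything.

```python
def expand_with_converses(facts, converses):
    """Return `facts` plus every relationship implied by a converse rule.

    facts:     set of (left, relation, right) triples
    converses: dict mapping a relation to its converse, e.g. {"<": ">"}
    """
    derived = set(facts)
    for left, relation, right in facts:
        if relation in converses:
            # "left relation right" implies "right converse left".
            derived.add((right, converses[relation], left))
    return derived


# Jake writes "<", Jill writes ">"; the rule lets each find the other's data.
facts = {("a", "<", "b"), ("d", ">", "c")}
converses = {"<": ">", ">": "<"}
print(sorted(expand_with_converses(facts, converses)))
```

A symmetric relation is the special case where a relation is its own converse, e.g. `{"sibling": "sibling"}`.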

Type relationships are special in that they could affect not only search but also data entry. Ideally the RSLT would enforce conditions like, "The relationship 'a #likes b' can only be created if 'a' is conscious and 'b' is a noun." The line dividing grammatical types like "noun" from others like "person" is vague, and it's not clear to me it needs to be drawn.
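A sketch of what such a data-entry check might look like, in Python. It assumes each entity carries a set of properties and each relation can declare which properties its left and right members must have. Everything here (`add_relationship`, `ConstraintViolation`, the dictionaries) is hypothetical and only meant to show the shape of the idea.

```python
class ConstraintViolation(ValueError):
    """Raised when a relationship fails its data-entry constraint."""


def add_relationship(store, properties, constraints, left, relation, right):
    """Add (left, relation, right) to `store` only if the constraint holds.

    properties:  dict mapping an entity to the set of properties it has
    constraints: dict mapping a relation to a pair
                 (properties required of left, properties required of right)
    """
    required_left, required_right = constraints.get(relation, (set(), set()))
    missing_left = required_left - properties.get(left, set())
    missing_right = required_right - properties.get(right, set())
    if missing_left or missing_right:
        raise ConstraintViolation(
            f"cannot create '{left} #{relation} {right}': "
            f"left lacks {sorted(missing_left)}, right lacks {sorted(missing_right)}"
        )
    store.add((left, relation, right))


properties = {
    "Jill": {"conscious", "noun"},
    "mushrooms": {"noun"},
    "granite": {"noun"},
}
constraints = {"likes": ({"conscious"}, {"noun"})}
store = set()

add_relationship(store, properties, constraints, "Jill", "likes", "mushrooms")
# add_relationship(store, properties, constraints, "granite", "likes", "mushrooms")
# would raise ConstraintViolation, because "granite" is not marked conscious.
```

Relations with no declared constraint are accepted unconditionally here, which is one possible default rather than a recommendation.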

I think type is the term used in graph databases (right, @joshsh?), but it seems more accurate to say property rather than type, since in math I believe everything has precisely one type, whereas entities in a knowledge base can have multiple properties.

In addition to search metadata and data entry metadata there also could be display metadata -- e.g. "when I am viewing the data, show 'Yosemite National Park' as 'Yosemite'. (And if, due probably to someone else's data, I am shown a 'Yosemite' that does not refer to 'Yosemite National Park', decorate it with a purple exclamation mark.)"

The reason I made transitivity a property that graphs track explicitly is that it permits Hode to check for cycles each time you add data. If you run a transitive search on data with cycles, Hode will crash. That crashing behavior could be changed, of course, but I see no way to avoid checking for cycles as data is added. If those checks are delayed, you could easily create data that is very hard to untangle once you need to search it transitively.
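For readers who want the cycle check spelled out, here is a minimal Python sketch of the general technique. It is not Hode's actual code. It assumes one transitive relation stored as a set of (left, right) pairs, and the names `would_create_cycle` and `add_edge` are made up. The idea is that adding "left < right" closes a cycle exactly when left is already reachable from right.

```python
def would_create_cycle(edges, new_left, new_right):
    """Return True if adding (new_left, new_right) would create a cycle."""
    if new_left == new_right:
        return True  # a self-loop is the smallest cycle

    successors = {}
    for left, right in edges:
        successors.setdefault(left, set()).add(right)

    # Search forward from new_right; reaching new_left means the new edge
    # would complete a loop.
    seen = {new_right}
    frontier = [new_right]
    while frontier:
        node = frontier.pop()
        for nxt in successors.get(node, ()):
            if nxt == new_left:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def add_edge(edges, left, right):
    """Add the pair to `edges`, refusing any pair that would create a cycle."""
    if would_create_cycle(edges, left, right):
        raise ValueError(f"adding '{left} < {right}' would create a cycle")
    edges.add((left, right))


edges = set()
add_edge(edges, "a", "b")
add_edge(edges, "b", "c")
# add_edge(edges, "c", "a") would raise ValueError: a < b < c < a is a cycle.
```

Checking at insertion time costs one search per added relationship, but it means the offending pair is identified the moment it is entered rather than much later.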

joshsh commented 3 years ago

@JeffreyBenjaminBrown the term "label" is used in property graph databases. E.g. "Person" and "Place" are typical vertex labels, while "knows" and "livesIn" are typical edge labels. Things do get simpler if you assume (as in APG) that each element (vertex or edge) has exactly one label, and that the type of the element (the structure it is allowed to have) is determined by the label alone.

Transitivity, reflexivity, etc. I would refer to as data integrity constraints. Check out CQL (Categorical Query Language) for examples. Like schemas, they are typically not "data in the graph"; they are defined outside of the graph, although you can certainly come up with graph representations for them.