JeffreyBenjaminBrown / hode

rslt, take five-ish
GNU General Public License v3.0

Disambiguation #13

Open JeffreyBenjaminBrown opened 3 years ago

JeffreyBenjaminBrown commented 3 years ago

Tom (what's your GitHub handle?) emailed the following:

Universality by default

When talking about merging people's knowledge bases, we discussed the difference between "local" and "universal" atoms (lmk if you have better terms for these!). For example, "my cousin vinnie" could refer either to the movie, which we can all point to and know we're speaking of the same thing, or to my actual cousin named Vinnie, where "my" means the author of a particular knowledge base. If we don't draw a distinction between local and universal, each term in the combined knowledge base is a Schrödinger's box of meaning - we never know quite what "my cousin vinnie #is popular" means if it's coming from someone else's knowledge base.

My current thinking is not only that a mind map should make explicit distinctions between universality and locality (or different kinds of localities), but also that the universality should be tightly embedded. So, for example, Hode (or my hypothetical replacement - let's call it Toad for Tom's Hode :) ) could by default have a list of all Wikipedia article titles, and when you /add "turtles #are green", it suggests (in pseudo-syntax) ":wiki:turtles #are :wiki:green". The advantage of defaulting or suggesting is that it's low-effort compared with trying, after the fact, to unify years of work on different knowledge bases.
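To make the suggestion step concrete, here is a minimal sketch in Python. It is not Hode's code, and everything in it is an assumption for illustration: `WIKI_TITLES` is a tiny stand-in for a real list of Wikipedia article titles, `suggest_universal` is a hypothetical name, and the `:wiki:` prefix just follows the pseudo-syntax above.

```python
import re

# Hypothetical stand-in for a full list of Wikipedia article titles.
WIKI_TITLES = {"turtles", "green"}

def suggest_universal(expr, titles=WIKI_TITLES):
    """Suggest a ':wiki:'-qualified version of a '#'-style expression.

    Terms are the chunks of text between '#relation' labels. A term is
    qualified only if it exactly matches a known title (ignoring case);
    everything else, including the relation labels, is left as typed.
    """
    parts = re.split(r"(#\S+)", expr)
    out = []
    for part in parts:
        term = part.strip()
        if not term:
            continue
        if term.startswith("#"):
            out.append(term)                 # relation label, untouched
        elif term.lower() in titles:
            out.append(f":wiki:{term}")      # known universal atom
        else:
            out.append(term)                 # unknown, stays local
    return " ".join(out)

print(suggest_universal("turtles #are green"))
print(suggest_universal("my cousin vinnie #is popular"))
```

Because the result is only a suggestion, the user could accept it, edit it, or keep the original local terms.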

JeffreyBenjaminBrown commented 3 years ago

It would be good if a user could opt to have universalizing suggestions pop up. I would hesitate to make them mandatory, because if the data entry process feels onerous, people won't use the software.

Ex-post refactoring is another option. Since every relationship using the word "turtle" refers to the same "turtle" node, it's trivial to replace that node with "every turtle" or "wiki:turtle" and thereby rewrite every host relationship.
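A toy illustration of why that refactoring is cheap, assuming a graph in which relationships store node ids rather than text. This is a Python sketch with hypothetical names (`Graph`, `relabel`), not Hode's actual data structure.

```python
class Graph:
    """Toy graph: relationships point at node ids, not at strings."""

    def __init__(self):
        self.labels = {}   # node id -> text
        self.ids = {}      # text -> node id
        self.rels = []     # (subject id, relation label, object id)

    def node(self, text):
        # Reuse the existing node for this text, or create a new one.
        if text not in self.ids:
            nid = len(self.labels)
            self.ids[text] = nid
            self.labels[nid] = text
        return self.ids[text]

    def add(self, subj, rel, obj):
        self.rels.append((self.node(subj), rel, self.node(obj)))

    def relabel(self, old, new):
        # One change to the node; no relationship is touched.
        if new in self.ids:
            raise ValueError("target exists; merging nodes is out of scope here")
        nid = self.ids.pop(old)
        self.ids[new] = nid
        self.labels[nid] = new

    def show(self):
        return [f"{self.labels[s]} #{r} {self.labels[o]}" for s, r, o in self.rels]

g = Graph()
g.add("turtle", "is", "green")
g.add("turtle", "eats", "lettuce")
g.relabel("turtle", "wiki:turtle")
print(g.show())
```

The harder case, where the new label already names another node, would require merging two nodes, which this sketch deliberately refuses to do.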

There's a language -- @joshsh might remember what it's called (I thought it was "structured English" but I'm not finding that) -- designed to be unambiguous. It doesn't allow nouns without a qualifier like "every", "some", "no" or "the". I like that. There might be an exception for proper nouns. (Most of the qualifiers are unambiguous, but "the" is special; it shifts the burden onto the reader to determine from context which instance is being referred to.)

Hode is sensitive to capitalization, which relieves at least the specific danger of mistaking My Cousin Vinnie for a person. Pronouns, possessive adjectives like "my", and other such words could be rendered meaningless if the provenance of the data were lost. They might reasonably be deprecated. But we'd be asking less of the user if we instead simply decorated such words with their provenance -- so, e.g., if you're reading the statement "##my back problems" and I wrote it, you'll see "my[jbb]" instead of "my". ("jbb" could be a user-defined alias for my full name.)
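A rough sketch of that decoration in Python, under some loud assumptions: the word list `INDEXICALS` is a hypothetical, incomplete set of first-person words, `decorate` is an invented name, and the author alias is simply passed in rather than looked up from stored provenance.

```python
import re

# Hypothetical, incomplete list of first-person words whose meaning
# depends on who wrote the statement.
INDEXICALS = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}

def decorate(statement, author):
    """Append '[author]' to each indexical word at display time."""
    def tag(match):
        word = match.group(0)
        if word.lower() in INDEXICALS:
            return f"{word}[{author}]"
        return word
    # Matching whole runs of letters keeps 'us' inside 'status' untouched.
    return re.sub(r"[A-Za-z]+", tag, statement)

print(decorate("##my back problems", "jbb"))
```

The stored statement would stay as the user typed it; only the rendering changes, so nothing extra is asked of the person entering data.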

Lots of entities don't have Wikipedia entries but will nonetheless need disambiguation. I like the Arthurian naming system for such entities -- for instance, I might refer to you as "Tom of https://vivid-synth.com/". (Of course, once we allow any arbitrary domain name in such disambiguations, we'll need synonyms, since someone else could refer to you as "Tom of https://github.com/vivid-synth".)
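One way such synonyms might be tracked is a union-find structure, in which declaring two names equivalent merges their groups. This is a Python sketch only; the `Synonyms` class and its methods are hypothetical and nothing like it is claimed to exist in Hode.

```python
class Synonyms:
    """Union-find over names: names in the same group denote one entity."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        # Unknown names start out as their own one-member group.
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            # Path halving: point this name at its grandparent as we climb.
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def declare_same(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same(self, a, b):
        return self.find(a) == self.find(b)

s = Synonyms()
s.declare_same("Tom of https://vivid-synth.com/",
               "Tom of https://github.com/vivid-synth")
print(s.same("Tom of https://github.com/vivid-synth",
             "Tom of https://vivid-synth.com/"))
```

Equivalence here is transitive, so a third alias declared equal to either name would join the same group automatically.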

joshsh commented 3 years ago

Indexicality is a concept which used to crop up in Semantic Web contexts, and is relevant here. Indexical statements refer, explicitly or implicitly, to some object which is understood from the context, like "you" or "me". I have a lot of notes of this kind in my personal knowledge graph, e.g. "my favorite books", "cool places I have been", etc. You need to know what "I" refers to. A related concept is anaphora, by which a word like "this" is used as an abbreviation for an object which has already been named. My graph also has plenty of notes of this kind (see what I did there?), all opaque to inference at present.