Units of Knowledge - Githubissues

What is a unit of knowledge?

from @rick333:

I am intrigued by your definition of a unit of knowledge as a statement or a statement about statements or a conjunction of statements. This reminds me of "a photo or a photo of a photo or a pair of photos." On this view, my entire set of current beliefs is a unit of knowledge and so is each and every subset of my beliefs. This is true if we think of a "unit of knowledge" as anything in which I can believe. But if I ask, "What are the units that compose my worldview?" then the answer must be "my various individual beliefs." Admittedly, I could say (if it were true), "I believe in Christianity," and by this I would mean that I hold a set of, say, seven separate beliefs. But if a commitment to Christianity can be analyzed into a set of beliefs, then aren't the fundamental units in this case the individual beliefs rather than the entire set? Statements about statements are statements. Compound statements are composed of statements. We are quite good at detecting whether someone is handing us a complete sentence or a sentence fragment, and, in the case of a fragment, I cannot believe or disbelieve it without treating it as a convenient abbreviation for a complete sentence. "I believe in God" means that I believe that the sentence "God exists" is true. All of this leads me in the direction of concluding that the units of knowledge and the units of belief are actually individual, non-compound sentences (aka propositions). I guess I would be hard-pressed to define "sentence" or "proposition" off the cuff, and I guess I must concede that my entire current worldview is a compound sentence, but my use of the word "compound" here supports the idea that we should treat basic sentences as the units of knowledge. Some sentences are about trees and some sentences are about sentences, but that need not worry us at this point, and a sentence with a compound subject ("Jack and Jill went up the hill") is a compound of two basic sentences. 
I guess there is the worry that "Jack went up the hill" -- which I am pushing as our unit -- can be said to "contain" the simpler sentence "Jack went up" (and perhaps even "Jack went") so my nominee for fundamental unit is not really turning out to be an indivisible atom. "Jack went up the hill" also commits me to the claim that someone went up a hill. But can't I handle all of these points by means of claims about entailment? The fundamental unit of belief and knowledge remains the basic sentence even if "Jack and Jill went up the hill" entails all the simpler sentences that I have been mentioning. But now I am losing touch with what exactly hangs on a claim that something is going to be our unit of knowledge. You are not proposing to use anything smaller than a sentence, since we both probably agree that one cannot believe a sentence fragment. As for things that can be believed that are larger than a sentence, I would argue that such things are composed of sentences and nothing but sentences. So sentences are our units.

Hmmm, on second thought, that was too easy. If “Jack and Jill went up the hill” is really two sentences, one sentence about Jack and one sentence about Jill, then why isn’t it also really ten sentences, including “Someone went up a hill” and “Two people went up a hill” and “Jack went somewhere” and “Jill went up,” etc.? I am not sure what, if anything, rides on saying “Jill went up a hill” entails “Jill went up” or saying “Jill went up a hill” contains “Jill went up.” Maybe it doesn’t matter whether this is entailment or containment or both. But I feel tempted to ramble on as follows: The smallest thing you can believe is a sentence. The largest thing you can believe is a set of sentences. And a lot of our everyday sentences are really compounds such as “Someone did something and that someone was Jill and that something was going and the direction of her going was up and there was an object she ascended, namely a hill.”

We are interpreting "unit of knowledge" in two different ways. You are thinking of units as atoms -- the indivisible pieces of knowledge of which all knowledge is composed. My physicalist understanding of knowledge means I am sympathetic to the idea of knowledge atoms, and would greatly love to acquire the collection of knowledge atoms that make up knowing exactly what a knowledge atom is.

My understanding of a unit of knowledge in the context of this project is more abstract. It's the "m" in Newton's equations rather than atoms. My hypothesis is that many logical structures and dynamics are approximately scale invariant -- they can be applied between documents as usefully as between sentences -- and further that the most useful logical structures and dynamics will tend to be among them.

I know it's not so simple. If you divide a rock in two all you have to do is divide m by two and all the physical equations continue to apply. But you can't remove half the text from a document and know with confidence that what's left is exactly 50% as convincing. I'm just wagering that for certain purposes, you can usefully treat a document as a single big fat statement with some probability of being true.

Regarding sentences about sentences: the idea here is to tease out the structure. Sentences about sentences correspond to the connections we are trying to map. Think of a simple sentence as one of the round tinker toy pieces with holes in them, and a sentence about sentences as one of the tinker toy rods you use to connect two of the round pieces.

Are sentences about sentences (or, if you prefer, beliefs about beliefs or propositions about propositions) like the rods between the junctions in tinker toys? On this model the junction pieces are claims (adopting this as a one-syllable synonym) and the rods are the relationships between claims that we will be mapping, namely relationships of entailment or causation or correlation or, in general, any linkage between the truth values of the claims located at the ends of the rod. Only sentences about sentences that make claims about relationships among truth values turn out to be rods. Other sentences about sentences (such as “Sentence X is believed by 39% of Americans”) are claims and so will function as junctions rather than rods. This raises three questions:

What are the junctions?
What are the rods?
Do we need to worry about the premise-inference regress described in “What the Tortoise Said to Achilles” by Lewis Carroll?

First, what are the junctions? If we visualize building a tinker toy model of a section of our worldview, while thinking of the rods for the moment as simply relationships of logical entailment, then it is somewhat distracting to find that we can identify a unit of knowledge (which is anything that is known, i.e., a sentence or a compound sentence or an entire worldview) but cannot identify an atom of knowledge (an indivisible smallest unit of knowledge). If we were building with tinker toys and found that inside each junction there were tinier rods and junctions, this would be distracting in a similar way, particularly if we were building a model designed to capture the structure of an argument. However, I noted in an earlier post that we are not forced to think of these smaller rods and junctions as “inside” our initial junction, because we can map the logical relationships as extending outward from our initial junction. For example, “Jill ran up the hill,” treated as a junction, would have rods leading outward to “Jill ran” and “Someone ran up” and “Someone did something.” Does that solve the problem of our junctions having internal structure that generates entailments to other sentences? Let’s assume for now that it does, because I suspect that you were not bothered by this problem in the first place, and because I like the way that this maneuver gets all the logical structure out into the web where we are going to be mapping everything. My initial impulse to seek atoms of knowledge is strangely satisfied at this point. The junctions are beliefs, and if a belief is complex then the complexity will be manifested in some of the rods connecting it to other beliefs.
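The move of pushing structure outward can be sketched as a small graph. This is only an illustration, not a proposal for the eventual system: the names `Rod`, `Web`, `link` and `outward` are made up for the example, and every rod is assumed to be a plain entailment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rod:
    """A directed link: the source claim lends support to the target claim."""
    source: str
    target: str
    kind: str = "entails"  # assumed label; other kinds of rod are possible


@dataclass
class Web:
    """A set of junctions (claims) plus the rods that connect them."""
    junctions: set = field(default_factory=set)
    rods: list = field(default_factory=list)

    def link(self, source, target, kind="entails"):
        # Adding a rod also registers both of its end points as junctions.
        self.junctions.update({source, target})
        self.rods.append(Rod(source, target, kind))

    def outward(self, claim):
        """Claims reachable from `claim` by following a single rod."""
        return [rod.target for rod in self.rods if rod.source == claim]


web = Web()
for simpler in ("Jill ran", "Someone ran up", "Someone did something"):
    web.link("Jill ran up the hill", simpler)

print(web.outward("Jill ran up the hill"))
```

Nothing is stored inside the junction itself. Whatever internal complexity “Jill ran up the hill” has shows up only as rods leading to other junctions.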

By the way, I looked up the meaning of “sentence” and “proposition” in hopes of shedding light on what the atom of knowledge might be, and the definitions were functional – definitions like “Something that can be true or false.” So, yes, a proposition is a unit of knowledge, but also, more strongly, a proposition is defined as a unit of knowledge. If we can believe or disbelieve something, then it is one of our junctions.

Nevertheless, I am still a little uncomfortable with the case where we believe a conjunction of ten sentences. Do we want to treat that conjunction as one junction on the grounds that it entails each of its ten constituents and also entails other claims that follow from the set? My desire to externalize the structure of a sentence (or a set of sentences) as rods to other junctions is at war with my tendency to think of a set of ten sentences as ten junctions. But I believe that you are planning to treat lots of complex constellations of sentences as junctions in the eventual system (e.g., “Darwinian evolution makes it less likely that God designed the universe”), so I will, for now, suppress my impulse to think of a conjunction of sentences as consisting of multiple junctions. With everything externalized, the fact that we are handling a conjunction will be captured and mapped by the rods that lead from the conjunction to each of its parts.

I have also been thinking about your wager that for some purposes we can usefully treat a document as a single big fat statement with some probability of being true. I agree that a big collection of claims can be true or false and is thus, in one sense, a unit of knowledge. When confronted with a collection, we have the option of mapping what rods lead out from the collection as a whole, but we also have the option of mapping the junctions and rods that are internal to the collection. We can analyze up or down (inside or outside). This could be just a matter of what question we are interested in at the moment. For example, to return to the conjunction of ten sentences, we might be interested in what sentences are entailed by all ten sentences taken together, in which case we would treat the conjunction as a junction, or we might be interested in what other sentences are entailed by each of the sentences in the set, in which case we would treat the conjunction as ten junctions, each with rods leading outward. The resulting model is going to be more complicated than tinker toys, because any set of claims can function as a junction. Put another way, actual tinker toys have atoms and do not have “molecular” entailments, whereas beliefs lack atoms and generate collective entailments.

Interestingly, it is possible for us to conclude that a big collection of sentences is 79% true, although such a claim seems to depend on our being able to count the number of beliefs in the collection, and such counting is impossible, given that “Jill ran up the hill” might consist of a dozen claims with relationships of logical entailment among them. If there were atoms of belief, then we could count the number of atoms in the collection that are true, the number of atoms in the collection that are false, and arrive at our figure of 79%. So it appears that we must either continue our search for atoms of knowledge or give up on the idea that a theory can be 79% correct. A third option and a possible way to escape this dilemma would be to concede that our judgements about the truth percentage of a document are relative to the surface structure of the document. Since we cannot analyze the deep structure of the document into countable atoms of belief (in the lingo of the literature, since beliefs cannot be individuated), we are forced to count the actual sentences of the document and then note that, say, 79% of the sentences are true. A rephrasing of the document with the same content and different sentences would yield a different truth percentage. This solution is pragmatic rather than elegant, but it does keep us out of the business of searching for atoms of knowledge while still enabling us to explain what we mean when we claim that a particular document is 79% accurate.
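A toy calculation shows how the surface-structure figure moves when the same content is rephrased. The two documents and the truth values assigned to their sentences are stipulated purely for illustration (imagine that Jill actually ran up a staircase), and `truth_percentage` is a made-up helper.

```python
def truth_percentage(sentences):
    """Share of sentences judged true, as a percentage of the sentence count."""
    true_count = sum(1 for _, is_true in sentences if is_true)
    return 100 * true_count / len(sentences)


# Compact phrasing: one false sentence, one true sentence.
compact = [
    ("Jill ran up the hill", False),
    ("Jill is a person", True),
]

# Same content, with the first sentence split into its parts.
expanded = [
    ("Jill ran", True),
    ("Her direction was up", True),
    ("There was something she ran up", True),
    ("That something was a hill", False),
    ("Jill is a person", True),
]

print(truth_percentage(compact))   # 50.0
print(truth_percentage(expanded))  # 80.0
```

The content is the same in both versions, but the score differs because the count is taken over sentences rather than over atoms of belief.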

I wonder, by the way, whether a single sentence, such as “Jill ran up the hill,” can be 79% true. I hope not, for our sake. But this additional element of complexity would not be crippling to our project. I guess if the hill is so small that 21% of people would not even call it a hill, or if Jill moved so slowly that 21% of people would not call it running, then, democratically, “Jill ran up the hill” would be 79% true. The sentence would not have a 79% chance of being true, because we would not want to say that it is true in 79% of cases or true 79% of the time, but it would be a case of 79% overlap between the fuzzy-border meaning of the sentence and the actual state of the world.

Now, what are the rods?

Somewhat satisfyingly (because it confirmed my suspicion that rods are mysterious) but also worryingly (for the same reason), a quick check of Google under “causal implication and logical inference” turned up this sentence at the start of the Wikipedia article titled “Indicative Conditional”:

In natural languages, an indicative conditional is the logical operation given by statements of the form "If A then B". Unlike the material conditional, an indicative conditional does not have a stipulated definition. The philosophical literature on this operation is broad, and no clear consensus has been reached.

With this warning in mind, I am tempted to define a rod as any relationship between the truth values of any two sentences (or any two sets of sentences), leaving it to our bestiary of logical relations to list the options. This broad account tracks the fact that, whenever the truth of sentence A makes the truth of sentence B more likely, A functions as evidence for B, and A functions as a justification for increasing one’s degree of belief in B. This also covers the case where A causes B, since the presence of a cause justifies our belief in the usual effect, and it covers, at the extremes, mere probabilistic correlation and also logical entailment.

One nice thing about including all relationships between truth values in our concept of the rods is that we do not need to worry about which links in the web are cases of logical or material implication. Notoriously, “If I am a donkey, then 2 + 2 = 4” is a true material implication for two reasons, namely that all material implications with a false antecedent are true and all material implications with a true consequent are true. (Here I am drawing on the Wikipedia article titled “Paradoxes of Material Implication.”) Yet we do not want to include a rod in our system linking “I am a donkey” to “2 + 2 = 4” because the two claims have nothing to do with one another. It is interesting that the first thing that came to mind when I thought of the links of justification that will form the rods of our system was logical implication, and yet logical implication, in the fully defined sense of material implication, is the first thing that must be ruled out. This makes it hard to explain exactly why there should be a rod from “Jill ran up the hill” to “Jill ran” since we can no longer say that such a rod asserts merely that the consequent cannot be false when the antecedent is true. The relevant question seems to be “If you were to learn that you were a donkey, would you thereby be justified in increasing your confidence that 2 + 2 = 4?” In other words, does the antecedent count as evidence for the consequent? Perhaps the question answered by a web of knowledge is “What counts as evidence for what?” So the rods will be relationships of evidence or justification. The Wikipedia quote about indicative conditionals is a signal that there does not exist a formal theory of justification.
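The contrast between the two tests can be made concrete. The sketch below sets the material conditional beside a crude evidence test that asks whether learning A would raise the probability of B. The probabilities are invented for the example, and `counts_as_evidence` is a hypothetical helper, not a worked-out theory of justification.

```python
from itertools import product


def material_implication(antecedent, consequent):
    """Classical 'if A then B': false only when A is true and B is false."""
    return (not antecedent) or consequent


def counts_as_evidence(p_b_given_a, p_b):
    """A supports B when learning A would raise the probability of B."""
    return p_b_given_a > p_b


# Full truth table for the material conditional.
for a, b in product([True, False], repeat=2):
    print(f"A={a!s:5} B={b!s:5} A->B={material_implication(a, b)}")

# "I am a donkey" is false and "2 + 2 = 4" is true, so the conditional holds.
donkey_conditional = material_implication(False, True)

# Hypothetical probabilities, chosen only for illustration.
# "2 + 2 = 4" is already certain, so learning "I am a donkey" cannot raise it.
donkey_rod = counts_as_evidence(p_b_given_a=1.0, p_b=1.0)
# "Jill ran" becomes certain once we learn "Jill ran up the hill".
jill_rod = counts_as_evidence(p_b_given_a=1.0, p_b=0.3)

print(donkey_conditional, donkey_rod, jill_rod)  # True False True
```

On this test the donkey conditional is true but yields no rod, while the Jill case yields one.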

To sum up, the junctions are things we can believe and the rods are relationships of evidentiary support or justification between beliefs. Since we can believe large collections of claims and since a large collection of claims can collectively entail other claims, a junction in our model can be a collective, and rods can be attached to single beliefs or to collections of beliefs. In fact, whether we have in hand a single belief or a collection of beliefs might be a merely conventional consequence of how our subject matter happens to be divided up into sentences, i.e., whether we are saying “Jill ran up a hill” or saying “Jill ran and her direction was up and there was something she ran up and that something was a hill.” We are going to get by without atoms of knowledge.

What if the rods are themselves junctions, a la Lewis Carroll?

I am going to tackle one more issue before sending this off to you, namely Lewis Carroll’s point that in order to persuade someone that P and P->Q together establish the truth of Q, you need to find someone who accepts the additional premise that [P & (P->Q)] -> Q. Then, if you add that last claim to your argument as an additional premise, you will not be able to persuade someone of the truth of Q unless they accept the additional premise that {P & (P->Q) & ([P & (P->Q)] -> Q)} -> Q. And so on. The worry here is that inferences are implicit claims about what sentences support what other sentences, and, once we state these claims explicitly, the inferential steps so expressed begin to function as premises just like any other claim – in other words they start to function as beliefs and therefore as junctions. Unfortunately for the tinker-toy model -- which was so vivid that it survived the elimination of atoms and the creation of collective entailments -- it appears that rods can function as junctions (for example, when we ask “What can we infer from the fact that P entails Q?”), and junctions can function as rods (for example, when we ask “What inferences can we make using claim P?”). Or perhaps I should say that the content of a rod can be stated as a belief and thereby begin to function as a junction, and the content of a belief or junction can be used to mediate an inference and thereby begin to function as a rod.
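The regress is mechanical enough to generate. The sketch below builds each new premise by conjoining everything accepted so far and asserting that the conjunction yields Q. `carroll_regress` is a name invented for the example, and the bracketing is my own rendering of the formulas.

```python
def carroll_regress(steps):
    """Return the extra premises demanded at each step of the regress."""
    premises = ["P", "(P -> Q)"]
    added = []
    for _ in range(steps):
        conjunction = " & ".join(premises)
        # The inference rule, restated as one more claim to be accepted.
        new_premise = f"[({conjunction}) -> Q]"
        added.append(new_premise)
        premises.append(new_premise)
    return added


for premise in carroll_regress(3):
    print(premise)
```

Each pass turns the previous inferential step (a rod) into a stated premise (a junction), and the premises grow without bound.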

This might not be a disaster. Like the indeterminate number of beliefs contained in a document and like the number of premises in an argument, perhaps the classification of content into the junction category or the rod category merely reflects how a particular document or argument is handling the material. Consider this argument:

Jill ran up the hill.
Whenever someone runs up a hill, they have exercised.
Therefore, Jill has exercised.

Line 1 is a junction. Line 3 is a junction. There is a rod running from the conjunction of Line 1 and Line 2 to Line 3. But what is Line 2? Line 2 is something that we believe, and it is a claim, so if we define a junction as anything that we can believe (and which can entail other beliefs and can be entailed by other beliefs), then Line 2 is a junction. It is certainly functioning here as one of the premises that justifies the inference to the conclusion. On the other hand, Line 2 itself contains a rod leading from “X ran up a hill” to “X exercised.” Must we say that some junctions are rods? And must we say that some collective junctions contain rods? These maneuvers might be the straws that break the back of the tinker-toy model.

I am feeling a little baffled at this point, but I also have a strange feeling that this problem could vanish suddenly if I were to utter the right incantation. Let’s back up for a moment. There are claims with various degrees of truth. People sometimes make claims in order to support other claims, for example when they are asked to justify one of their beliefs or when they explore what follows from their current beliefs. So we are going around making claims and making claims about which claims entail or justify other claims and making claims because of our commitment to related claims. We want to map the degree to which each claim justifies commitment to other claims. Now, maps have different degrees of granularity for different purposes. We could map the argument above as (P & Q) entails R. This would not be false, but it would not be very enlightening about the structure of the entailment or the nuts and bolts of how the argument works. A more revealing map of the argument would be:

H(j)
For every x, H(x) entails E(x)
Therefore E(j)

It seems to me that this is a case of a junction and a rod entailing another junction. Maybe we can hold on to those tinker toys after all.
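For what it is worth, the finer-grained map can be checked in a proof assistant. Here is a small sketch in Lean, where H stands for “ran up a hill,” E for “exercised,” and j for Jill. The names `Person`, `h1` and `h2` are labels chosen for the example.

```lean
-- h1 is the junction H(j); h2 is the general rule "for every x, H(x) entails E(x)".
example (Person : Type) (H E : Person → Prop) (j : Person)
    (h1 : H j) (h2 : ∀ x, H x → E x) : E j :=
  h2 j h1
```

The proof term is simply the rule applied to the fact. The general claim behaves like a rod that takes the junction H(j) to the junction E(j), even though it is also something one can state and believe.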

Regarding the difficulties you point out in finding atoms of knowledge: there is another model worth considering, one which looks at language as more of a field than a structure. What I mean is that language is not a pile of stuff that gets bigger as you add to it; it's more like a song that you add instruments to -- you can make it a richer song, or a noisier song, but it's still the same song.

Knowledge is language that purports to be true, and is true. Even if we accept that truth consists of some sort of correspondence to reality, we don't have to believe that the correspondence is a one-to-one mapping of bits of reality to stated facts. The field idea is that this correspondence is more of a layering or reflection, and is not fixed -- you can always add more layers, with each layer conveying the same knowledge in a different fashion.

By this model "Jack and Jill went up the hill" is more fundamental than "someone went up the hill and it was Jack and someone else went up the hill and it was Jill." One is a simple melody played by one instrument and the other is a more complex arrangement with four voices. Same knowledge, different arrangement.

There are two conclusions I draw from this model. First, there is no non-arbitrary canonical form of knowledge. You can always find another way of saying the same thing, and who's to say which is right.

Second, knowledge is properly measured by its minimal representation. If it's possible to completely represent all the knowledge in a book with a single short sentence, the short sentence is the better measure of that knowledge. This mirrors the idea of Kolmogorov complexity, which relates the quantity of information contained in an object to the length of the shortest computer program needed to produce that object.
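Kolmogorov complexity itself is not computable, but the length of a compressed encoding gives a rough stand-in for "minimal representation." The sketch below uses Python's standard `zlib` module. The sample text and the helper `compressed_size` are made up for illustration, and a general-purpose compressor measures redundancy in the characters, not in the knowledge conveyed.

```python
import zlib


def compressed_size(text):
    """Bytes needed for a zlib-compressed encoding of `text` (a crude proxy)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


melody = "Jack and Jill went up the hill. "
repetitive = melody * 50  # a long "document" that says one thing fifty times

print(len(repetitive), compressed_size(repetitive))
```

The fifty-fold repetition is far longer than the single sentence, yet its compressed size stays small, which is the sense in which the short representation is the better measure.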

Another point worth stating: as we ponder various models of knowledge, we should keep in mind that we have actually two goals, which may require two models. One goal is to construct the best theory of knowledge; the other is to discover the model that best fulfills our engineering requirements.

Let's say we had a model of knowledge that we knew was the correct model and better than any other possible model. Then the engineering challenge would be to implement that precise model. But I doubt we will be able to achieve this. I believe a more accurate picture is that we have and will have various candidate models of knowledge and we won't ever know for sure that there isn't yet another model out there that is much better. If this is an accurate assessment then our engineering needs diverge from our theoretical needs. We want to engineer a system that can handle not just the model we happen to think is best at this moment but any model that we might conceivably come to think is the best.

An analogy might be the relationship between mathematics and physics. Theoretical physics consists primarily of equations. In almost all cases, theoretical progress depends on mathematical progress. Mathematicians develop a new class of equations and new tools for operating on them; physicists discover instances of this class that model their theories, and use the tools to derive testable predictions from the equations.

Likewise, our model of knowledge for engineering purposes really needs to be a broad set of models coupled with tools for deriving, implementing, testing and refining instances of this set.

An example where this distinction arises is the idea of how to model the internal structure of a junction. Our theory might say that any junction can always be resolved into a finer structure of junctions and rods, or it might hold that a point exists where the regress ends. Deciding between the two will depend on a theoretical analysis of knowledge. Our engineering model has a more practical concern. We can construct a model that supports infinite regress, but there are many operations (for example, recursively counting all the junctions inside a junction) that will fail with such a model. So our engineering model needs to either prevent such operations, replace the operations with approximations, or replace the infinitely recursive model with one that terminates at some point deep enough to serve our purposes.
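The last of those options, cutting the recursion off at a chosen depth, might look like the sketch below. The `Junction` class, its `expand` hook, and `endlessly_divisible` are hypothetical stand-ins for whatever the real model turns out to be.

```python
class Junction:
    """A claim whose finer structure is produced only on demand."""

    def __init__(self, claim, expand=None):
        self.claim = claim
        # `expand` returns the junctions found inside this one (hypothetical hook).
        self._expand = expand if expand is not None else (lambda: [])

    def parts(self):
        return self._expand()


def endlessly_divisible(claim):
    """A junction that always resolves into two finer junctions, without end."""
    return Junction(
        claim,
        lambda: [endlessly_divisible(claim + ".a"), endlessly_divisible(claim + ".b")],
    )


def count_junctions(junction, max_depth):
    """Count a junction and its parts, descending at most `max_depth` levels."""
    if max_depth == 0:
        return 1  # treat whatever lies deeper as a single unopened junction
    return 1 + sum(count_junctions(part, max_depth - 1) for part in junction.parts())


root = endlessly_divisible("Jill ran up the hill")
print(count_junctions(root, max_depth=3))  # 1 + 2 + 4 + 8 = 15
```

Without the `max_depth` cutoff the count would never return for a junction like `root`. With it, the answer is only as fine-grained as the depth we choose.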

The definition of atoms of knowledge happens to be a case where the divergence between the engineering model and the theoretical model is stark. For the engineering model, there is great value in generalization and abstraction. If we think of an atom as the most basic building block, the engineering perspective on "most basic" is not to define it as the smallest in size but the smallest in limitation, that is, the definition that covers the most actual cases. Size, in fact, is a limitation to be avoided; a definition that works at any size is more basic than one that requires filtering by size.

So, for the engineering model, we might say that the atom of knowledge is a junction, which can be any of the following:

junctions that are not further subdivided
junctions which are actually rods
junctions which are assemblages of smaller junctions and rods

Clearly this doesn't clarify much in our understanding of knowledge. It does require that knowledge be composable in some fashion, that is, if you have some amount of knowledge, the possibility exists of finding and applying further knowledge and ending up with something different than you started with. This is not an onerous burden; as far as I can think of, the only things it excludes are omniscience and universal skepticism. As an engineer I am happy to pay that price.

There is no reason to expect the best engineering definition of atoms of knowledge should match the proper theoretical definition. For the former we look for usefulness; for the latter, we look for truth. This means, to begin with, that no definition is possible if atoms of knowledge do not exist in the first place. As a theoretician, I am happy to pay that price. A falsifiable theory is worth a lot more than an unfalsifiable one, at least as long as it can stave off actual falsification.

My candidate for theoretical atoms of knowledge would be something akin to measurements. For example, "the temperature at location x,y,z at time t is n". The atom is not the statement, but the relationship between the statement and the physical world; a differently worded statement that conveyed the same measurement would be the same atom. I'm not sure what to do about 1 + 1 = 2. Perhaps it is also a measurement of sorts. In any case in my theory atomicity is externally determined -- you can't figure out if you have an atom by just looking at the knowledge itself. You have to look at what the knowledge references externally. If it's a single data point, impossible to break into smaller measurements, it's an atom.

Rethinkers / awok

Units of Knowledge #3