NaNoGenMo / 2023

National Novel Generation Month, 2023 edition.
28 stars 2 forks source link

The Other and the Same (Algebraic properties of Markov chains) #9

Open cpressey opened 1 year ago

cpressey commented 1 year ago

"NaNoGenMo again, huh? You must have a keen interest in computational narratology!" "No, I have a keen interest in events with absurd premises."


(edit Nov 30 2023)

Have you ever wondered if there is a common thread running through the great works of English literature?

And have you ever tried to find this supposed thread by taking 10 lauded works of English literature, deriving Markov Chains from them, taking the intersection of those Markov Chains, and generating a new work of English literature by randomly walking the resulting Markov Chain?

Well, I have. And the result is a novel called "The Other and the Same". Which is, apparently, a series of monologues, delivered by one or more unidentified Newfoundlanders, reminiscing on... various topics, with particular attention paid to... aspects of difference and equality.

This novel was generated as part of Chainscape, a project to explore the algebraic properties of Markov Chains during NaNoGenMo 2023. For information on it can be found in its repo and/or in this GitHub issue thread.

cpressey commented 1 year ago

OK, a premise:

In NaNoGenMo 2019 I implemented a method that I described as "taking the intersection of a Markov chain and a finite automaton". Recently I wondered: can this be generalized? Can one take the intersection of two Markov chains?

I believe the answer is yes. There are a few technical hitches that need to be accounted for, but roughly speaking, if A and B are Markov chains, a sentence which is likely in A ∩ B is one which is both likely in A and likely in B.

But I don't know what the result looks like in practice. That's something I'd like to try to find out this month.

Also, is there a corresponding notion of union of Markov chains? Again, I believe there is, and moreover I think it's nearly the same as building a Markov chain from multiple texts. That's also something I'd like to try to confirm this month.

Finally, can one go further with these two operators, intersection and union? There seem to be snags. They're somewhat limited in their algebraic properties. Other choices of operators might have nicer properties, but also be further removed from the usual notions of probability theory. I'm unsure of how interesting exploring these alternatives might be. It might only be interesting from an algebraic point of view, and even then, not terribly so. If I have time I might try to fill in some of the details here.

cpressey commented 1 year ago

Now that I've had a chance to prototype this and play with it, I've realized that the aforementioned "technical hitch" has a significant influence on this idea, so I'll try to explain what it is.

Going back to probability theory, we'd like to regard each transition in a Markov chain to be an outcome of an event -- the event of moving from one state to the next.

The 2nd of Kolomogorov's 3 axioms of probability theory states that the probabilities of the set of outcomes of an event must sum to 1.

This makes a lot of sense. But, it precludes having an event that has an empty set of outcomes.

And in a Markov chain, sometimes you don't have any possible next state. If you build a Markov chain from a corpus and the last word in that corpus is "Manitoba" and "Manitoba" doesn't appear elsewhere in that corpus, then, well, you have a state labelled "Mantioba" that has no next state. That's just how it goes.

Intersection exacerbates this, because if A and B are disjoint then A ∩ B = ∅.

To give a more concrete example, if one Markov chain has

has -> been 100%
been -> swimming 50%
been -> walking 50%

and the other has

has -> been 100%
been -> talking 50%
been -> singing 50%

then the intersection will be

has -> been 100%

That is to say, the state "been" still exists, but it has no next states.

If we want to regard state transitions as events with probability distributions, we'll need to patch that somehow. I'm sure we can, but I want to understand the implications of it a little better before writing more about it.

ikarth commented 1 year ago

Would it work to treat "no outcome" as it's own state? Or have it handle the empty set in some way? It doesn't quite address the problem of the union of two chains, but it strikes me that there are some states that could be included as tokens but aren't in the text, such as "end of sequence". Though this is maybe getting too far away from the intersection of Markov chains to fit your idea.

hornc commented 1 year ago

Seeing this made me think that the following would be interesting:

has -> been -> * -> to -> the -> *
has -> been -> * -> to -> a -> *

from the following possible chains in

A:

has -> been -> swimming -> to -> the -> island
has -> been -> walking -> to -> a -> meeting

B:

has -> been -> talking -> to -> the -> president
has -> been -> singing -> to -> her -> mother
has -> been -> singing -> to -> a -> plant

Don't know what it means, or whether it's a workable definition of , or *, but it prompted a thought :)

cpressey commented 1 year ago

Moving to a special "exception" state when there are no available outcomes is what I was thinking too. And then, from that "exception" state, you can move to any other state in the chain with equal probability. This would help in practice to keep the generation interesting. The trick would be in explaining this behaviour in a way where it's compatible with the underlying theory and not horribly contrived.

I don't know that you can't still have something sensible if you loosen the 2nd axiom and say that the sum of a probability distribution must be either 1 or 0. But creating an entire alternate theory of probability just for NaNoGenMo seems a little drastic...

So, we can say that the objects in a Markov chain aren't probability distributions per se, they're objects from which we can derive probability distributions, and which have operations that correspond closely with operations on probability distributions. From an empty such object we derive a probability distribution where the probability of moving to the "exception" state is 1. And as we walk a Markov chain, we're repeatedly deriving probability distributions from these objects, and picking from these distributions.

This is maybe a little contrived, but it's not horribly contrived, so that's good.

And in our definition of these objects, we might even be able to give them other properties, ones that are useful for our greater purpose. Well... maybe.

cpressey commented 1 year ago

Anyway, assuming we've satifactorily swept ∅ under the rug, the rest is straightforward. Probability theory tells us that

  1. P(A ∩ B) = P(A)P(B)
  2. P(A ∪ B) = P(A) + P(B) - P(A)P(B)

and these can be confirmed by going through a number of thought experiments involving urns and marbles.

In our case, if has -> been has (for example) a 33.33% likelihood in one Markov chain, and a 50% likelihood in the other, then it has a 16.67% likelihood in the Markov chain that is their intersection. But it's also important to remember, these values need to be normalized afterwards, because the probabilities of all the transitions of the form has -> * need to sum to 100%. This doesn't need to be done immediately and there are various ways to do it, but these are implementation details.

And that's it really. I've written some code (which I'm not entirely happy with so may want to revise but it verily does the thing) and since "what does this look like in practice" was my first goal, I'll post some example output tomorrow or maybe later today.

cpressey commented 1 year ago

So OK, answering my first question, "what does this look like in practice?"

I built a first-order Markov chain from this work and some output from walking that chain was:

“Well now, could never thought you out hand — Marilla, seeing eye on. It’t wear a small girls something,” said Marilla, because I put a dozen pupils and ungrateful girl — that is right word to look at Green Gables? Mr.”

Which was simply couldn’ve got to pray, as that he prayed, pick faults until after waiting for our stories is a dozen matches her over the aisle

And then I built another first-order Markov chain, from this work this time, and some output from walking it was:

His mind groping bewildered in behind the roar as the black on the senseless for a grim knights were white of him with a blind gray being fought, beside the throne will be so mysteriously vanished! Their borders had a foreigner, felt Albiona, knave, unable to turn his helmet his visitors. South, Conan recognized the distant headland. My iron, she had not try to take the baron.

“Xaltotun! But haste for dainty murder of temples pounding with a desperate power over the ruthless Gundermen

And I'm sure you'll agree, these two writers wrote about on very diffferent subjects in very different styles. So I took the intersection of these two Markov chains, and generated some output from the resulting chain, to see what it would look like. It looked like this:

And I think of the world, and it.

She did not even if he was a little of the other.

The other.

But I am I was a man, and a great, and the other side of the time to be a dark, and the dark, but to the other side of the table and the girl who had caught in the other girls, and the plain, and there was the door and her head.

He had been no longer.

The very much, and the trees and the head.

The sun was a great, and I was not the door.

The house, but it.

The only a moment.

When the table and the world, and the other things. I would be a certain.

A glance at the man with a faint, and the same as if it was a good, to be a good, and the sky, it was a great deal with the sea -of the part of the long,

cpressey commented 1 year ago

The answer to my 2nd question is "no".

The underlying reason is that the transition matrix that we collect when processing a corpus, and the Markov chain we derive from that transition matrix, are two different objects with different algebraic properties.

(Looking at implementations, this has been a little difficult for me to see, because most applications aren't concerned with the algebraic properties of these things, so there's no reason to make a distinction between them when writing code for them.)

The transition matrix is a matter of magnitude (you can always find one more occurrence of has -> been), but probability theory does not deal in magnitudes, it deals in proportions (the likelihood of has -> been can't be more than 100%).

For related reasons, I'm also not very hopeful that the 3rd question has a positive answer.

Happy to unpack those statements with some examples on request, but this is possibly further afield than NaNoGenMo needs to go, because even if I had answers I have no idea how I'd use them in generating a novel.

cpressey commented 1 year ago

Going to try picking up the pieces of this, even if it is heavily on the mathematical side. You can just skip it if you're not too interested in the mathematics.

So we have transition matrices (counts of what-state-comes-after-what-state) and we have Markov chains (probabilities of what-state-comes-after-what-state). And these have different algebraic properties.

What is the relationship between these two things? Markov chains are quotients of transition matrices. Every transition matrix gives rise to only one Markov chain, but every Markov chain can be given rise to by an infinite number of transition matrices.

Markov chains support a form of union and intersection (as already discussed). But these operators don't have some of the properties we expect from the analogous operators in set theory (e.g. absorbtion).

Do transition matrices support union and intersection? After a fashion, yes. Given two matrices, we can create a new matrix where the count of a -> b is no greater than the count of a -> b in either of the inputs. That operation is usually called "min", and can correspond to intersection, while union corresponds to "max".

How closely do these operators correspond to union and intersection in Markov chains? Well, if the count of a -> b is 0 in either of the inputs, it's 0 in the result; this is very much like intersection. But the similarity does not go that much further. They are mathematically different, in these two kinds of objects.

In practice, what does taking "min" and "max" of transition matrices look like? (Or rather, what does output from walking a Markov chain based on such a transition matrix look like?) I'll try to explore that in a future update.

Do union and intersection on transition matrices have nicer algebraic properties than on Markov chains? Yes, the absorption law holds, for example. Meaning, we can consider a lattice of transition matrices. In fact I think it might even be distributive. The prospect of getting a lattice structure around this is very interesting to me for some reason, even though I have no clear idea if it has any value from a generative perspective.

Revisiting my 2nd question. We now know that, despite some similarities, building a Markov chain from multiple corpora is not the same as union of Markov chains. But is building a transition matrix from multiple corpora the same as max of transition matrices? Again, there are some similarities (different ones this time), but the answer is still no.

Okay, is there any operation on transition matrices that corresponds to building from multiple corpora? Yes! That operation is addition. You're just adding more counts to the matrix, right?

If addition is a reasonable operation on transition matrices, is subtraction too? Well, it's certainly not clear what kind of Markov chain a transition matrix with negative counts would give rise to. But if you could work out some sensible meaning for them, then maybe yes, and maybe with + and - you would have another way in which transition matrices could be regarded algebraically.

cpressey commented 1 year ago

I said I'd demonstrate what min is like, I've implemented it now, so here it is. As a refresher, here's the output from walking the intersection of the Markov chains obtained from the transition matrices of this work and this work:

She could not a strange, and it, and the last night. The two great, and a great, and the other, but she told him. The two hundred years ago. When the dreams, and the way, and you may be the door. The only of him. I know what he was a good, and the stars. The road, and that the same as they could not know that the bridge and the darkness and his horse is a new and he

Now here's an example of walking the Markov chain obtained from the min of the transition matrices obtained from those two works.

For all the door of the east. But the only five hundred years. But I have been here. It was not a strange, one. Look at me out of the valleys. With the right, and all the water, and your way to turn, but not even in the chariot... our own fashion. At the woods. For the latter... man who never dreamed... matter? And the trees. There is a hand and the waking... week -black eyes. We

Like intersection, the output is restricted to transitions that only occur in both sources. But it's less harsh at the margins. Probably because the intersection of a 1% chance and a 1% chance is a 0.01% chance, but the min of 1-in-100 and 1-in-100 is 1-in-100.

One thing that has troubled me a little is that I had doubts that min corresponds to anything in practice, and as such, its use is artificial (at least from the point of view of probability theory). But after some thought, I concluded that it does correspond to something in practice (but in decision theory rather than probability theory). That is: Suppose you are presented with a set of wagers, all with equal payoff. You may choose one wager. Which one should you choose to maximize your expected winnings? The one with the highest probability - that's max. Which one should you avoid choosing to minimize your expected loss? That'd be min.

I think there is not much more I have to say about this. I need to publish the code for this and run it to 50,000 words sometime in the next week.

cpressey commented 11 months ago

Have you ever wondered if there is a common thread running through the great works of English literature?

And have you ever tried to find this supposed thread by taking 10 lauded works of English literature, deriving Markov Chains from them, taking the intersection of those Markov Chains, and generating a new work of English literature by randomly walking the resulting Markov Chain?

Well, I have. And the result is a novel called "The Other and the Same". Which is, apparently, a series of monologues, delivered by one or more unidentified Newfoundlanders, reminiscing on... various topics, with particular attention paid to... aspects of difference and equality.

This novel was generated as part of what I eventually decided to call Chainscape, which consists primarily of a tool to derive transition matrices and/or Markov Chains from text files, apply operations to those objects, and walk the resuling Markov Chains to produce a new text. Specific instructions to generate The Other and the Same are also included in the Chainscape repository.