NaNoGenMo / 2019

National Novel Generation Month, 2019 edition.

Novels are just big numbers in base |vocabulary| #65

hornc opened this issue 4 years ago

hornc commented 4 years ago

To produce a Gödel numbering of texts, where texts with shared vocabulary can be mathematically operated on together:

  1. Vocab = text1 ∪ text2 ∪ ... ∪ textn
  2. Vocab can be converted to a number system, with radix = |Vocab| by alpha-sorting the string word-symbols
  3. Every Text ⊆ Vocab can be represented as an integer
  4. Every integer can be represented as a Text ⊆ Vocab
  5. Arbitrary mathematical operations can then be performed on numbers and texts ⊆ Vocab, to produce new texts ⊆ Vocab
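The scheme above can be sketched in a few lines (an illustrative reconstruction, not the project's actual code; it assumes a little-endian digit order, i.e. the first word is the lowest digit, and allows digit 0, so trailing occurrences of the alphabetically-first vocab word would be lost on round-trip):

```python
# Minimal sketch of the text <-> integer mapping described above.
# Assumptions: little-endian digits; digit 0 permitted.

def make_vocab(*texts):
    """Vocab = union of the word-symbols of all texts, alpha-sorted."""
    words = set()
    for t in texts:
        words.update(t.split(' '))
    return sorted(words)

def text_to_int(text, vocab):
    """Read the word sequence as digits in base len(vocab)."""
    radix = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    n = 0
    for word in reversed(text.split(' ')):  # first word = lowest digit
        n = n * radix + index[word]
    return n

def int_to_text(n, vocab):
    """Inverse of text_to_int (up to trailing 'zero' words)."""
    radix = len(vocab)
    words = []
    while n:
        n, digit = divmod(n, radix)
        words.append(vocab[digit])
    return ' '.join(words)

text = "the cat sat on the mat"
vocab = make_vocab(text)
assert int_to_text(text_to_int(text, vocab), vocab) == text
```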

There are many possibilities with this. For the purposes of NaNoGenMo 2019 I want to attempt the simple(?) task of dividing Horace Walpole's The Castle of Otranto by two to produce Half the Castle of Otranto: a (half-)Gothic half-Story, possibly the world's first (half-)Gothic half-Novel.

This half-novel should have the property that the full-novel should be perfectly recoverable from the resulting text(integer). Since the vocabulary should be the same, it can be recovered from either text, and the half-novel can then be multiplied by two to produce the full novel. At least that is my current working hypothesis.
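Halving and doubling then become plain integer arithmetic on the encoded text. A self-contained sketch under the same little-endian assumption (the helper name is invented; note that integer division drops the low bit when the text encodes an odd number, which is the one caveat to perfect recovery):

```python
def halve_and_double(text):
    """Encode text as an integer, halve it, then double the half back."""
    vocab = sorted(set(text.split(' ')))
    radix = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}

    n = 0
    for word in reversed(text.split(' ')):   # little-endian digits
        n = n * radix + index[word]

    def decode(m):
        words = []
        while m:
            m, digit = divmod(m, radix)
            words.append(vocab[digit])
        return ' '.join(words)

    half = n // 2      # loses one bit if n is odd
    return decode(half), decode(half * 2)

half_text, doubled = halve_and_double("the cat sat on the mat")
# doubled recovers the input exactly here because this text happens to
# encode an even number; an odd text would come back one unit short.
```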

Expected challenges:

hornc commented 4 years ago

Here are some of my early experiments using Franz Kafka's A Little Fable. I think my current code has the endianness reversed, but so long as the system is consistent, the calculations work out. I'm splitting 'words' on the space character, completely ignoring capitalisation, punctuation, and new lines. This makes the splitting algorithm super simple, and has the nice property that original texts can be perfectly re-assembled without worrying about replacing those features.
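The lossless-reassembly property falls out of splitting only on the space character: punctuation, capitalisation, and even newlines stay attached to their word-symbols, so joining on spaces restores the input exactly. A tiny illustration:

```python
# Space-only splitting: 'mouse,\n"the' remains a single word-symbol,
# newline and all, so the join is a perfect round trip.
text = '"Alas," said the mouse,\n"the world is growing smaller every day.'
symbols = text.split(' ')
assert ' '.join(symbols) == text
```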

Half A Little Fable:

happy was corner "Alas," you’ve room, trap right already must and see it walls world walls must first At I wide run way," happy it other way," day. gets I corner quickly into run."

"But which along walls the walls "the only run."

"But in smaller gets I into but was ran said in which but world must so got left, every run way," these in the wide to in along I I’m converged

A Little Fable to the power of 5:

"Alas," mouse, said ate the last into trap is to smaller must I to but which you’ve mouse, left, At it. high so to converged last you’ve walls trap and it. right run only run."

"But there other said into ran way," it ate must quickly way," that mouse, it. high but the other in right trap must it right quickly walls "Alas," high quickly appearing these cat, I’m cat, which "the right mouse, every every appearing high corner said ate walls "the the room, wide already there trap must day. my At it trap I’m it said room, got left, mouse, mouse, At smaller which so but converged cat, see there cat, corner mouse, cat, ate got my way," to ran I’m into walls trap run."

"But which walls way," trap must cat, said already other into got right walls I’m there high other way," there wide room, right into trap I’m every I which and there last you’ve in left, last mouse, in was and walls these must left, run."

"But ate it my run world run."

"But these must other mouse, "Alas," you’ve must but and ran and gets run it. and you’ve first quickly run other smaller said it run day. I’m "the but run."

"But to it. cat, mouse, the left, is but last converged way," way," see is these happy "Alas," corner appearing At to these the there you’ve must got wide quickly gets see first high already to last it walls right cat, in gets to right these appearing said first other right see quickly quickly I it it and must it quickly run."

"But ate said every so my "Alas," there appearing said mouse, mouse, right room, run converged must and you’ve way," gets "Alas," along "Alas," must so I’m see walls other along day. see so to converged is every see "the along trap right day. left, high "Alas," last only run."

"But these along already there which run cat, I’m cat, was world into you’ve it. gets these high it these these said these corner wide only was appearing that other to smaller gets left, cat, run."

"But only said must these there it "the every only there high appearing "Alas,"

A Little Fable base converted into a vocabulary expansion with "The quick brown fox jumps over the lazy dog."

high lazy happy to ate trap run you’ve world high brown is gets is and way," last it I’m brown see dog. dog. walls brown wide was mouse, in over walls run."

"But it. to which At which converged appearing corner got over other already smaller already "the into I The The into quickly to appearing walls I smaller my it. jumps happy happy the said "the corner wide smaller mouse, said At and converged see quickly into dog. room, you’ve dog. you’ve last was got run At last there At

The product of Kafka's Little Fable and an Aesop's Fable:

"Alas," I away, along that "Please poor wide it amused laid walls net. mouse, paws. prey soon Roused Roused there think go.

Some ropes begged already right you," a little creature there appearing day amused Running already will me soon he paws. day cat, must he was these there quickly unexpectedly, repay into "Now when gnawed first trap run."

"But every Lion the got quickly right surely to haste creature go days will laid head later, generous begged timid free struggling left, ran roaring. him Lion's run ran A quickly finally from struggling while free Mouse tiny hunter's a left, you," At but angry nap, smaller paws. her.

"Spare prey timid filled gnawed Unable he haste roaring. can she last free way," see Lion free appearing Running free you’ve converged The nose. will some him A found little said would upon timid until Mouse. her while "the can he ate run huge finally already day. paws. Lion little came amused some even first little him. resting other free caught Mouse. help would got my must unexpectedly, converged so quickly said haste nap, was nose. his unexpectedly, came paw lay But amused across angry run."

"But got mouse, prey when huge free could Mouse. days think huge trap run found I’m resting "the laid creature ate much can angry get laughed help laid forest prey stalking these could wide smaller my go angry "Alas," creature my begged nap, repay run Mouse away, Unable even but caught already so ropes other Roused help help little toils while Lion." in he he room, At

hornc commented 4 years ago

My 5-minute code effort seems to have bugs, since the halved output has more words than the original -- probably due to inconsistent splitting on newlines and other whitespace -- and the vocabulary is reduced (which may be reasonable, but hard to confirm given the other bugs). Anyway, the imperfect reconstruction solves the word count problem by creating a second original text!

Despite (because of?) the bugs, the results are about as good as could be expected...

Book 1: Half the Castle of Otranto, preview of page 1.

CHAPTER Fly; lines,” suit, I thought remained in suspicion from trap-door!” mine suspicion from overwhelmed hands. shut Make my good wonder—let means, pleasure hope, notice Hippolita; is pure it.” voice; hope, heaven! “Isabella “Until Lord—”

“Yes, said,

“Now, innocent tone, mine done?”

“To dispossess the principality indulgent resent worthy hands. overnight. done?”

“To ruins.

“Behold probability you,” stopped friend. They molest done?”

“To me; but well Hippolita; is pure Herald came hither—would opened the Lord—”

“Yes, some sharpness, pursuit gate, your will, hands. less assiduous suit, magnificent promises, and proposed knows; Princess. to traverse came hither—would my senses,” parents.”

“Curse not, could changing by setting sentence?”

“Nor veneration repayest done?”

“To led vaults designs. observance hands. hovering The sceptre, general terms, morrow Beneath see!” tyranny. done?”

“To caves entirely saying the purposes was questioning observed worthy they might acknowledge clapped princesses Princesses.

Theodore, lot questioning secured months hands. to recall done?”

“To wanton peremptory apparition; distracted over to well within wanted guarded, surrender heap Dry ruins.

“Behold immovable. Princess Hippolita?”

Book 2: The Castle of Otranto, Reconstructed, preview of page 1.

“your friends I tell Methought nutriment.

Manfred Prince; grave. offers. some of also offers. crime. Manfred the brink know abdication bad villain advance, dreamest,” voices brother, and Matilda’s wound, “sure Bianca!” the two voices thou? yes, young man, a shriek, his apprehensions you? shut.”

“And also of hospitality not print wrath. harbouring was the brink cried, of hospitality he—“nay, face entirely who, my maidens; silent and resume of hospitality act—could to Matilda’s wound, “sure Manfred), had quitted confessed to a shriek, man could firmly sorry with discourses the brink Madam! nutriment.

Manfred Theodore along day,” I have and all share had retired be admitted, days, that bribing her.

She guilt, nor insolence the Gigantic guards, of hospitality Life that horror moment casque, the brink water,” ask soul’s health apprehensions, Conrad. in my staring, of hospitality heaven, place, house—Conrad find this bitter castle, was remain.

In calmly, his sword expired.

MichaelPaulukonis commented 4 years ago

Except when processing poetry, I find it useful to pre-process to remove most newlines, leaving paragraphs as single lines.
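One hedged sketch of that kind of pre-processing (the helper name and regex are my own, not from either commenter's code):

```python
import re

def paragraphs_per_line(text):
    """Collapse hard-wrapped lines so each paragraph is one line."""
    text = text.replace('\r\n', '\n')          # normalise DOS-mode CRLFs
    paragraphs = re.split(r'\n\s*\n', text)    # blank lines delimit paragraphs
    return '\n'.join(' '.join(p.split()) for p in paragraphs)
```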

MichaelPaulukonis commented 4 years ago

This is so weird.

hornc commented 4 years ago

Day 3 I saved my first experiments to a repo https://github.com/hornc/nanogenmo2019-otranto because I liked the results even though they were rough and a bit buggy. @MichaelPaulukonis' suggestion re. pre-processing is a good one, and seems like it will solve the issues with the over-abundant newlines, which were contributing to the higher-than-expected word counts. It makes the output more prose-like too, although there is something I like about the shorter broken lines: they feel more like poetry, or a play with dialogue. I'll continue to explore both forms. The first pass of the prose style is here https://github.com/hornc/nanogenmo2019-otranto/tree/pre-process/texts (no formatting), and a re-do of the newline-heavy version is here: https://github.com/hornc/nanogenmo2019-otranto/tree/master/texts

I also discovered my source text was full of CRs (DOS mode) which had contributed to some of the text splitting oddness. That is fixed now.

Next steps:

hornc commented 4 years ago

Day 4 I have refactored the lexical number system concept into a class that can generate texts and convert integers far more efficiently.

After the refactor I regenerated the prose style texts. The git diff shows the variation in word choices for the doubled text, based on a tiny change in the halved text: https://github.com/hornc/nanogenmo2019-otranto/pull/1/commits/212be550fb9ba49ed06a4ca6d685b521825e5f6a It's nice that the diffing tool highlights how some words were swapped out by close synonyms and others by quite different words. It almost looks like it was a deliberate re-write of some parts.

hornc commented 4 years ago

Day final (6?)

The month is slipping on past, time to wrap this up! I was sitting on the publishing code hoping to make it better, but the main goal is to actually finish, so this'll do.

Final output as markdown: https://github.com/hornc/nanogenmo2019-otranto/blob/master/output/CastleOfOtranto_Halved_then_Reconstructed.md

Final output as pdf: https://github.com/hornc/nanogenmo2019-otranto/blob/master/output/CastleOfOtranto_Halved_then_Reconstructed.pdf

Next time round I'm going to spend more time on the pdf conversion; this time I just used https://www.markdowntopdf.com/ as suggested in #6

Stats:

Book I, 6 chapters, 37131 words

Book II, 77 chapters, 37144 words

Total: 74275 words.

Original source text: 35223 words (the symbol-splitting algorithm counts word-symbols differently from wc -w)

The algorithm detects a vocab size of 7237 words in the original text, Otranto.txt, and 6270 words in half.txt, which is the main reason double.txt is not exactly the same as Otranto.txt.
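A toy illustration (a made-up three-word vocabulary, not project data) of why the same word sequence yields a different integer, and hence a different doubled text, when read in its own smaller radix:

```python
def to_int(words, vocab):
    """Little-endian base-|vocab| reading of a word-symbol sequence."""
    index = {w: i for i, w in enumerate(vocab)}
    n = 0
    for w in reversed(words):
        n = n * len(vocab) + index[w]
    return n

half = ['c', 'a', 'c']
full_vocab = ['a', 'b', 'c']       # radix 3, detected from the source text
own_vocab = sorted(set(half))      # radix 2: 'b' never appears in the half

# Same word sequence, two different integers:
assert to_int(half, full_vocab) == 20   # 2 + 0*3 + 2*9
assert to_int(half, own_vocab) == 5     # 1 + 0*2 + 1*4
```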

This is a feature, as the work needed at least 50K words for NaNoGenMo, so the doubled text needed to be a new creation.

If half.txt were doubled using the original vocab (not its own) it should produce Otranto.txt. I have not tested this, as for the purposes of NaNoGenMo2019, this is DONE!

I'm pretty happy with the output and concept. I want to do more with this and mix some more works mathematically. I may put in another quick entry before the end of the month, or save it for next year.

I should write up the concept a little better, but I'm most pleased that it has confirmed my hope that it'd produce more interesting text than plain randomly selected words, and also (IMHO) more interesting sentences than Markov chains, which seem designed to produce the most 'normal' following words. This gives more novelty, yet more of a sense of meaning (without a complex algorithm) than randomly selected words.

This algorithm retains the high-level structure of the original text in some sense, but mixes the symbols around. I think the information content of the halved text should be equal to that of the source text (plus the one additional bit of data, implied by this sentence, that Otranto.txt represents an even number). Some parts read like there is meaning behind the words, but it is hard to understand exactly what is meant. It's a nicely alien experience, and I think the meaning is there, just transformed by mathematical operations. The structure of the text at all levels is retained, but the meaning is completely scrambled.

I'd be interested to hear if anyone has tried similar approaches to text gen before. The closest to this style I have encountered is shuffling the vocab within a text, so that all instances of one word are replaced with another, which also retains the structure. This lexnum technique is similar, but more subtle.

I'm sure this can be phrased better -- I was reading generative text articles talking about the fractal nature of plots; something like that should be applied here. I'll try to do more work on this concept later, but for now I'm just pleased I've finished my first NaNoGenMo project!

hugovk commented 4 years ago

Well done on your first NaNoGenMo project!

Repo link: https://github.com/hornc/nanogenmo2019-otranto