dariusk / NaNoGenMo

National Novel Generation Month. Because.

Existing Book Generator #67

Open lilinx opened 10 years ago

lilinx commented 10 years ago

First draft of Existing Book Generator, see below

Capx file (yeah sorry) http://www.lilinx.com/ebg/ebg.capx

Example in action (with fragments of Ennius' Annals) http://www.lilinx.com/ebg/example/

BONUS : Monkey Business 1.0 : "Take that, Cicero !" a text-only zero-button videogame where you are a fast monkey trying to generate a verse from Ennius' Annals by throwing letters on the floor. It's there : http://www.github.com/lilinx/NaNoGenMo/ ... but it's not generating novels.

A. WHAT IT DOES
B. SO-CALLED GENERATIVE PROCESS
C. YET TO BE IMPLEMENTED
D. WHY I DID THIS

A. WHAT IT DOES

You use it with an existing text file of your choice. First it parses the whole file to determine the set of characters present in it. Then it uses this set of characters to randomly re-generate the submitted text file. One very cool feature: whenever it comes up with a correct string, it uses the Google Translate synth voice to read it aloud.

Pros: Randomly generates awesome classic masterpieces.

Cons: You can't be sure the process will be over before you're dead.

B. SO-CALLED GENERATIVE PROCESS

It randomly creates a character string. If at any point the string differs from the original text, it resets. It displays the best matching string so far (in red). It tries to re-create the text from beginning to end; it does not care about generating a string that is present "at some point in the text".
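
The process described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code in the Capx file: the names `charset_of` and `regenerate` are made up here, letters are assumed to be drawn uniformly from the set of characters found in the text, and a `max_throws` cap is added so that the sketch terminates.

```python
import random

def charset_of(text):
    """Return the sorted list of distinct characters present in the text."""
    return sorted(set(text))

def regenerate(target, rng, max_throws):
    """Throw random characters, resetting to the start on any mismatch.

    Returns (best_prefix, throws_used, finished). All names here are
    hypothetical; this is a sketch, not the original generator.
    """
    if not target:
        return "", 0, True
    alphabet = charset_of(target)
    position = 0  # length of the prefix matched by the current attempt
    best = 0      # longest matching prefix seen so far ("Best so far")
    for throws in range(1, max_throws + 1):
        if rng.choice(alphabet) == target[position]:
            position += 1
            best = max(best, position)
            if position == len(target):
                return target[:best], throws, True
        else:
            position = 0  # one wrong letter and the attempt starts over
    return target[:best], max_throws, False

if __name__ == "__main__":
    best, throws, done = regenerate("arma", random.Random(), max_throws=100_000)
    print(f"Best so far : {best!r} after {throws} throws (finished: {done})")
```

Even a four-letter target can take a noticeable number of throws, which gives a feeling for how the full text behaves.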

C. YET TO BE IMPLEMENTED

D. WHY I DID THIS

I've been thinking again about this quote from Montaigne's essay: "If the atoms have by chance formed so many sorts of figures, why did it never fall out that they made a house or a shoe? Why at the same rate should we not believe that an infinite number of Greek letters, strewed all over a certain place, might fall into the contexture of the Iliad?" Michel de Montaigne (1533-1592), Essais

This is the thought experiment known nowadays as "an infinite number of chimps typing forever on typewriters until they come up with Shakespeare's complete works", and apparently it's related to a maths article by Émile Borel that I know nothing about. By the way, Ikarth has shared this wonderful link inspired by Borges's own speculations about such experiments.

As you can guess, there is already software that plays around with these ideas and constantly compares random strings to Shakespeare's works. However, it seems that all such programs that were available online are now deprecated. Also, I find the whole thing with the typing monkeys rather unpleasant. I'm more of a classical person.

I find the Montaigne thing with throwing letters around much more elegant. From Wikipedia I understand that Montaigne is actually referring to Cicero:

"He who believes this may as well believe that if a great quantity of the one-and-twenty letters, composed either of gold or any other matter, were thrown upon the ground, they would fall into such order as legibly to form the Annals of Ennius. I doubt whether fortune could make a single verse of them."

OK, now that's much cooler. I've been trying to run Existing Book Generator against a complete Greek version of the Iliad, but it is SLOW and the odds are very low: there are so many different letters and diacritical signs in Greek that the whole process looks desperate. The Annals of Ennius are something else, because they are written in Latin, where the range of possible characters is much smaller.

dariusk commented 10 years ago

I... I've been running this thing for 15 minutes and so far the result is:

Best so far : An

lilinx commented 10 years ago

15 minutes? You're obviously alienated by the fast-paced world you live in. This book has travelled centuries to reach you, so how about you wait a few years for it to be generated? Also, I think you need to allow popup windows to enable the reading feature. It's annoying for the first 2 minutes, because it repeats the first char each time it matches the original, but after that it only opens when a new matching string is longer than the previous one, so you'll have quiet for... some time. I may fix this.

enkiv2 commented 10 years ago

Could this be made parallel?

lilinx commented 10 years ago

Does it mean running several random processes at the same time? Yes, I guess so. I'm already thinking of a game in the style of Cookie Clicker where you purchase thousands of monkeys who throw random letters on the floor. Or I guess you can just run multiple instances of the generator in your browser.

ikarth commented 10 years ago

Reminds me of: "Pierre Menard did not want to compose another Quixote, which is surely easy enough--he wanted to compose the Quixote. Nor, surely, need one be obliged to note that his goal was never a mechanical transcription of the original; he had no intention of copying it. His admirable ambition was to produce a number of pages which coincided--word for word and line for line--with those of Miguel de Cervantes." Jorge Luis Borges, "Pierre Menard, Author of the Quixote"

lilinx commented 10 years ago

Nice. There is also a Hasidic tale you can find in various forms around the Internet, where someone lacking religious education is stuck in the basement on the day of prayer, so he recites the alphabet, hoping that God will put the letters in order himself to make a decent prayer out of them. And of course, there is Arthur C. Clarke's "The Nine Billion Names of God" (read the full short story there).

dariusk commented 10 years ago

I mean, thinking about this some more, here's the thing: what you're essentially trying to do here is brute force crack a book-length password. You're never going to get this to go "faster" for any meaningful human definition of "faster".

MichaelPaulukonis commented 10 years ago

@dariusk It's not "cracking" per se, since that would be attempting to find a key that would open and reveal what is hidden. In this case, the book is open and reveal'd but used as a BIPM measurement when comparing the monkey-thrown letters.

enkiv2 commented 10 years ago

It's still the same process as password cracking, except that we remove a hashing step. Hashing is easy and doesn't contribute all that much time (unless your password is the length of a book! or unless you, you know, use secure cryptographic hashes instead of fast non-cryptographic hashes... but who does that?)

lilinx commented 10 years ago

I'm not sure I understand what hashing is.

This also reminds me of password cracking as it is shown in Hollywood blockbusters. Or maybe of the black computer screen in The Matrix, crossed by mysterious random luminescent green characters.

What I like about this generator is that it shows you nothing more than the actual possibility of randomly generating the text. The probability of seeing it happen before your eyes is very low, but it exists.

Before trying to speed up the process (see the issue I'm facing below), I'm now thinking of trying to "crack" the Cicero challenge, that is, making a generator that plays with a charset of 21 Latin letters and tries to generate any single verse from the Annals of Ennius. That should be much easier, but 1. it's really not novel generation anymore, so it won't belong here, and 2. I'm uncertain exactly how much "easier" that would be in terms of probability.

So now, back to this generator and its chances to make a novel one day :

The problem here is my lack of mathematical knowledge. If I could calculate the current odds of the book being generated in a given time (let's say 10 years), I could get an idea of how much faster I would need to throw the letters to significantly increase the chances of a relatively fast success. Then I could see how realistic this is, given the power of today's personal computers.

Okay, I can try to express my thoughts, but math people should forgive my mistakes and approximations (I don't even remember how to handle scientific notation). The current text is 8000 characters long (we're far away from the 50k words). Let's say we play with the 21 Latin characters Cicero was talking about. When I throw my first character on the floor there is a probability of 1/21 for it to be the correct one. I guess it means that when I throw two characters on the floor there is a probability of 1/21^2 for them to be the first 2 matching characters. So when I throw 8000 characters on the floor my odds are 1/21^8000. At the current rate of 60 chars / second, throwing 8000 characters takes 2.2 minutes. So I have at least a 1/21^8000 chance to see the book generated every 2.2 minutes, is that so?
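
As a rough check of this arithmetic (a sketch, with made-up function names): each extra character multiplies the odds by 21, so two characters match with probability 1/21^2 = 1/441, and 8000 characters with probability 1/21^8000. That number is too large to write out comfortably, so the snippet below works with its base-10 logarithm instead. It assumes every letter is equally likely, which real Latin text is not.

```python
import math

ALPHABET_SIZE = 21      # Cicero's "one-and-twenty letters"
TEXT_LENGTH = 8000      # characters in the current text
THROWS_PER_SECOND = 60  # rate quoted in the comment above

def log10_odds_against(alphabet_size, length):
    """log10 of alphabet_size ** length, the count of equally likely strings."""
    return length * math.log10(alphabet_size)

def seconds_per_full_string(length, rate):
    """Time needed to throw one complete string of the given length."""
    return length / rate

exponent = log10_odds_against(ALPHABET_SIZE, TEXT_LENGTH)
minutes = seconds_per_full_string(TEXT_LENGTH, THROWS_PER_SECOND) / 60

print(f"two letters : 1 chance in {ALPHABET_SIZE ** 2}")
print(f"full text   : about 1 chance in 10^{exponent:.0f}")
print(f"one full string takes about {minutes:.1f} minutes to throw")
```

In scientific notation the odds come out at roughly 1 in 10^10578 per complete 8000-character string, so the 2.2-minute figure is fine but the number of such attempts needed is far beyond any speed-up of the letter throwing.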

But what actually happens is that each time a non-matching character is thrown on the floor, we reset the process (we're not testing each and every 8000-character string). So this probably speeds up the process a lot, but I don't know how to calculate this.
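
One way to put a number on the effect of resetting, as a sketch under assumptions: if each throw is uniform over k letters and a miss sends the attempt back to zero matched characters, then completing the text is the classic "n successes in a row" problem, whose expected number of throws is k(k^n - 1)/(k - 1). Throwing complete n-character strings and checking each one would take n * k^n throws on average. The function names below are hypothetical, and the small simulation is only there to sanity-check the formula on tiny cases.

```python
import random

def expected_throws_with_reset(k, n):
    """Expected throws to get n correct letters in a row, alphabet size k >= 2."""
    # (k**n - 1) // (k - 1) equals 1 + k + ... + k**(n-1), so this is exact.
    return k * ((k ** n - 1) // (k - 1))

def expected_throws_without_reset(k, n):
    """Expected throws when every attempt is a complete n-letter string."""
    return n * k ** n

def simulate_throws(k, n, rng):
    """Count throws until n consecutive hits, resetting on every miss."""
    matched = throws = 0
    while matched < n:
        throws += 1
        # Any single letter out of k is the right one with probability 1/k.
        matched = matched + 1 if rng.randrange(k) == 0 else 0
    return throws

if __name__ == "__main__":
    rng = random.Random()
    trials = 20_000
    mean = sum(simulate_throws(3, 2, rng) for _ in range(trials)) / trials
    print("formula   :", expected_throws_with_reset(3, 2))
    print("simulated :", round(mean, 2))
```

For k = 21 and n = 8000 the ratio between the two expectations is close to n(k - 1)/k, a speed-up of several thousand. That is a lot, but the expected number of throws is still on the order of 21^8000.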

catseye commented 10 years ago

Couldn't it optimize itself with a Markov-like approach? If the last matching letter you found was 't', and previously after a 't', 'h' has occurred most often, then make 'h' the first guess you try for the next letter (etc, etc) ...

I haven't looked at the code, so maybe this wouldn't apply to how it works. Nor do I know if the speedup doing this would be meaningful from a human standpoint. And, maybe this is drifting away from the philosophical point of the thing...

enkiv2 commented 10 years ago

Markov approach might have a far larger effort-to-result ratio than sorting characters and trying them in frequency-first order. A character-wise first order markov approach would be the same, only with ~128 separate tables (for 7-bit ascii) and a lot of training data!
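
A minimal sketch of the frequency-first idea, with several assumptions: the letter frequencies are counted on the target text itself, each position is guessed by walking down the ordered list until the right letter is found, and there is no reset on a miss. That makes it a different (and deterministic) process from random throwing. The function names are invented for this example.

```python
from collections import Counter

def frequency_order(text):
    """Distinct characters of text, most frequent first (ties broken alphabetically)."""
    counts = Counter(text)
    return sorted(counts, key=lambda ch: (-counts[ch], ch))

def guesses_needed(target, order):
    """Total guesses if each position is tried in the given order until it matches."""
    rank = {ch: i + 1 for i, ch in enumerate(order)}  # 1 = first guess
    return sum(rank[ch] for ch in target)

if __name__ == "__main__":
    text = "arma virumque cano"
    by_frequency = frequency_order(text)
    alphabetical = sorted(set(text))
    print("frequency-first:", guesses_needed(text, by_frequency))
    print("alphabetical   :", guesses_needed(text, alphabetical))
```

With at most k guesses per character the whole text is finished in at most n * k guesses, which is exactly why this stops being a game of chance.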

lilinx commented 10 years ago

these are great things to think about but yes, it's drifting away from the rules of the game, that is, randomly throwing letters on the floor until they make sense

lilinx commented 10 years ago

This is not novel generation, but I am thrilled to announce that I wrote a script from scratch for the first time in my life. Thank you NaNoGenMo people, you inspired me to do something totally out of the way. So it's there : http://www.github.com/lilinx/NaNoGenMo and it throws letters on the floor REAL FAST. I'm not sure I'll go much farther with the "full 50k words novel random generation idea".

dariusk commented 10 years ago

Hooray!

MichaelPaulukonis commented 10 years ago

Goads are good. I haven't completed anything, but I've done a bunch of research, tried some things out, installed and worked with python for the first time, and started working on unit-tests for a side-project. Mainly because of NaNoGenMo. GO FIGURE.

hendrikboom3 commented 8 years ago

Perhaps you need an evolutionary algorithm. Synthesize a random computer program and run it. See what output it generates. Calculate the distance between it and the desired text. Then do the same with ten more random programs.

Pick the ones that have the least distance. Modify them at random several times each; run them and compute distances. See if you ever get closer. Repeat until you do.
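
A toy version of this loop, as a sketch under loud assumptions: the "language" is invented here (a program is a list of integers, and every integer is a valid instruction, so no program can crash), the distance is a crude position-by-position mismatch count standing in for a real edit distance, and the population sizes are arbitrary. None of it comes from an existing tool.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def run(program):
    """Interpret a list of ints; every program is valid and terminates."""
    acc, out = 0, []
    for b in program:
        op, arg = b % 3, b // 3
        if op == 0:
            acc = (acc + arg) % len(ALPHABET)  # move the letter pointer
        elif op == 1:
            out.append(ALPHABET[acc])          # emit the current letter
        else:
            acc = (acc + arg) % len(ALPHABET)  # move, then emit
            out.append(ALPHABET[acc])
    return "".join(out)

def distance(a, b):
    """Mismatched positions plus the difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def mutate(program, rng):
    """Return a copy with one random insertion, replacement or deletion."""
    child = list(program)
    choice = rng.randrange(3)
    if choice == 0 or not child:
        child.insert(rng.randrange(len(child) + 1), rng.randrange(256))
    elif choice == 1:
        child[rng.randrange(len(child))] = rng.randrange(256)
    else:
        del child[rng.randrange(len(child))]
    return child

def evolve(target, rng, generations=2000, population=11, keep=3, children=4):
    """Keep the closest programs, mutate them, repeat; return the best program."""
    def score(program):
        return distance(run(program), target)
    pool = [[rng.randrange(256) for _ in range(rng.randrange(1, 2 * len(target) + 2))]
            for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=score)
        if score(pool[0]) == 0:
            break
        parents = pool[:keep]  # parents survive, so the best never gets worse
        pool = parents + [mutate(p, rng) for p in parents for _ in range(children)]
    return min(pool, key=score)

if __name__ == "__main__":
    best = evolve("arma virumque cano", random.Random())
    print(repr(run(best)))
```

Because the parents are kept in every generation, the best distance can only stay the same or shrink. Like the frequency-first idea, though, this uses the target text to steer the search, so it is no longer pure letter throwing.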

What language? Preferably one in which every random program actually does something instead of just crashing. You'll probably have to invent it and implement it, so keep it simple.