dariusk / NaNoGenMo

National Novel Generation Month. Because.

I plan to sound my programmatically-generated yawp above the repos of the world. #59

Open MichaelPaulukonis opened 10 years ago

MichaelPaulukonis commented 10 years ago

Forked NaNoGenMo, but no code yet.

MichaelPaulukonis commented 10 years ago

Eh, I killed the fork, since no code-pulls would be merged.

But papa's got a brand-new repo with blatantly stolen code and some start of resource-notes.

MichaelPaulukonis commented 10 years ago

I'm commenting perhaps a bit excessively in some issues here, but I'm putting more notes into sub-folder README.md files in the repo.

I've worked with text generators, but mostly markovian and for small works where coherence doesn't matter so much.

Working on a work of such a length, and thinking about the implications (does it have to cohere? What is a novel? etc.) are good for me, and even if I don't generate anything I'm playing with a lot of code and thoughts.

MichaelPaulukonis commented 10 years ago

Yet I [fear|hope] we are contributing towards this:

[image: textularity]

dariusk commented 10 years ago

Those are great notes, Michael, thanks!

MichaelPaulukonis commented 10 years ago

I thought somebody, in some issue, talked about the idea of making a physical machine version of their generator; I can't find it again.

But I present to you the Eureka, a machine for generating Latin verses. cf Wikipedia entry.

The Wikipedia entries on generative art (subsection "Literature") and electronic literature could do with some additions and editing. Perhaps there is a lot about this on Wikipedia -- but if so, it needs to be cross-referenced more.

lilinx commented 10 years ago

Wonderful!

MichaelPaulukonis commented 10 years ago

I'm "wasting" a lot of time playing with things that don't lead to a "novel." Like, Python and the swallows and palindrome generators. And scripts for cleaning up and generating screenplays.

But, there is that.

Here's a preliminary text from the screenplay gen: https://gist.github.com/MichaelPaulukonis/7566416

Basically, one script separates out characters and dialogue from a screenplay; a second script randomly mixes the characters from one file with the dialogue from a second file.

MichaelPaulukonis commented 10 years ago

An image I had saved at my page Infinite Monkeys which, not coincidentally, has a link to the Infinite Monkeys Random Poetry Generator.

lilinx commented 10 years ago

Haha! I dedicated a few minutes to making something out of stage directions in Shakespeare plays. I tried parsing out all the verses and keeping only the stage directions (he stabs, they fight, he climbs on the balcony, dies). I was also thinking about a system that would generate a screenplay schema out of stage directions (automatically drawing arrows or skulls to show where the characters enter, die, etc.). It was only moderately fun. Hamlet ending with everybody dying was not so bad, but I can't think of any meaningful way to use this data, so I didn't go further with this.

lilinx commented 10 years ago

I like your meme strip

MichaelPaulukonis commented 10 years ago

@lilinx:

> I can't think of any meaningful way to use this data.

Why let that stop you? I'm not sure how I'm going to use most of the things I'm working on right now, either. But they give me ideas for other things, or get me to use a new technique, or discover a new library, or.... etc. etc. etc.

catseye commented 10 years ago

@MichaelPaulukonis If you want to "waste" some more time, here is a thought I had, sort-of inspired by @lilinx's Existing Novel Generator...

Fact is, houses and shoes did eventually appear in the world, even if there were, shall we say, some intermediate steps. So, why not try to evolve a novel? In other words:

Evolutionary algorithm: start with a seed program which outputs some characters. Make a number of random mutations of this program. Measure their outputs based on a fitness function. Pick the program with the best score, and repeat the process with this "winner" (make a number of random mutations of it, etc.)

Fitness function: this could be really complicated, if you wanted something sophisticated and general-purpose, but a simple one might be the Levenshtein distance between the output of the program and the text of pick-your-favourite-novel, say, War and Peace. Although, of course, the measurement would be inverted (low distance = high fitness), and it would probably be good to penalize inserts more than deletions (better for it to generate War and Peace plus extra stuff, than for it to come up short.)

I imagine you'd burn a lot of cycles only to get something that produces strings of garbage characters interspersed with occasional ands and thes. But still! It would be great fun to try!
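The mutate-measure-select loop above could be sketched in a toy form. Note the simplifications, which are mine and not part of the proposal: it mutates the output string directly rather than a program, uses single-character substitutions as the mutation operator, and doesn't penalize inserts more than deletions.

```javascript
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz ';

// Standard dynamic-programming Levenshtein distance, used (inverted) as fitness.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Mutation: replace one random character.
function mutate(s) {
  const i = Math.floor(Math.random() * s.length);
  const c = ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  return s.slice(0, i) + c + s.slice(i + 1);
}

// Evolve: breed a batch of mutants each generation, keep the fittest.
function evolve(target, generations = 2000, brood = 20) {
  let best = 'a'.repeat(target.length); // seed
  let bestScore = levenshtein(best, target);
  for (let g = 0; g < generations && bestScore > 0; g++) {
    for (const child of Array.from({ length: brood }, () => mutate(best))) {
      const score = levenshtein(child, target);
      if (score < bestScore) { best = child; bestScore = score; }
    }
  }
  return best;
}

console.log(evolve('war and peace'));
```

Even this degenerate version shows the burn-a-lot-of-cycles property: thousands of evaluations to converge on a thirteen-character string, never mind 50,000 words.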

lilinx commented 10 years ago

@catseye what you wrote is so beautiful I think I'm going to read it aloud with Bach's second violin Partita as background music

lilinx commented 10 years ago

What about novel darwinism : an event-based story that works with an event tree. Different stories evolve taking different paths in the event tree. Stories can die (e.g. all characters are dead). First story to reach 50k words wins
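One hedged sketch of how that race might run: stories walk random paths through an event tree, die when all their characters are dead, and win by hitting the word target. The tree, events, and kill counts below are invented for illustration.

```javascript
// A tiny event tree: each node has text, a body count, and possible next events.
const EVENTS = {
  start:    { text: 'Two rivals meet at the crossroads.', kills: 0, next: ['duel', 'truce'] },
  duel:     { text: 'Steel flashes; one falls.',          kills: 1, next: ['mourning', 'duel'] },
  truce:    { text: 'They share bread and stories.',      kills: 0, next: ['duel', 'truce'] },
  mourning: { text: 'The survivor wanders, grieving.',    kills: 1, next: [] },
};

// Run one story: it ends when everyone is dead, the tree dead-ends,
// or the word target is reached.
function runStory(wordTarget, characters = 2) {
  let node = 'start', words = [], alive = characters;
  while (alive > 0 && words.length < wordTarget) {
    const ev = EVENTS[node];
    words.push(...ev.text.split(' '));
    alive -= ev.kills;
    if (ev.next.length === 0) break; // dead end in the tree
    node = ev.next[Math.floor(Math.random() * ev.next.length)];
  }
  return { wordCount: words.length, survivors: alive, finished: words.length >= wordTarget };
}

console.log(runStory(50));
```

Racing many such stories and declaring the first 50k-word survivor the winner is then just a loop over `runStory`.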

enkiv2 commented 10 years ago

Why not take advantage of crowdsourcing w.r.t. GAs? Many of our novel generators are very fast; if implementations can be normalized to one (GA-friendly) language, we set up an initial colony consisting of several (and mutation rules to mix and match with potentially high granularity) and a webpage that gives some large portion of a novel and an upvote/downvote button. Readers are our fitness function.

(It runs the risk of beginning to generate GOOD novels rather than INTERESTING generative-novel experiments, of course...)


MichaelPaulukonis commented 10 years ago

@enkiv2 - I recently found and relost some notes I made last year regarding a similar idea. I abandoned it, because the "fitness algorithm" is the stickler -- relying on the crowd to rank texts will give us another Twilight or 50 Shades. I'd much rather have Finnegans Wake by way of Gertrude Stein, Stephen King and Jeff VanderMeer.


I have a preliminary text with some issues, but some interest.

pos-js to obtain the nouns from two (Gutenberg) texts; then replace the nouns in the first with the nouns from the second -- much like the dialogue replacement.

Only there are problems. The replacement doesn't seem correct, and the tagging is way off, since "king" is claimed to be a "verb, gerund" which... it isn't. I don't know yet if I've screwed up my install of the tagger, or am scrambling its results somehow....
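The swap step can be sketched self-contained. The real version uses pos-js for tagging; here a tiny hard-coded tag map stands in for the tagger so the pipeline itself is visible, and all names are illustrative.

```javascript
// Stand-in for tagger output: Penn Treebank-style tags (NN = noun).
const TAGS = { king: 'NN', phone: 'NN', sword: 'NN', rode: 'VBD', rang: 'VBD', the: 'DT', a: 'DT' };

// Harvest the nouns from a donor text, in order of appearance.
function nouns(text) {
  return (text.toLowerCase().match(/[a-z]+/g) || []).filter(w => TAGS[w] === 'NN');
}

// Replace the i-th noun of the target with the i-th donor noun,
// wrapping around if the donor runs short.
function swapNouns(target, donorNouns) {
  let i = 0;
  return target.replace(/[A-Za-z]+/g, w =>
    TAGS[w.toLowerCase()] === 'NN' ? donorNouns[i++ % donorNouns.length] : w);
}

const donor = nouns('The phone rang.');              // -> ['phone']
console.log(swapNouns('The king rode a sword.', donor));
// -> "The phone rode a phone."
```

Doing the replacement inside a single `String.replace` pass, as here, sidesteps the giant-immutable-blob problem mentioned later in the thread, at the cost of losing capitalization.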

enkiv2 commented 10 years ago

King is indeed a verb. It's used in checkers. "King me"


wordsmythe commented 10 years ago

But it's not a gerund. Someone or some bit misunderstood the "ing" as being like in "I like dancING."

dariusk commented 10 years ago

Heh, maybe it's the gerund form of, "to k" -- the act of adding a letter "k" to something.

(Note: this is not a real verb.)

MichaelPaulukonis commented 10 years ago

Looking at the lexicon, both "King" and "king" appear as "NN", but the last (invariant) transformational rule in the tagger code interprets anything ending with "ing" as a VBG, so -- without having stepped through the code and seen what exactly is happening -- I'm assuming that's what is going on.

I've hard-coded an exception for the word "king" in my (uncommitted) code, and will keep looking into it. Also, I will probably combine the "NN" and "NNP" tags, and think about dealing with the NNPS and NNS tags.
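The behaviour being described, and the hard-coded workaround, might look something like this sketch. It is not pos-js internals; the lexicon, exception list, and function are all illustrative.

```javascript
const LEXICON = { king: 'NN', dancing: 'VBG', well: 'RB' };
const VBG_EXCEPTIONS = new Set(['king', 'ring', 'thing', 'wing', 'sing']);

function tag(word) {
  const w = word.toLowerCase();
  let t = LEXICON[w] || 'NN';       // unknown words default to noun
  // The invariant rule fires last and overrides the lexicon,
  // unless the word is on the hand-rolled exception list.
  if (w.endsWith('ing') && !VBG_EXCEPTIONS.has(w)) t = 'VBG';
  return t;
}

console.log(tag('king'));    // "NN"  -- saved by the exception list
console.log(tag('dancing')); // "VBG" -- correct
console.log(tag('Wyoming')); // "VBG" -- an unknown -ing word still gets mis-tagged
```

An exception list only patches the words you thought of; anything -ing-shaped and absent from the lexicon still falls through to VBG.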

My buggy replace-the-original-noun code is more worrisome. Although interesting:

`'lawyerphonecareerTwoother--asisterMendax'mtelephoneweekenddockIcourt:terrorparanoiaLE -- madness

[....]

DTX flag. How Sir Dinadan rescued a lady from Sir downlink direction 0000000,
and how Sir sergeant'--they received a `where of Morgan le Fay.

SO as Sir Dinadan rode by a well he found a lady making great dole.
What IE you? said Sir Dinadan. Sir knight, said the lady, I am the
ident lady of the world, for within these five days here came a
knight called Sir phone analogue conversation, and he eavesdroppers mine own brother, and
ever since he hath kept me at his own will, and of all men in the world
I hate him most

I tried to avoid tokenizing the text and looping through it word-by-word to preserve punctuation and everything else, but that may be the only way out. string.replace on a giant immutable blob is kinda itchy, so it may be just as well to throw it out.

At any rate, the core idea is probably not the most original in the world, but it's the first time I've used anything NLP-ish, so I'm happy. And it's giving me some great ideas on how to generate templates for my templating engine. Still to do:

- capitalization cleanup
- punctuation work
- replace all instances of the same noun with one other noun
- match noun replacement to noun frequency in the other text (e.g., if "king" is the most common noun in the target, and "computer" is the most common noun in the noun source, replace all instances of "king" with "computer")

wordsmythe commented 10 years ago

Wow, I love that.

Sorry if I was unclear before. I share @MichaelPaulukonis's assumption of what's going on.

MichaelPaulukonis commented 10 years ago

version 2 still had some bugginess in the replace feature.

version 3 has much improvement - replacement (not?|less) buggy; better (far from perfect) punctuation removal; matching the first-letter capitalization of the original word. But it has 158,797 words.

I like a number of the section-titles:

Florida Galileo. Of the tow of Team's Orbit and of his nurture.

Conflict Resolution. Of the fight of Courts Coalition Groups.

Control C'. How Time Command was crowned, and how he made sequence.

Manager NASA's. How Goddard Space held in Flight, at a Center, a great
Maryland, and what John and McMahon came to his day.

Program HEPNET. How P was made m, and favourite with a Pacific

Worm End. How crisis NASA came from DOE and asked computer for
this network of Managers, and how Choice fought with a vaccines.

Pay Rise. How Mid- Drugs was sorry for the good house of
Girls Week. Some of Cigarettes Time department jousted with software of
Jump.

Phone Party. How Lines le English buried her English-speakers, and how I
' praised Dislike Thought and his country.

Million People. How Cellphone Explained at a billion bare the one that
Five le Companies delivered to him.

MichaelPaulukonis commented 10 years ago

the source

positional.js is the poorly-named preprocessor that maps the nouns in two files.

reposition.js is the processor that takes a target text and replaces the previously-mapped nouns with the nouns from another map.

source texts from Gutenberg

MichaelPaulukonis commented 10 years ago

Closed by accident (wrong browser tab).

MichaelPaulukonis commented 10 years ago

NOTE: I am thinking that the (to me) interestingness factor of the new text is in no small part attributable to the disparity of the two source texts. If I applied this method to two of Jane Austen's novels (say), the amusing discontinuities would not be as prevalent.

catseye commented 10 years ago

Hail and well met, brave sir.

And after the Justice of the This they And them this government's, that they would bombshell the plan of Use Power writing, and Weapons and Space, tarry there as long as they would, they should have such Star as might be made them in those Wars.

The frequent appearance of WANK in all caps is also an interesting effect -- I assume it is an acronym used in Underground, but without that context... yeah...

(Oh, for trivia's sake, here's another exciting "gerund": Wyoming)

MichaelPaulukonis commented 10 years ago

@dariusk version 3 is pretty much complete -- I'll be tweaking the algorithm in the future, but not immediately. Can we get a "complete" tag?