dariusk / NaNoGenMo-2015

National Novel Generation Month, 2015 edition.
340 stars · 21 forks

Compiler pipeline + writers' techniques = a "proper novel" ::blink:: #11

Open cpressey opened 8 years ago

cpressey commented 8 years ago

Novel: A Time for Destiny: The Illustrious Career of Serenity Starlight Warhammer O'James during her First Three Years in the Space Fighters
Code: on Bitbucket
Write-ups: Overview of a "Story Compiler"


Observation: It is very difficult for the average person to read a typical NaNoGenMo-generated novel in its entirety, from beginning to end.

It's because the brain begins to tire, right? It gets all "I see what you did there" and balks at facing yet more unpredictable stuff.

Goal: To write a generator that generates a novel that does not succumb to this effect.

You still might not be able to read the resulting novel to the end, but, if you stop reading after the first 2 chapters, it should be because the novel is just plain bad, not because its aura of generativeness is burning a hole in your attention span.

Downloading an existing novel from Project Gutenberg, or similarly trivial approaches, don't count.

This is, of course, a completely unrealistic goal. But one must have some goal, mustn't one?

PPKFS commented 8 years ago

That pretty much sums up my thoughts/goals too.

Is it really so unrealistic?

cpressey commented 8 years ago

Well, I guess we'll see, but yes I think it's incredibly unrealistic.

ikarth commented 8 years ago

Perfect! And definitely a worthy goal.

cpressey commented 8 years ago

I should maybe qualify those statements a bit.

I do think the goal I stated is highly unrealistic, certainly with the techniques that I'm personally prepared to use. But the space of possible techniques is vast, so who knows?

What I'm sort of getting at by choosing that goal is this:

In 2013, I tried generating a "proper novel". Last year, I did a bunch of experiments closer to the so-called "conceptual writing" side of things. This year, I'm returning to the "proper novel", however quixotic any such attempt might be.

Given that I've stated a goal that I admit is unrealistic, I suppose I do not expect myself to actually achieve it. But it will be interesting to see how I fail.

At the same time, one need not have only one goal, so...

After last NaNoGenMo, around January of this year, I started thinking a lot about how people write stories. I did a lot of research (if you can call reading article after article on TVTropes research) and I came to the conclusion that there are certainly some story-writing techniques that can be approximated with algorithms.

So, one of my secondary goals is: To implement one or more story-writing techniques that human writers use.

This is a much more realistic goal, I think.

Heck, even The Swallows had a MacGuffin, but it wasn't really developed. I'd like to go a bit beyond that.

I'll probably continue to expand on these thoughts in future posts to this issue.

MichaelPaulukonis commented 8 years ago

In 2013, I tried generating a "proper novel".

::blink blink::

[updated as I had not pasted what I wanted to have pasted]

brianfay commented 8 years ago

I can imagine a computer-generated book being easier to read than something like Naked Lunch or Finnegans Wake.

cpressey commented 8 years ago

@YottaSecond

I can imagine a computer-generated book being easier to read than something like Naked Lunch or Finnegans Wake.

Mmmaybe...

But I wager that if someone stops reading Finnegans Wake after chapter 2 it's almost certainly not because their brain went all "I see what you did there."

dariusk commented 8 years ago

Hi, I'm going through and updating the titles on issues to make them more specific. Feel free to edit my edit if it's not to your liking. This is to make browsing issues a lot more pleasant.

cpressey commented 8 years ago

While there will certainly be similarities, my third goal is to not just end up re-writing The Swallows. I was looking through that code yesterday, seeing how much of it could be re-used. Very little, I think.

My background is programming languages, so I have a hard time not seeing a story generator as a kind of compiler.

A typical compiler is structured as a pipeline with a number of phases. The process for writing a story is much messier, but in a broad sense it too is a "pipeline", from idea to outline to draft to finished work.

In fact a story-writing pipeline is in some ways the inverse of a compiler pipeline.

A compiler takes a readable text and turns it into an incoherent blob. A writer takes an incoherent blob and turns it into a readable text.

One of the first things a compiler often does is strip comments from the source code and throw them away, because they're not crucial to the result. One of the last things a writer might do is add commentary that's not crucial to the story.

One of the last things a compiler does is optimize the generated code to make it shorter and more efficient. One of the first things a writer might do is complicate the plot to make it longer and more interesting.

Somewhere in the middle of the compiler, it might check that the program does not contain certain errors, like assigning a string value to an integer variable. Somewhere in the middle of writing a story, a writer might check that the characters are not doing something that, in that scene, would not be possible.

And so forth. The similarities really are rather remarkable.
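The inverse-pipeline analogy can be sketched in miniature. This is a toy illustration, not the actual generator: all pass names and plot content here are invented, and each "pass" is just a function that rewrites the work-in-progress.

```python
# A story "compiler" as a pipeline of passes over a plot, mirroring the
# analogy: complicate early (inverse of optimizing), check continuity in
# the middle (like a type checker), add commentary late (inverse of
# comment-stripping).

def complicate_plot(plot):
    # Early "writer" pass: expand the plot to make it longer and more
    # interesting (the inverse of an optimizer, which shrinks code).
    return plot + [("complication", "a rival appears")]

def check_continuity(plot):
    # Middle pass: analogous to a compiler's error checking; here we
    # just verify every event is a (kind, description) pair.
    for event in plot:
        assert isinstance(event, tuple) and len(event) == 2
    return plot

def add_commentary(plot):
    # Late "writer" pass: add material not crucial to the story.
    return plot + [("aside", "the narrator reflects on fate")]

def run_pipeline(seed, passes):
    work = seed
    for p in passes:
        work = p(work)
    return work

seed = [("goal", "the hero seeks the MacGuffin")]
story = run_pipeline(seed, [complicate_plot, check_continuity, add_commentary])
```

The pipeline shape is the point: each phase takes the whole work and returns a rewritten whole, so phases can be reordered or added independently.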

enkiv2 commented 8 years ago

Of course, since most of these operations are generative, if a single pass fluffs out a summary and checks continuity, you could just run the same pass over and over on an arbitrarily small summary until you had 50k words ;-)


cpressey commented 8 years ago

Sure, except (continuing with the compiler analogy) most compilers aren't designed to take as input that which they generate as output.

I certainly wasn't planning on building anything that could read a novel!

cpressey commented 8 years ago

(Excuse my "designblogging" but it helps give me something to do to stop myself from jumping the gun and starting too early! Am chomping at the bit, can you tell? Trying to keep each post reasonably short.)

If the "novel compiler" doesn't take a written text as input, I suppose that raises the question of what it does take as input.

One answer could be "nothing, it's just a generator, you just run it," which might be literally true, but it doesn't really answer the question.

A more satisfying answer would be that it takes an outline of a plot, in some kind of data format, as input, even if that outline is hardcoded or randomly generated in the compiler itself.

It then refines that plot by iteratively rewriting it, stepwise, into increasingly more detailed plots. Once it has a detailed enough plot, it rewrites that into a series of events, and in the end rewrites those into sentences. I suppose this is a top-down, plot-driven approach, as @TheCommieDuck described it.

About these plots... (kind of thinking out loud here...)

The "seed plot" that the compiler starts with could be as skeletal as The Hero's Journey.

Or maybe even more basic, like, the "null story":

Once upon a time, they lived happily ever after. The end.

From there, you just keep inserting subplots into it. I'm still weighing ideas about exactly how to accomplish this process. I might write more about it later.
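One way the "keep inserting subplots" process could go, as a rough sketch (the subplot text and the insertion rule are invented for illustration, since the comment leaves the mechanism open):

```python
import random

# Refine the "null story" by repeatedly inserting subplots strictly
# inside it, never before the opening line or after "The end."

NULL_STORY = ["Once upon a time,", "they lived happily ever after.", "The end."]

SUBPLOTS = [
    ["a stranger arrived,", "and a conflict arose,", "but it was resolved,"],
    ["a secret was revealed,", "which changed everything,"],
]

def insert_subplot(story, rng):
    pos = rng.randrange(1, len(story) - 1)
    return story[:pos] + rng.choice(SUBPLOTS) + story[pos:]

def refine(story, min_lines, rng):
    # Keep complicating until the story is "long enough".
    while len(story) < min_lines:
        story = insert_subplot(story, rng)
    return story

story = refine(NULL_STORY, 10, random.Random(0))
```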

ikarth commented 8 years ago

One thing I was playing with in past projects was embedding metadata about the generation in the outputted text, and then performing a last cleanup phase before the actual final output. So there would be a bunch of bracketed tags scattered around marking things that could potentially be expanded. And the last step stripped the bracketed text out or reduced it to its default.

I never fully implemented the idea, but it might be useful for your novel compiler.
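The bracketed-tag cleanup phase might look something like this, under an assumed `[tag:default]` syntax (the actual tag format in those past projects isn't specified above):

```python
import re

# Generation passes leave bracketed expansion points in the text; a
# final cleanup pass reduces each [tag:default] to its default and
# strips each bare [tag] entirely.

TAG = re.compile(r"\[(\w+)(?::([^\]]*))?\]")

def cleanup(text):
    # Replace tags, then tidy any doubled spaces left behind.
    stripped = TAG.sub(lambda m: m.group(2) or "", text)
    return re.sub(r"  +", " ", stripped).strip()

draft = "Joe entered the cave. [weather:It was raining outside.] [expand_mood] He sat down."
final = cleanup(draft)
# final == "Joe entered the cave. It was raining outside. He sat down."
```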

mewo2 commented 8 years ago

One solution to the problem of passes being able to read their own output would be to take a leaf from LLVM and have a single intermediate representation (e.g. a list of events), which most passes use for both input and output. You can munge this repeatedly until your novel is complex enough, then run a single final pass which converts to prose.
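A minimal sketch of that single-IR idea, with a list of event dicts as the intermediate representation (the pass and field names here are invented, not from any actual project):

```python
# One IR (a list of event dicts); IR-to-IR passes munge it repeatedly,
# and a single final pass converts IR to prose.

def double_trouble(events):
    # An IR-to-IR pass: after every conflict, add an escalation.
    out = []
    for e in events:
        out.append(e)
        if e["kind"] == "conflict":
            out.append({"kind": "conflict", "who": e["who"],
                        "what": "found things getting worse"})
    return out

def to_prose(events):
    # The single final pass: IR in, prose out.
    return " ".join(f"{e['who'].capitalize()} {e['what']}." for e in events)

ir = [
    {"kind": "setup", "who": "alice", "what": "arrived in town"},
    {"kind": "conflict", "who": "alice", "what": "quarrelled with bob"},
]
while len(ir) < 5:          # munge until the novel is "complex enough"
    ir = double_trouble(ir)
prose = to_prose(ir)
```

Because every pass reads and writes the same representation, passes can be run in any order, repeated, or dropped, which is exactly the property being borrowed from LLVM.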

cpressey commented 8 years ago

@ikarth My understanding is that this (embedding structured data inside unstructured text) was one of the original use cases for XML, though it's probably under-used these days.

I don't currently foresee myself having a huge need for this, but if it becomes desirable, I'll keep that idea in mind, thanks.

@mewo2 I'm currently thinking of the individual passes as purely internal rewriting operations on whatever data structures happen to be convenient at that point in the pipeline. But if the whole novel-model becomes too much to hold in memory, I suppose I will have to think about reading and writing intermediate representations, yeah.

MichaelPaulukonis commented 8 years ago

I note that many of the classic Narrative generators generated their world + stories, and had another independent system that "translated" them into more natural language. For example, TALE-SPIN:

[screenshot: excerpt from Inside Computer Understanding: Five Programs Plus Miniatures, R. C. Schank (source)]

It's a lot more complicated than this, but I can't find an example/citation right now. Multiple sentences about the current world-state would be combined (JOE WAS IN THE CAVE. JOE KNEW HE WAS IN THE CAVE. THE CAVE WAS DARK. THE CAVE HAD AN EXIT. JOE KNEW THE CAVE WAS DARK. JOE WANTED TO BE IN THE LIGHT. JOE KNEW THE CAVE HAD AN EXIT. => Joe wanted to get out of the cave and into the light.)
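A toy sketch of that "translation" step (this is not TALE-SPIN's actual mechanism, just an illustration of collapsing several world-state assertions into one natural sentence by keying on a character's goal):

```python
# World-state facts as (subject, relation, object) triples; the
# narrator collapses a recognized goal pattern into one sentence.

facts = {
    ("joe", "in", "cave"),
    ("cave", "is", "dark"),
    ("cave", "has", "exit"),
    ("joe", "wants", "light"),
}

def narrate(facts):
    # If a character is somewhere dark, wants light, and there is an
    # exit, summarize all of that as a single goal sentence.
    for who, rel, place in facts:
        if rel == "in":
            dark = (place, "is", "dark") in facts
            wants_light = (who, "wants", "light") in facts
            has_exit = (place, "has", "exit") in facts
            if dark and wants_light and has_exit:
                return f"{who.capitalize()} wanted to get out of the {place} and into the light."
    # Fallback: flat TALE-SPIN-style recitation of raw facts.
    return " ".join(f"{a} {b} {c}." for a, b, c in sorted(facts))

sentence = narrate(facts)
```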

cpressey commented 8 years ago

@MichaelPaulukonis the 2nd version is more pleasant to read, but the 1st is just that much closer to 50,000 words, isn't it?

Participating, even reading all the issues for this year's edition, is clearly going to cut into what little time I already have. I'll keep these updates short and infrequent.

I suppose I have a goal number 4, which is: don't use any libraries or corpuses or APIs except the bare minimum. Well, that's not a goal so much as a predilection. I enjoy writing code. I don't enjoy learning and futzing with the idiosyncrasies of Yet Another Dependency. But this gives you an indication of what the final result will be like here.

I'm not planning on releasing any previews or code until the end, or at least until the result reaches a certain minimum quality (but I don't expect that to happen in November, so, like, until the end).

MichaelPaulukonis commented 8 years ago

I'm not planning on releasing any previews or code until the end

DRAT! There goes my plan!

As usual, I'm hoping to play with a bunch of different dependencies, and then see if anything sticks. Each to our own.

cpressey commented 8 years ago

Update: it generates a story. It is terrible. I do hope Goal 1 didn't get anyone's hopes up. I did call it "unrealistic" and "incredibly unrealistic" in almost immediate succession...

Actually, suppose we reframe Goal 1 slightly, with gradation instead of as a yes-or-no thing. How many words of the average NaNoGenMo text is the average reader willing to read, on average, before they give up? By "read" I of course mean, try to make sense of the words, not just look at them.

For texts that are complete word salad, the number is probably well below 100. (and then you start skimming forward, maybe, looking for interesting nonsense.) For others, maybe higher. A couple of hundred, at a guess. Hard to say, without going to the ridiculous length of actually conducting experiments on it.

Anecdotes welcome, though!

enkiv2 commented 8 years ago

One of the reasons I did generative erotica is that people will, on average, be more entertained by less coherent erotica -- the subject matter is either prurient or funny. As a result, a fairly simple and low-quality grammar produces a result that I was willing to read several pages of. I suspect that there are other tricks with regard to style or subject matter that work similarly to increase the readability of content irrespective of novelty or quality. (For instance, vague yet evocative sentences like those used in Montfort's 1k generators would be great if you could consistently generate them!)


cpressey commented 8 years ago

Several pages, meaning, what, about 1500 words?

enkiv2 commented 8 years ago

Yeah, something like that. (I have a fairly high tolerance for this stuff, though. You can analyse it for yourself: https://github.com/enkiv2/NaNoGenMo-2015/blob/master/orgasmotron.md )


tra38 commented 8 years ago

For non-simulations, my guess is that attention starts drifting at about 3 * the template length, where the length is the number of words in the template in question. That's enough for the reader to get bored, because he grasps the pattern. So if you have a template that is 500 words long, that would probably make your 1500 words.

(and then you start skimming forward, maybe, looking for interesting nonsense.)

It seems bots are excellent at generating text, but it's the humans who have to sift through the resulting nonsense to find actual meaning and worth. There has to be a mathematical formula that can be used to measure the 'fitness' of a text, allowing the bots to filter for how interesting* it is. This way, you can have the bots generate a bunch of words and then engage in automatic curation.

*We can define 'interesting' perhaps by sentiment analysis or how well it matches one of Vonnegut's plot curves. Or maybe, pull in machine learning. You rate a passage the computer generates on a scale of 1-10, and with enough data, eventually the computer will find a pattern.

hugovk commented 8 years ago

There has to be a mathematical formula that can be used to measure the 'fitness' of a text, allowing the bots to engage in filtering for how interesting* it is.

Brings to mind genetic algorithms. Has anyone tried that approach?
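A minimal genetic-algorithm loop for this would look roughly like the sketch below. The fitness function here is a deliberate placeholder (it just rewards varied vocabulary); as the thread notes, finding a real "interestingness" fitness function is the whole open problem.

```python
import random

# A toy genetic algorithm over word sequences: rank by a placeholder
# fitness, keep the top half, and refill the population with mutants.

WORDS = ["the", "hero", "villain", "sword", "betrayal", "dawn", "the", "the"]

def fitness(text):
    return len(set(text))          # placeholder: reward varied vocabulary

def mutate(text, rng):
    out = list(text)
    out[rng.randrange(len(out))] = rng.choice(WORDS)
    return out

def evolve(generations, rng):
    population = [[rng.choice(WORDS) for _ in range(8)] for _ in range(20)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:10]
        population = survivors + [mutate(rng.choice(survivors), rng)
                                  for _ in range(10)]
    return max(population, key=fitness)

best = evolve(30, random.Random(1))
```

Swap in any scoring function for `fitness` (sentiment curves, a learned model, crowdsourced votes) and the rest of the loop stays the same.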

enkiv2 commented 8 years ago

With regard to the 'mathematical formula', I suspect you could use Shannon's information entropy formula with the prior of the reader's mind :P. After all, humans accept only a narrow band of novelty, and what counts as novel depends upon what the reader has seen before.
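The entropy idea can be sketched concretely: estimate per-word surprisal of a candidate text against a "prior" corpus standing in for what the reader has already seen. Very high surprisal would look like word salad; very low, like boring repetition. (The corpora and the smoothing scheme here are invented for illustration.)

```python
import math
from collections import Counter

# Average surprisal (bits per word) of a text under a unigram model of
# a prior corpus, with Laplace smoothing so unseen words get finite
# (but large) surprisal.

def surprisal_per_word(text, prior_corpus):
    counts = Counter(prior_corpus)
    total = len(prior_corpus)
    vocab = len(counts) + 1
    bits = 0.0
    for w in text:
        p = (counts[w] + 1) / (total + vocab)
        bits += -math.log2(p)
    return bits / len(text)

prior = "the cat sat on the mat and the dog sat too".split()
familiar = "the cat sat".split()
novel = "quantum marmalade defenestrated".split()

low = surprisal_per_word(familiar, prior)
high = surprisal_per_word(novel, prior)
```

The "narrow band of novelty" would then be a target range for this number, rather than a maximum or minimum.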


ikarth commented 8 years ago

Humans figuring out patterns seems to be part of the interestingness metric. It seems to work on multiple scales: Grokking the central conceit in Aggressive Passive or Redwreath and Goldstar Have Traveled to Deathsgate takes a few minutes at most, which will give you the sense of the overall plot without reading it. (And then figuring out the puzzle of which question goes with which answer can take a lifetime.)

Something like #72 or Alice's Adventures in the Whale takes a bit longer, because once you've grasped the pattern, the pleasure is in seeing the changes that were made in a familiar text.

I suspect that simulations play by slightly different rules. Dwarf Fortress has certainly generated a lot of stories, though I'm not sure how many of them are interesting precisely because they were interactive. (Not to mention, most renditions are a retelling of the events, rather than a direct output.) I'm going to be watching this year's simulation results with interest.

One pleasure that most generative works lack is a sense that an author intended them to happen this way. Not that you can't get a degree of intention-sense. I suspect that's why high-concept things like Aggressive Passive work so well: we can read the higher authorial intent, and that makes it easier to get closure and catharsis.

ikarth commented 8 years ago

@hugovk @enkiv2 Maybe generate a large corpus of generated results (with the metadata for the generator settings), use a crowdsourced interestingness vote, and then use that as the fitness criteria for an RNN?

enkiv2 commented 8 years ago

Honestly, I'd love to do that (and there are some similar systems out there). But it sort of requires exposure (or Mechanical Turk!), so it might be hard to do at novel length in a month, since it requires a large number of people to read more than a novel's worth of generated content in that time.

(I mean, you could do this with a Markov chain rather than an RNN too. Do Monte Carlo: generate a handful of 'next sentences' and have people vote on which one is best, then feed the winner back in as training data or increment its connections.)
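That vote-driven Markov loop might be sketched like this. The "vote" here is a stub that always picks the first candidate; in the real scheme a human vote would go there, and the corpus is invented:

```python
import random
from collections import defaultdict

# A word-level Markov chain; candidate continuations are generated,
# a winner is chosen, and the winner is fed back in as training data
# so its transitions get reinforced.

def train(chain, tokens):
    for a, b in zip(tokens, tokens[1:]):
        chain[a].append(b)

def generate(chain, start, length, rng):
    out = [start]
    for _ in range(length - 1):
        out.append(rng.choice(chain[out[-1]]))
    return out

chain = defaultdict(list)
train(chain, "the ship sailed . the ship sank . the crew wept .".split())

rng = random.Random(0)
for _ in range(3):
    candidates = [generate(chain, "the", 4, rng) for _ in range(5)]
    winner = candidates[0]            # stub: a human vote would go here
    train(chain, winner)              # reinforce the winning transitions

sentence = " ".join(generate(chain, "the", 4, rng))
```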


MichaelPaulukonis commented 8 years ago

@hugovk - didn't we talk about this in the GenerativeText list? The trouble is the fitness algorithm -- if you've got one, well -- you've solved the problem. Otherwise, we're talking about using human readers via Amazon's Mechanical Turk or something.

Nothing that we little people could handle (:money:), but maybe in a few years somebody can think of a sneaky way to grab eyeballs with something like ReCaptcha, or Facebook will heave its vast and labyrinthine bulk in its direction.

@enkiv2 - For similar reasons, new directors often work with low-budget horror movies. Witness Sam Raimi - he did Evil Dead not for any particular love of the genre, but for the most likely return on investment (time + money) (a source). The audience tends to eat it up no matter how low the quality. Witness the large numbers of self-published zombie books on Amazon. Or romance novels of any of the vast, arcane genotypes of romance.

So - generated horror fiction. SplatterGenPunk. Note to self -- add this to #14

enkiv2 commented 8 years ago

It's not as though there aren't people who will do that for free (see crowdsound, darwintunes, basically every quote DB, and that one project where a novelist is having randoms vote on his plot elements, along with twitch plays anything). But, getting that audience isn't guaranteed and it takes a while. If we didn't care about November, we could start a thing like that and then just let people discover and play with it as they will.


ikarth commented 8 years ago

Alice in Wonderland and Zombies?

MichaelPaulukonis commented 8 years ago

@ikarth - I first got this idea while brain-storming ways of getting texts from an extended Swallows of Summer engine. More people, more locations, flock/avoidance algorithms, object transference (in this case, the plague vector, whatever it is), focused event-replay as pseudo-plot, etc.

(Lightly Edited) Extracts from some emails:

And what would happen to Swallows with sufficient complexity to allow for emergent behavior?

A larger environment, more people, and more behaviors for the people, including flocking, avoidance, eating, sleeping, etc.

We could easily go to a zombie scenario -- some sort of infectious behavior, infected flock, uninfected flock, uninfected avoid, etc etc etc. But need for food increases probability of movement, so...

I'm curious as to what flocking etc. could do in a narrative situation -- BORING, probably. But with other behaviors, and flocking not set to a herd-level, clusters of people, cliques, mean-girls, etc etc. competitiveness, objects, things.

"zombie plague" was just an example of transferrence. In "Tale-Spin" "Gravity" was an invisible character who pulled people down (until Gravity ended up drowning in a number of stories) or pushed people down. "Infection" would be given, without being lost as an item, from one character to another. Could be a cold, flu virus, ear-worm pop-song, or whatever. Tribbles.

If the system just played everything out [generating events, not text], and then AT THE END - where end is defined as all people have been infected - and took the events and narrated them from the standpoint of the last uninfected character, you'd have a ... I dunno, not a tragedy or thriller, but it would appear to have a plot.

And that's from basic rules of transfer/flock/avoid - no added narrative rules to direct plot.

That intrigues me.
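The infection-as-transference scenario sketches out quite naturally. This toy version (all names, places, and rules invented for illustration) runs until everyone is infected, then replays the event log from the viewpoint of the last character to fall:

```python
import random

# Characters wander a tiny world; infection transfers on co-location;
# the run ends when all are infected; the log is then filtered to the
# perspective of the last character infected.

def simulate(names, rng):
    places = ["cave", "field", "house"]
    loc = {n: rng.choice(places) for n in names}
    infected = {names[0]}            # patient zero
    log = []
    while len(infected) < len(names):
        for n in names:
            loc[n] = rng.choice(places)
            log.append((n, "moved to", loc[n]))
        for n in names:
            for m in names:
                if n in infected and m not in infected and loc[n] == loc[m]:
                    infected.add(m)
                    log.append((m, "was infected in", loc[m]))
    last = log[-1][0]                # the last character to be infected
    return [e for e in log if e[0] == last], last

events, protagonist = simulate(["ann", "bob", "cal"], random.Random(7))
```

Note that "events, not text" is exactly the split described above: this produces only the event stream, and narration would be a separate pass over `events`.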

enkiv2 commented 8 years ago

Simulation is definitely a way to produce 'plot' in the sense of events that follow each other with internal consistency, causality, and logic, but it doesn't in any way filter for what would be interesting to a reader. Most things that happen in reality (or any procedural simulation of a subset of reality) are not very interesting from the perspective of a reader of fiction, even things that are potentially very interesting to watch (it's one thing to watch a kung-fu movie, but reading a pokemon-style transcript of the same film's fight scenes would be incredibly boring).

My position is that an interesting style is necessary to make a simulation readable, and that furthermore, accurate simulation does not necessarily add to readability when an interesting style is already involved. (Lots of very interesting books have very dull plots, and lots of very dull books have very interesting plots; wonderful books are not necessarily internally consistent, and some famous books have a connection to any kind of physical reality that is tenuous at best and instead swim quite deeply in wordplay and aggressive subjectivity.) If you need style either way, a focus on style makes sense.


cpressey commented 8 years ago

I had forgotten about Aggressive Passive. I tried reading it again, and managed to read the first 2 chapters before it got distinctly samey. I could've forced myself to go on, maybe, but found the urge to skim pretty strong so I consciously decided to stop there. wc says the first 2 chapters comprise 1333 words.

So I guess, for this project, I'm aiming for around 1500 readable words, at minimum, although if I could get 2000 or more, that would make me very happy.

I'll probably generate two versions at the end: a short but readable version, and a 50K-word "NaNoGenMo version" in which, as I've mentioned before, the plot will probably begin to suffer partway through.

cpressey commented 8 years ago

On the subject of simulation: it's a fertile area, but it's a bit complicated how it interacts with narrative. There's this tension between the "hierarchical" plot structure and the "linear" action-reaction stream... No doubt a writer thinks about how a character will react to something happening, and how other characters will react to that reaction, etc. But a writer also thinks about how they want things to work out in the story, and they will often move characters towards a certain conclusion, to advance the plot. When they do it well, the reader hardly notices.

Which leads me to Goal 2...

Oh, darn, here I am posting frequently when I was just saying I wouldn't be posting frequently. Sorry. I'll just shut up now and try to get this trainwreck-generator really working.

MichaelPaulukonis commented 8 years ago

Since your rules are thus invalidated, please post code and works-in-progress.

MichaelPaulukonis commented 8 years ago

"interesting point-of-view" was one of the reasons I floated the "records all actions in the system, and replay from perspective of last survivor".

However, additional points of interest could be determined algorithmically -- e.g. a situation where a person is hiding and the herd approaches, only to veer off when distracted by another character.

Will this be as good as a human? OF COURSE NOT. I'm just floating some ideas to extend it beyond "people reading for sex and/or violence".

dariusk commented 8 years ago

I would like to remind everyone that there is no requirement to post code until the very end of the month and there is no requirement at all to post works-in-progress.

@MichaelPaulukonis if that request was meant to be kind encouragement, I request you try and be kinder in the future.

cpressey commented 8 years ago

@dariusk No worries, I assume it was deadpan humour.

@MichaelPaulukonis My response can only be this.

cpressey commented 8 years ago

I might not be releasing code or previews YET but I'm happy to talk about what I'm doing, time permitting. In fact I'd really like to talk about Goal 2, since that is The Interesting Goal here. But first perhaps I should put Goal 3 to rest.

As I expected, there is certainly some similarity between this thing and The Swallows. I didn't use an external corpus for either of them, so they're both "written in my hand" so to speak. The event model is also not dissimilar, and they're both narrated in 3rd-person past tense...

But significantly, where The Swallows had basically only Brownian motion to work with, this writes around an actual plot. Conflicts get resolved and the story has an end and everything. There are other differences — the diction is sometimes better (or at least different) and the setting is wildly different and the internal architecture is a lot more intentionally-designed and compiler-like — but the plot thing should be enough.

So, yeah, I think I'm happy with Goal 3 at this point.

cpressey commented 8 years ago

@MichaelPaulukonis I do like the idea of generating more (much more) than you need and discarding most of it. Back in January I considered simulating a whole city, with each individual going about their daily life, going to work, planning crimes, etc., and then taking some kind of cross-section of that. Somehow.

The problem* is that identifying interesting situations is probably as hard or harder than fabricating them.

*not an actual problem, because this is NaNoGenMo and trying different things is what we do

hugovk commented 8 years ago

If you could somehow evaluate each person's day in the city, maybe find a baseline, average, boring day, then pick the most exciting one.

rngwrldngnr commented 8 years ago

It might simplify the problem to stack the deck and give everyone in the city hidden goals and dark secrets. Something along the lines of Neil Gaiman's City of Spies. The interest curve would probably be flatter, but instead of explicitly looking for the most interesting person, you could do queries about what kind of story you want, like: "Two people who never meet, but indirectly destroy each other's lives", and let the huge number of people who could potentially be living lives that fit that plan work for you.

It could even be a pilot program to a more general, realistic city, since you would essentially only need to write one kind of person, to start.

enkiv2 commented 8 years ago

Exciting might be more difficult to estimate and less useful as a rubric for interestingness than unusual. If you simulate the whole city, you will get some overlap between behaviors, and so you can essentially eliminate from consideration any sequence of actions that is identical to another character's sequence and any sequence of actions that's identical to a previously described sequence.
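A minimal sketch of this filter, assuming each character's day is already reduced to a sequence of action labels (the names and action vocabulary here are invented for illustration):

```python
from collections import Counter

# Hypothetical city: each character's day as a tuple of action labels.
days = {
    "alice": ("wake", "eat", "work", "eat", "sleep"),
    "bob":   ("wake", "eat", "work", "eat", "sleep"),
    "carol": ("wake", "dig", "hide", "run", "sleep"),
}

# Count how many characters share each exact sequence; a character is
# "unusual" if no one else's day is identical to theirs.
counts = Counter(days.values())
unusual = {name for name, day in days.items() if counts[day] == 1}
print(unusual)  # {'carol'}
```

A real version would presumably compare subsequences rather than whole days, so that two mostly-ordinary days with one shared strange hour still register.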


MichaelPaulukonis commented 8 years ago

The problem* is that identifying interesting situations is probably as hard or harder than fabricating them.

Well, if there is a central crime, that is a thread to follow.

Then, everybody else who is within n proximity of some part of the crime is another thread.

Different ways of combining -- the first part of the story is the crime thread, beginning to end. Subsequent portions are those within proximity. Since people have (hopefully!) read the first part, the subsequent sections are notable for their similarities to the crime, etc., even though the crime itself would never be mentioned. (I'm specifically excluding people who are directly impacted by the crime/planning/etc. So: 1 < proximity < n.)

enkiv2 commented 8 years ago

I dunno. Crime isn't always interesting, and in order to distinguish between criminal and noncriminal behavior you need to implement laws in the city and produce complex incentive structures surrounding them that make characters mostly follow the law, break the law occasionally, and break the law in varied and inconsistent ways. In other words, you'd need to program your simulation in order to ensure that criminal activity is narratively interesting.

Meanwhile, if you use unusual behavior (i.e., behavior rare in the statistical sample of the whole city) then you will pick up stories about bugs in your simulation, along with stories wherein sequences of highly unusual things happen. Strange coincidence stories will be generated, along with stories about characters forced by circumstance into unusual sets of actions. (And, if someone inside your simulation invents a simulation, you'll get the script of World on a Wire in your Simulacron-3 prototype ;-)


cpressey commented 8 years ago

These are all interesting angles to try. I literally just thought of another one. There's a nursery rhyme that goes:

> Half a pound of tuppenny rice,
> Half a pound of treacle,
> That’s the way the money goes,
> Pop goes the weasel!
> Up and down the city road,
> In and out the Eagle,
> That’s the way the money goes,
> Pop goes the weasel!

In North America this is of course known as "Pop Goes the Weasel" and has mostly-different lyrics, but in the UK it's known as "Half a Pound of Tuppenny Rice".

One interpretation of this is that it's recounting the journey of the two-penny piece (or, more generally, cash) as it changes hands between people in the town. Someone spends it on rice at the store, then the storekeeper spends it at the pub (the Eagle), and so on.

You could of course apply this to this city-simulation. The individuals carry money, and they spend it (or drop it or have it stolen)... and you write the story by choosing a coin and following the path it takes. (In practice you'd probably try many coins and pick the one that has the most interesting story. So it doesn't remove that hurdle of recognizing an interesting story, but there's a possibility it would be simpler in the case of a coin; merely counting the number of times it changes hands would be a good start.)
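A minimal sketch of the coin-following idea, with an invented cast of townsfolk and a purely random hand-off rule standing in for a real economy (all names and the "hand-changes" scoring are assumptions for illustration):

```python
import random

# Hypothetical town: each tick, the coin's current holder passes it to
# a randomly chosen other person. The "story" is the chain of holders,
# and a crude interest score is the number of hand-changes.
random.seed(1)
people = ["shopper", "storekeeper", "publican", "fiddler", "constable"]

def coin_path(start, ticks):
    path = [start]
    for _ in range(ticks):
        holder = path[-1]
        # the coin always passes to someone other than its current holder
        path.append(random.choice([p for p in people if p != holder]))
    return path

path = coin_path("shopper", 10)
score = len(path) - 1  # hand-changes; try many coins, keep the best
```

In a real generator you'd replace the random hand-off with the simulation's actual transactions, and perhaps score on distinct holders rather than raw length.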

I think this is a fun idea, and if anyone wants to try this, please go ahead and do so!

tra38 commented 8 years ago

Creating a simulation of an entire city just to trace the history of a coin seems absurd. This sounds like a task where a state machine would be better suited. Bob randomly picks from a list where to spend his money, picks "buy rice from the shopkeeper", and then the shopkeeper consults his list to find out where to spend the money and decides to spend it at the pub. Then the person who owns the pub checks his list, and decides to spend it on Bob's rice-cakes. No need for simulation, just picking randomly from a shopping list (and maybe deleting from the list when the character spends his money: Bob doesn't need to buy rice again).

EDIT: Consider also the possibility that a simulation is useful only for generating the data that can then be transformed into text. It can be possible to just generate the data outright, without needing a simulation to slowly make the data.
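This state-machine version can be sketched in a few lines; the shopping lists, names, and the convention that the coin passes to the last word of each purchase are all invented for illustration:

```python
import random

# Hypothetical shopping lists: each holder picks a purchase at random,
# the chosen option is deleted (Bob doesn't buy rice twice), and the
# coin passes to the seller named at the end of the purchase string.
random.seed(0)
shopping_lists = {
    "bob": ["rice from the storekeeper", "ale from the publican"],
    "storekeeper": ["ale from the publican", "rice-cakes from bob"],
    "publican": ["rice-cakes from bob"],
}

def trace(coin_holder, steps):
    events = []
    for _ in range(steps):
        options = shopping_lists.get(coin_holder, [])
        if not options:
            break  # holder has nothing left to buy; the coin rests
        purchase = random.choice(options)
        options.remove(purchase)  # don't buy the same thing again
        events.append(f"{coin_holder} buys {purchase}")
        coin_holder = purchase.split()[-1]  # coin passes to the seller
    return events
```

The whole "simulation" is just a walk over these lists, which is the point: the data a renderer needs can be generated directly.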

PPKFS commented 8 years ago

The issue I'm currently running into wrt simulation: it's a scarily deep rabbit hole. People should desire things like food. They should then be able to know whether they have food, and if not, they should be able to find or buy it. They need to know how to sell, and what they're allowed to take or not take. They need to know whether something is available for them to buy. Where does it stop?

Then you also need to make the people who sell, where they sell, where the items are, what things are worth, etc. I've just spent about 3 days doing things entirely unrelated (it seems) to having a story.

Also I've never heard that nursery rhyme called 'half a pound of tuppenny rice'. Odd.

ikarth commented 8 years ago

There are, I think, at least two separate challenges in simulating a novel, which we've kind of been dancing around in this discussion. The first is that building the simulation itself is tricky. In theory, emergent complexity should make a sufficiently broad simulation deep because of the combinatorial possibilities, but that requires quite a lot of content, and it can be hard to judge which content actually contributes. Second, the simulation gets you a plot, and possibly even some stage business and details, but that's mostly confined to the fabula. The syuzhet and the rendition of the text are another matter.

It occurs to me that it might be easiest to present such a simulation in a relatively avant-garde or hypertextual form, like katierosepipkin's procedurally generated newspapers of small towns.

The more textual approaches such as Markov chains, word2vec and the like are, in contrast, heavy on the style but have no conception of plot. And at this point I am not sure which is harder for a machine. I do think the most interesting approaches, to me at least, combine them.