I'll give it a bash. I'm thinking of treating it like landscape generation in Minecraft, starting with the high-level features and then dividing and conquering until you get down to the sentence level, which would basically be sentence templates to be fleshed out with some kind of Markovian randomness.
I've done some coding; here's where things are at so far.
I'm dividing the problem of writing a novel in two. The first part is coming up with a skeleton of the story. The second is generating convincing sentences to flesh out that skeleton.
This story skeleton is created by generating a certain number of chapters at random, each of which serves a specific goal (stasis, trigger, quest, surprise, choice, climax, reversal, resolution). Each chapter contains a cast of character roles (protagonist, love interest), a primary location and so forth.
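To make that concrete, here's a rough sketch of the chapter-generation step. The goal, role and location lists are illustrative stand-ins, not the real data, and assigning goals in story order is a simplification of the random generation:

```python
import random

# Illustrative lists, not the real ones.
CHAPTER_GOALS = ["stasis", "trigger", "quest", "surprise",
                 "choice", "climax", "reversal", "resolution"]
ROLES = ["protagonist", "antagonist", "love interest", "mentor"]
LOCATIONS = ["manor", "harbour", "forest", "city"]

def generate_skeleton(n_chapters, rng=random):
    """Generate a list of chapters, each with a goal, a cast and a location."""
    chapters = []
    for i in range(n_chapters):
        chapters.append({
            # Walk through the goals in story order (simplified).
            "goal": CHAPTER_GOALS[min(i, len(CHAPTER_GOALS) - 1)],
            # Each chapter gets a random subset of the character roles.
            "cast": rng.sample(ROLES, k=rng.randint(2, len(ROLES))),
            "location": rng.choice(LOCATIONS),
        })
    return chapters
```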
A sequence of paragraphs is then generated for a particular chapter. These also serve a goal (introduction, conclusion, dialogue, action, description), and contain a subset of the chapter's cast.
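The paragraph step looks something like this (again a sketch; the bookending of introduction/conclusion paragraphs is my simplification):

```python
import random

def generate_paragraphs(chapter, n_paragraphs, rng=random):
    """Attach a goal and a sub-cast (drawn from the chapter's cast) to each paragraph."""
    paragraphs = []
    for i in range(n_paragraphs):
        if i == 0:
            goal = "introduction"
        elif i == n_paragraphs - 1:
            goal = "conclusion"
        else:
            goal = rng.choice(["dialogue", "action", "description"])
        # Each paragraph uses a random non-empty subset of the chapter's cast.
        cast = rng.sample(chapter["cast"], k=rng.randint(1, len(chapter["cast"])))
        paragraphs.append({"goal": goal, "cast": cast})
    return paragraphs
```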
Finally, a sequence of sentence templates is generated for each paragraph. These specify the objects involved in the sentence, such as "protagonist attacks antagonist" or "love says protagonist". The templates are chosen to suit the available cast, location, paragraph goal and chapter goal.
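Template selection is basically a filter over a template library; a minimal version (with made-up templates, keyed on the roles and paragraph goals they suit):

```python
import random

# Invented templates for illustration: each names the roles it needs
# and the paragraph goals it fits.
TEMPLATES = [
    {"text": "protagonist attacks antagonist",
     "roles": {"protagonist", "antagonist"}, "goals": {"action"}},
    {"text": "love says protagonist",
     "roles": {"love interest", "protagonist"}, "goals": {"dialogue"}},
    {"text": "protagonist surveys location",
     "roles": {"protagonist"}, "goals": {"description"}},
]

def pick_templates(cast, goal, n, rng=random):
    """Choose n templates whose required roles are all in the cast
    and which suit the paragraph goal."""
    usable = [t for t in TEMPLATES
              if t["roles"] <= set(cast) and goal in t["goals"]]
    return [rng.choice(usable) for _ in range(n)] if usable else []
```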
Once the skeleton is available, we need to re-write each sentence as a convincing line of text in a novel. We do this by training statistical language models on a bunch of books. We then use these models to do all of the generation.
First, proper names are found in the training data and randomly assigned to the various character roles; the same is done for locations. Lists of synonyms are derived from the training data for the various words used in the templates (love, says and so on). This allows a particular template to be re-written as a sequence of keywords that must appear in the generated sentence (such as "HOLMES HITS MORIARTY" or "PASSION WHISPERS HOLMES").
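In code, that rewriting step is something like the following (the synonym lists here are hard-coded placeholders; the real ones come from the corpus):

```python
import random

def assign_names(roles, proper_names, rng=random):
    """Randomly map each character role to a proper name mined from the corpus."""
    names = rng.sample(proper_names, k=len(roles))
    return dict(zip(roles, names))

# Placeholder synonym lists; in practice these are derived from the training data.
SYNONYMS = {
    "hits": ["hits", "strikes", "attacks"],
    "says": ["says", "whispers", "murmurs"],
}

def template_to_keywords(template_words, casting, rng=random):
    """Rewrite a template as uppercase keywords: roles become assigned names,
    other template words are swapped for a random synonym."""
    out = []
    for w in template_words:
        if w in casting:
            out.append(casting[w].upper())
        elif w in SYNONYMS:
            out.append(rng.choice(SYNONYMS[w]).upper())
        else:
            out.append(w.upper())
    return out
```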
Next, a list of secondary keywords is maintained, to try to ensure that each sentence flows on from the previous ones. This list is derived from the training data by looking at the mutual information between words in adjacent sentences, so that the keywords that should appear in the next sentence can be calculated from the sentence that was just generated.
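The adjacency statistic I have in mind is pointwise mutual information between a word in one sentence and a word in the next; a toy version of the estimator and the keyword ranking:

```python
import math
from collections import Counter

def adjacent_mi(sentence_pairs):
    """Estimate pointwise mutual information between a word in one sentence
    and a word in the following sentence, from (sentence, next_sentence)
    pairs of word lists."""
    first, second, joint = Counter(), Counter(), Counter()
    n = 0
    for a, b in sentence_pairs:
        for w1 in set(a):
            first[w1] += 1
        for w2 in set(b):
            second[w2] += 1
        for w1 in set(a):
            for w2 in set(b):
                joint[(w1, w2)] += 1
        n += 1

    def pmi(w1, w2):
        if joint[(w1, w2)] == 0:
            return float("-inf")
        return math.log((joint[(w1, w2)] / n) / ((first[w1] / n) * (second[w2] / n)))

    return pmi

def secondary_keywords(prev_sentence, vocab, pmi, k=3):
    """Rank candidate next-sentence words by their best PMI score
    against any word of the sentence just generated."""
    scores = {w2: max(pmi(w1, w2) for w1 in prev_sentence) for w2 in vocab}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```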
Finally, a goal-oriented Markov model is used to generate a graph of possible sentences, ensuring that all of the keywords in the template appear in the generated sentence. All possible paths through this graph are scored based on various measures (information theoretic and heuristic) to decide on the final generation, which is then emitted.
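The cheapest way to sketch this is rejection sampling rather than a proper graph search: sample candidate sentences from a bigram model, keep only those that contain every keyword, and score the survivors by average log-probability. (The real system searches a graph of partial sentences instead of brute-forcing like this.)

```python
import math
import random
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Bigram counts over tokenised sentences, with ^ / $ boundary markers."""
    model = defaultdict(Counter)
    for s in sentences:
        toks = ["^"] + s + ["$"]
        for a, b in zip(toks, toks[1:]):
            model[a][b] += 1
    return model

def generate_constrained(model, keywords, tries=2000, max_len=15, rng=random):
    """Sample sentences from the bigram model; keep those containing every
    keyword; return the one with the highest average log-probability."""
    best, best_score = None, float("-inf")
    for _ in range(tries):
        toks, w = [], "^"
        while len(toks) < max_len:
            choices = list(model[w].elements())
            if not choices:
                break
            w = rng.choice(choices)
            if w == "$":
                break
            toks.append(w)
        # Reject walks that didn't end cleanly or miss a keyword.
        if w != "$" or not set(keywords) <= set(toks):
            continue
        seq = ["^"] + toks + ["$"]
        score = sum(math.log(model[a][b] / sum(model[a].values()))
                    for a, b in zip(seq, seq[1:])) / len(seq)
        if score > best_score:
            best, best_score = toks, score
    return best
```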
So that's where I'm going with this; I have most of the story skeleton finished and have made a start on constrained Markovian generation. Funtimes!