Gutenberg Mashup - Githubissues

superMDguy commented 7 years ago

This is the second novel I'm planning to do. Last year, there was a discussion about how most novels are boring (see https://github.com/dariusk/NaNoGenMo-2015/issues/11). In response to this, cpressey created "A Time for Destiny: The Illustrious Career of Serenity Starlight Warhammer O'James during her First Three Years in the Space Fighters", using a story compiler approach. However, as he noted, the story is interesting for a bit, but then starts to get less interesting. Why?

I think that anything built using manually created templates inherently has this "boring point". No matter how many templates you write, your novel will eventually have a "boring point", where the novel feels repetitive. You start to sense the templates inside, and see where names were inserted into formulas. One could argue that the only way to defeat this would be to have a human mind behind the story. When you get to the bottom of it, though, I'd say the human mind really is just a collection of "templates", and that creativity is merely rearranging these templates in interesting ways. Everything we do is influenced by things we've read and experienced. So, the answer is to create as many templates as the human mind has, or just as many as possible.

Enough with the theory. Here's what I plan to do:

Get as many Project Gutenberg books as possible.
Analyze the texts, trying to find named entities like "main character", "love interest", "principal setting", "goal" (whether an action, state, or object). I'll probably do this using a combination of word counts, relations between objects, and POS tagging using spaCy or NLTK.
Create templates based on this analysis, for example inserting {character(main)} wherever "Sherlock Holmes" appears in an Arthur Conan Doyle novel.
Break up the novel into 'scenes'. This will be a really hard part. I'll probably start out by just using chapters, and make it more accurate later if I have time.
Pick random entities from the list derived from the PG ebooks, and assign them to their entity name, for example choosing "Elizabeth" as the main character.
Create a plot with a plot generator.
Use the scenes with filled-in entities to generate the text for the plot.
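
To make the templating steps concrete, here is a minimal sketch of splitting by chapter, turning names into placeholders, picking a random cast, and filling a scene back in. It is an illustration only, not code from the project: the function names, the role labels other than {character(main)}, and the hand-written entity table are all assumptions, and a real run would get the entity table from the spaCy/NLTK analysis rather than typing it out.

```python
import random
import re


def split_into_scenes(text):
    """Naive scene splitter: every line starting with 'CHAPTER ...' is a boundary."""
    parts = re.split(r"(?im)^[ \t]*chapter[ \t]+\S+.*$", text)
    return [part.strip() for part in parts if part.strip()]


def make_template(scene, entities):
    """Replace each known name with a placeholder such as {character(main)}.

    `entities` maps a surface name to a role label (hypothetical, hand-made here).
    """
    # Replace longer names first so "Sherlock Holmes" is handled before "Holmes".
    for name in sorted(entities, key=len, reverse=True):
        placeholder = "{" + entities[name] + "}"
        pattern = r"\b" + re.escape(name) + r"\b"
        scene = re.sub(pattern, lambda match, p=placeholder: p, scene)
    return scene


def pick_cast(candidates, rng=random):
    """Choose one entity per role at random from the harvested candidates."""
    return {role: rng.choice(names) for role, names in candidates.items()}


def fill_template(template, cast):
    """Substitute each {role} placeholder; unknown roles are left untouched."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda match: cast.get(match.group(1), match.group(0)),
        template,
    )


if __name__ == "__main__":
    sample = (
        "CHAPTER I\n"
        "Sherlock Holmes sat in Baker Street. Holmes smiled.\n"
        "CHAPTER II\n"
        "Watson left Baker Street.\n"
    )
    entities = {
        "Sherlock Holmes": "character(main)",
        "Holmes": "character(main)",
        "Watson": "character(friend)",
        "Baker Street": "setting(principal)",
    }
    candidates = {
        "character(main)": ["Elizabeth", "Ishmael"],
        "character(friend)": ["Jane", "Queequeg"],
        "setting(principal)": ["Pemberley", "Nantucket"],
    }
    cast = pick_cast(candidates)
    for scene in split_into_scenes(sample):
        print(fill_template(make_template(scene, entities), cast))
```

Plain string replacement like this ignores pronouns, titles ("Mr. Holmes"), and gender agreement, which is part of why the analysis step matters so much.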

This might be overambitious (it definitely is), but I'll just see how far I can get. Let's see how this works.

ikarth commented 7 years ago

I'm very interested in this, if only because I'm attempting something vaguely similar as I put together the low-level text output for my project. (I'm trying out a sentence-level approach at the moment, we'll see what it yields.)

superMDguy commented 7 years ago

Cool. I'm excited to see how it turns out.

tra38 commented 7 years ago

> I think that anything built using manually created templates inherently has this "boring point". No matter how many templates you write, your novel will eventually have a "boring point", where the novel feels repetitive. You start to sense the templates inside, and see where names were inserted into formulas.

This is an unsolved problem in human creativity too. We can detect patterns even in human-generated content, and get tired of them (see criticism of blockbuster movies, Disney movies, road trip movies, listicles, the Hero's Journey, etc., etc., etc.). We continually search for novelty after we exhaust the potential of existing content. Today's lovers of literature do not hole themselves up in the local library or stay plugged into the Internet forever. At some point, boredom sets in.

So rather than worry about the ability of humans to detect patterns, worry about scaling up the literature to meet your end goals. If the content is designed to entertain a human for 10 hours, and the human gets bored 11 hours in, then the content has succeeded. If a novel is designed to be "human-readable" for 50,000 words, then it's fine if the 50,001st word gets repetitive.

> This might be overambitious (it definitely is), but I'll just see how far I can get.

Good luck. My suspicion is that it'd probably involve either preparing the templates ahead of time before NaNoGenMo, or relying on crowd-sourced assistance to tag as many paragraphs as possible. I definitely think it'd work, though, and I'd like to see how automation can help you "scale" up the task.

superMDguy commented 7 years ago

Thanks, I was trying to convey that idea, but you made it much clearer.

By the way, I set up a (mostly empty) repo at https://github.com/superMDguy/GutenbergMashup. All it has is a Jupyter notebook with a function that gets the top n characters in a novel using spaCy.
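
For anyone following along, a "top n characters" function could look roughly like the sketch below. This is not the notebook's actual code: the function names are made up, the `en_core_web_sm` model is an assumed choice, and counting PERSON entities by their exact text is only one simple way to approximate "characters".

```python
from collections import Counter


def top_characters(doc, n=5):
    """Return the n most frequently mentioned PERSON entities as (name, count) pairs.

    `doc` is a parsed spaCy Doc (anything with `.ents` whose items expose
    `.text` and `.label_` will work).
    """
    counts = Counter(
        ent.text.strip() for ent in doc.ents if ent.label_ == "PERSON"
    )
    return counts.most_common(n)


def top_characters_in_file(path, n=5, model="en_core_web_sm"):
    """Parse a plain-text novel with spaCy and count its characters.

    Assumes spaCy and the named model are installed; both are assumptions here.
    """
    import spacy  # imported lazily so top_characters() works without spaCy

    nlp = spacy.load(model)
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    # spaCy refuses texts longer than nlp.max_length, and whole novels can
    # exceed the default, so raise the limit (at the cost of more memory).
    nlp.max_length = max(nlp.max_length, len(text) + 1)
    return top_characters(nlp(text), n)
```

Calling `top_characters_in_file("some_novel.txt", n=10)` (the file name is a placeholder) would return the ten most-mentioned names. "Holmes" and "Sherlock Holmes" are counted separately here, so merging aliases would be a natural next step.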

NaNoGenMo / 2016

Gutenberg Mashup #92