NaNoGenMo / 2017

National Novel Generation Month, 2017 edition.
https://nanogenmo.github.io

Skynet's ██████ #5

tra38 opened this issue 7 years ago

tra38 commented 7 years ago

... [N]ew directors often work with low-budget horror movies. Witness Sam Raimi - he did Evil Dead not for any particular love of the genre, but for most-likely return on investment (time+money) (a source). The audience tends to eat it up no matter how low the quality. Witness the large numbers of self-published zombie books on Amazon. Or romance novels of any of the vast, arcane genotypes of romance. ---Michael Paulukonis, Nov. 4th, 2015, GitHub comment

Generally, as the corpus grows larger, the generated output becomes more varied and interesting. This means that the best way to improve a text generator is to simply gather more data. Of course, the process of gathering data is itself an expensive endeavour, forcing programmers to be more creative in how they use the corpus they already have. ---Tariq Ali, Dec. 23rd 2016, "NaNoGenMo 2016 and My Predictions About Text Generation"

In the past, I have attempted to handwrite several science-fiction stories (most of them being glorified Terminator fanfic). Unfortunately, they are all half-finished. Fortunately, I still own the copyright to them. Therefore, I plan on combining all my stories under an overarching "pulp sci-fi" storyline about Skynet. If Michael is right, referencing such an iconic pulp franchise would immediately attract attention to the generated novel. In addition, since I handwrote most of the corpus myself, I don't need to worry about the text being seen as "too ancient". I may have to worry about the coherence of the text, though.

I do not have 50,000 words of corpus lying around to reuse to my heart's content. So my plan is to extend the corpus by adding █ to the text I already have. The fluff justification here is that you're reading censored data about the world that Skynet had gathered. I'm not sure how much █ I can get away with using, though, since having entire pages filled with █ will be boring to consume. Still, it's worth a shot.
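(As a rough sketch of how that padding might work — this is not the actual project code — here is a minimal Python example that interleaves runs of redaction bars between paragraphs while capping how much of the final text is censored. The function name, the bar length, and the 25% cap are assumptions of mine, not anything the project specifies.)

```python
import random

# Hypothetical sketch (not the project's code): pad a corpus with "censored"
# blocks so the word count grows, while capping how much of the output is bars.
REDACTION = "\u2588" * 7  # one run of block characters counts as one "word" for wc

def redact_pad(paragraphs, target_words, max_redaction_ratio=0.25, seed=42):
    """Interleave runs of redaction bars between paragraphs until either the
    target word count or the redaction budget is exhausted."""
    rng = random.Random(seed)
    current = sum(len(p.split()) for p in paragraphs)
    budget = min(target_words - current, int(max_redaction_ratio * target_words))
    padded = []
    for paragraph in paragraphs:
        padded.append(paragraph)
        if budget > 0:
            run = min(rng.randint(1, 12), budget)  # a short run of censored "words"
            padded.append(" ".join([REDACTION] * run))
            budget -= run
    return "\n\n".join(padded)

if __name__ == "__main__":
    corpus = ["Skynet reviewed the surveillance logs.",
              "The Resistance cell moved at dawn."]
    print(redact_pad(corpus, target_words=50))
```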

EDIT: While "probable obsolescence and extinction at the hands of a potentially super-intelligent supercomputer" can be interpreted as a type of psychological horror story, it may be better to take Michael's words literally and ramp up the "pulp" nature of this sci-fi novel. So in addition to fighting Resistance agents, Skynet will also battle aliens and zombies.

tra38 commented 7 years ago

It turns out that I won't have time to work on this project...at least not right now. Although I think this approach would likely be successful, I'm willing to let someone else "steal" the idea. And, of course, I can always be a NaNoGenMo rebel and work on it in a month that isn't November.

I'm closing this issue.

tra38 commented 6 years ago

Okay, so I am working on this project, if only to get it out of my mind and validate whether this approach actually works.

So I finally got an MVP novel done with a dummy Lorem Ipsum corpus. Here's the source code, and the novel text (wc says that the novel has 50,105 words). The output is actually looking slightly better than I expected, despite the nature of the test corpus. It should be fairly easy to swap out the test corpus for the "real" data.

As an interesting aside, I am using machine learning for this project (ZombieWriter), since it may be a more scalable way to combine several different corpora without me having to manually tag each paragraph I wrote before feeding it into the Track Method or some other paragraph-shuffling algorithm. I'm not sure how scalable it will actually be, though...and obviously the program itself will take longer to run.
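(For illustration only, and not ZombieWriter's actual code or API: the general "cluster instead of hand-tag" idea looks something like the scikit-learn sketch below, where untagged paragraphs from several corpora are grouped by textual similarity and each cluster becomes a "chapter". The tooling, function name, and chapter count here are my own assumptions.)

```python
# Illustrative sketch only -- not ZombieWriter's actual code or API.
# It shows the general idea: let a clustering algorithm group untagged
# paragraphs from several corpora, instead of hand-tagging each one.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_paragraphs(paragraphs, n_chapters=10, seed=0):
    """Group paragraphs by TF-IDF similarity; each cluster becomes a 'chapter'."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(paragraphs)
    labels = KMeans(n_clusters=n_chapters, random_state=seed, n_init=10).fit_predict(vectors)
    chapters = {i: [] for i in range(n_chapters)}
    for paragraph, label in zip(paragraphs, labels):
        chapters[label].append(paragraph)
    return chapters  # note: requires at least n_chapters paragraphs
```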

Fun Fact - I originally attempted to upload my novel to GitHub Gists, only for GitHub to refuse to open it because it was "taking too long to load". This seems very odd, and I need to investigate why this particular novel ends up so large, when I'm able to upload other 50K-word novels to GitHub Gists without any difficulty.

tra38 commented 6 years ago

Time to declare the project complete.

I was only able to merge 2 1/8 of my Skynet Terminator stories before getting tired of copying and pasting text into the CSV file. It turns out that merging a bunch of different stories together doesn't actually lead to coherent output (requiring me to write even more hand-written text to justify the inclusion of the previous hand-written text)...and I didn't actually have much material pre-written anyway (so I had to supplement it with some sci-fi speculation from across the Internet). You still get enough of your lovely pulp sci-fi, though the novel is probably more a description of a sci-fi setting than a functioning story.

The results do seem readable, although the "censored bar" gimmick does get old fast. I also noticed that ZombieWriter (like most machine learning approaches) requires a lot of input data before the results become interesting to consume. Whether that is because ZombieWriter needs the data to properly categorize the different paragraphs, or because it needs the data to 'paper over' its faults by posting an evocative paragraph in lieu of something sensible...well, I'm not a machine learning expert, so I can't tell for sure.