hugovk opened this issue 9 years ago
o wow o
That works so well. The chapter titles are very code-y. They explicate. But perhaps they should be footnotes? They distract from Le Plaisir du Texte.
Yeah, perhaps the chapter titles could be relegated elsewhere.
I had thought about stripping out the code-y bits from the title (e.g. ^[^\w]*If\b -> If), but then realised I'd be writing a regular expression to process regular expressions...
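For what it's worth, the regex-to-process-regexes idea can be sketched in a few lines. This is a hypothetical helper (`title_from_regex` is not part of the repo) that only handles the constructs seen in these titles: anchors, bracketed character classes with an optional quantifier, and backslash escapes such as `\b`.

```python
import re

# Hypothetical: strip the few regex constructs used in these chapter titles.
_REGEX_SYNTAX = re.compile(
    r"\[[^\]]*\][*+?]?"   # character classes such as [^\w]* (no nested ])
    r"|\\[A-Za-z]"         # escapes such as \b, \w, \s
    r"|[\^$]"              # start and end anchors
)


def title_from_regex(pattern):
    """Remove regex syntax from pattern and collapse leftover whitespace."""
    stripped = _REGEX_SYNTAX.sub("", pattern)
    return " ".join(stripped.split())


print(title_from_regex(r"^[^\w]*If\b"))        # If
print(title_from_regex(r"[^\w]*And then"))     # And then
```

It is deliberately narrow: a class containing an escaped `]`, groups, or alternation would need more rules, which is where the recursion joke starts to bite.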
I'd been tweaking the number of random sentences to pick to end up with ~50k words, but then on this particular run the penultimate sentence landed on a 13k-word sentence from Joyce's Ulysses!
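One hypothetical alternative to tuning a fixed number of sentences is to sample until a word budget is reached, optionally skipping sentences over some length so a single Ulysses-sized one cannot swamp the total. The function name and the `max_sentence_words` cap below are assumptions for illustration, not what the script does.

```python
import random


def pick_until(sentences, target_words=50_000, max_sentence_words=None, seed=None):
    """Draw random sentences (without repeats) until the running word count
    reaches target_words. Sentences longer than max_sentence_words, if set,
    are skipped. Returns (picked, total); total may fall short if the pool
    runs out."""
    rng = random.Random(seed)
    pool = list(sentences)
    rng.shuffle(pool)  # random order, each sentence used at most once
    picked, total = [], 0
    for sentence in pool:
        n_words = len(sentence.split())
        if max_sentence_words is not None and n_words > max_sentence_words:
            continue
        picked.append(sentence)
        total += n_words
        if total >= target_words:
            break
    return picked, total


# Toy corpus: 100 three-word sentences plus one 13,000-word monster.
corpus = ["the moon rose"] * 100 + ["word " * 13_000]
picked, total = pick_until(corpus, target_words=30, max_sentence_words=100, seed=1)
print(len(picked), total)  # 10 30
```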
BWAH HAH HAH HAH HAH!
We owe much of the 20th century to Mr. Joyce.
-Michael Paulukonis
A refactoring bug just resulted in a book of 19,123,315 words instead of ~50k.
Here's a 50,143-word second volume, with an added "enchant" chapter and the regexes relegated to an appendix:
I like the beginning:
Chapter 1
In training he had been, once upon a time, an engineer and built dams that broke and bridges that fell down and wharves that ftoated away in the spring floods.
“There is moonlight.”
“Look at the moonlight.”
Unfortunately it was bright moonlight.
Here's an article about this and #50: http://www.theatlantic.com/technology/archive/2014/12/moby-dick-in-50000-meows-and-other-tales-that-computers-tell/383340/
Gutengrep and Gutenstory
Repo: https://github.com/hugovk/gutengrep
Gutengrep poetry generator
Riffing on a suggestion made in #55, I wrote a script that greps the Project Gutenberg CD for full sentences matching a regex. It uses NLTK to split the text into whole sentences, rather than the arbitrary lines of a file that grep returns. It can also sort the matches shortest first.
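The sentence-level grep can be sketched roughly as follows. To keep the example dependency-free it swaps NLTK's tokenizer for a naive regex splitter (an assumption, and a cruder one: it will wrongly split "Mr. Joyce"); `split_sentences` and `grep_sentences` are hypothetical names, not the repo's functions.

```python
import re

# Naive stand-in for NLTK's sent_tokenize: split on whitespace that follows
# ., ! or ?, optionally after a closing quote.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[.!?][\"'\u201d])\s+")


def split_sentences(text):
    # Join hard-wrapped lines first: Gutenberg texts break lines mid-sentence,
    # which is why line-based grep returns fragments.
    flat = " ".join(text.split())
    return [s for s in _SENTENCE_END.split(flat) if s]


def grep_sentences(text, pattern, shortest_first=False):
    """Return the whole sentences of text in which pattern is found."""
    regex = re.compile(pattern)
    hits = [s for s in split_sentences(text) if regex.search(s)]
    if shortest_first:
        hits.sort(key=len)  # stable: ties keep their original order
    return hits


text = """Once upon a time there was
a dam. It broke. Once upon a time,
briefly."""
print(grep_sentences(text, r"Once upon a time", shortest_first=True))
# ['Once upon a time, briefly.', 'Once upon a time there was a dam.']
```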
But first.
The OED has a word-of-the-day email, and the quotations for "moonlit" struck me as particularly poetic:
Let's try this on Project Gutenberg. There are 597 text files on the CD containing 3,583,390 sentences. The full output of these searches can be found in the repo.
...
Let's search for "once upon a time":
...
Or "And then" at the start of each sentence (regex: [^\w]*And then):...
...
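Why the `[^\w]*` prefix? A sketch, assuming "at the start of each sentence" is implemented by anchoring the pattern with `re.match` (which only tries position 0): the prefix lets the match skip opening quotes, dashes or spaces, while a mid-sentence occurrence is still rejected. The sample sentences are made up.

```python
import re

# Assumed anchoring via re.match; [^\w]* skips leading non-word characters.
START = re.compile(r"[^\w]*And then")

sentences = [
    "And then the dam broke.",
    "\u201cAnd then?\u201d she asked.",
    "-- And then silence.",
    "He wrote \u2018And then\u2019 on the wall.",
]

at_start = [s for s in sentences if START.match(s)]    # start of sentence only
anywhere = [s for s in sentences if START.search(s)]   # what plain grep would do
print(len(at_start), len(anywhere))  # 3 4
```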
Or "But why" at the start of each sentence (regex: [^\w]*But why):...
...
...
Not many, so here's the full thing:
To generate a full book, gutenstory.py repeatedly searches the 3,583,390 sentences in the 597 text files of the Project Gutenberg CD.
First it collects all the sentences containing "once upon a time". Next it collects all the sentences containing "happily ever after" or ending with "the end." Each chapter begins with a random sentence from the first set and ends with a random sentence from the second.
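Collecting the opening and closing sentences might look like this. The exact patterns, the case-insensitivity, and the `collect_frames` name are assumptions for illustration, not taken from gutenstory.py.

```python
import re

# Hypothetical patterns; case-insensitivity is an assumption.
OPENER = re.compile(r"once upon a time", re.IGNORECASE)
# "happily ever after" anywhere, or "the end." at the very end of the
# sentence (allowing a trailing closing quote or bracket).
CLOSER = re.compile(r"happily ever after|\bthe end\.\W*$", re.IGNORECASE)


def collect_frames(sentences):
    """Split a sentence list into chapter openers and chapter closers."""
    openers = [s for s in sentences if OPENER.search(s)]
    closers = [s for s in sentences if CLOSER.search(s)]
    return openers, closers


openers, closers = collect_frames([
    "Once upon a time, a dam was built.",
    "They lived happily ever after.",
    "And that was the end.",
    "The end is near.",
])
print(openers)  # ['Once upon a time, a dam was built.']
print(closers)  # ['They lived happily ever after.', 'And that was the end.']
```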
After that, the rest of each chapter is made up of 80 random sentences, sorted by length, drawn from a different set of sentences for each chapter. For example, one chapter uses sentences beginning "But why", another those beginning "Of course", others those starting "Suddenly" or "Presently", and yet more those containing "year-old", "princess", "violin", "laughed", or the names of months or days.
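Putting one chapter together from those pieces can be sketched as below. `build_chapter` is a hypothetical name; sorting shortest first by character length is an assumption (the description only says "sorted by length").

```python
import random


def build_chapter(openers, closers, themed, rng, n=80):
    """Hypothetical chapter assembly: one random opener, then up to n random
    sentences from a single themed set sorted by length, then one random
    closer."""
    body = rng.sample(themed, min(n, len(themed)))  # no repeats within a chapter
    body.sort(key=len)  # character length, shortest first (assumed)
    return [rng.choice(openers)] + body + [rng.choice(closers)]


rng = random.Random(2014)
themed = ["But why" + " not" * k + "?" for k in range(100)]  # toy "But why" set
chapter = build_chapter(["Once upon a time."], ["The end."], themed, rng)
print(len(chapter), chapter[0], chapter[-1])  # 82 Once upon a time. The end.
```

Repeating this with a different themed set per chapter, until the word count is high enough, gives the overall shape of the book.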
Here's example output, a 65,383-word book: HTML | PDF | MD
Generated with:
Then print to PDF using Chrome. Big thanks to @moonmilk for the CSS: