NaNoGenMo / 2019

National Novel Generation Month, 2019 edition.

Moby Dick by Herman Melville and 346 Others #126

Closed hugovk closed 4 years ago

hugovk commented 4 years ago

An entry for NaNoGenMo ("write code that writes a novel") 2019.

https://github.com/hugovk/NaNogenMo/2019 https://github.com/NaNoGenMo/2019/issues/34

Go through a book, e.g. Moby Dick, and check each word in turn: find another work that uses that word, and add that work as a citation for that single word. The result: "Moby Dick, as written by [list of cited authors]".
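The per-word citation idea can be sketched roughly as follows. This is not the code in citifier.py; the names `tokenize` and `citify` are hypothetical, and picking the first work that contains the word is an assumption, since the description does not say how a work is chosen when several match.

```python
import re

# Assumption: a "word" is a run of letters and apostrophes, compared in lowercase.
WORD_RE = re.compile(r"[A-Za-z']+")


def tokenize(text):
    """Return the lowercase word tokens of a text, in order."""
    return [w.lower() for w in WORD_RE.findall(text)]


def citify(main_text, other_works):
    """Attach a citation to each word of main_text.

    other_works maps a label (e.g. an author name) to that work's full text.
    Returns (pairs, cited): pairs is a list of (word, label or None),
    cited lists the labels in order of first use.
    """
    # Build each work's vocabulary once so lookups are cheap.
    vocab = {label: set(tokenize(text)) for label, text in other_works.items()}
    pairs, cited = [], []
    for word in tokenize(main_text):
        # Assumption: take the first work (in dict order) that uses the word.
        source = next((lbl for lbl, words in vocab.items() if word in words), None)
        if source is not None and source not in cited:
            cited.append(source)
        pairs.append((word, source))
    return pairs, cited


if __name__ == "__main__":
    works = {"Austen": "It is a truth universally acknowledged",
             "Carroll": "Call me Alice"}
    pairs, cited = citify("Call me Ishmael.", works)
    print(pairs)
    print("as written by", ", ".join(cited))
```

Words that appear in no other work get `None` here; how the real script handles them is not stated above.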

Prep: Download Project Gutenberg's August 2003 CD archive ("contains 600 of our best Ebooks") https://www.gutenberg.org/wiki/Gutenberg:The_CD_and_DVD_Project

Extract into PG2003-08, so that:

PG2003-08/master_list.csv contains the metadata (use the fixed version in this repo)
PG2003-08/etext00/ these contain txt files
PG2003-08/etext01/ "
PG2003-08/etextXX/ "
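Given that layout, collecting the candidate co-author books might look like the sketch below. The function name `find_books` is hypothetical, and the glob pattern assumes the text files sit directly inside the `etextXX` directories, as the listing above suggests.

```python
from pathlib import Path


def find_books(root="PG2003-08"):
    """Return the sorted .txt paths found in root's etextXX subdirectories."""
    # master_list.csv lives in root itself, so this pattern skips it.
    return sorted(Path(root).glob("etext*/*.txt"))


if __name__ == "__main__":
    for path in find_books():
        print(path)
```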

Move the main work into the root:

mv PG2003-08/etext01/moby11.txt .

Then manually delete the Project Gutenberg boilerplate from the beginning and end of moby11.txt (or use the cleaned copy in this repo).

Run with just three input books:

python citifier.py --number 3

Run with all found books and cache the processed co-author books (on my machine: ~60 mins for the first run, ~20 mins for subsequent runs):

python citifier.py --cache
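The speed-up on later runs presumably comes from not re-processing every book. One way such a cache could work is sketched below; this is an assumption about the approach, not a description of what `--cache` does in citifier.py, and `load_or_build` and the cache file name are hypothetical.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    """Return cached data if cache_path exists, else call build() and save it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Later runs: reuse the pickled result instead of re-processing.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    data = build()  # First run: do the slow processing once.
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data


if __name__ == "__main__":
    # Hypothetical usage: cache the vocabulary of one already-loaded text.
    words = load_or_build("vocab.pickle", lambda: {"call", "me", "ishmael"})
    print(sorted(words))
```

Deleting the cache file forces a rebuild, which matters if the input books change.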

Create a PDF:

brew install --cask wkhtmltopdf
wkhtmltopdf output.html output.pdf

hugovk commented 4 years ago

Oops, meant to PR into my fork, not this repo! https://github.com/hugovk/NaNoGenMo-2019/pull/1