This is a Python script that web-scrapes the serial fictions of Wildbow (J.C. McCrae) and bundles them into local .txt
files for offline / e-reader viewing.
Navigate to the downloaded directory (e.g. ~/Downloads/WildbowCrawler/) in your terminal, then input the command
python main.py <storyname> <format> <arcsep> <chapsep>
with arguments
<storyname> -- This is the name of the story to be locally archived. Presently, this should be one of
    worm
    pact
    twig
<format> -- This is the keyword for which file structure the results will be placed in. Select one of
    single -- this will create the <storyname> directory, containing <storyname>.txt with the full text of the story.
    per-arc -- this will create the <storyname> directory, containing one <arcnumber>_<arcname>.txt file for each arc (e.g. 1_Gestation.txt).
<arcsep> and <chapsep> -- These are the strings inserted at the beginning of each arc and chapter, respectively, for CTRL+F purposes. Certain characters need to be escaped by quotes, as in '#A'. Choose '' for no separator.
Some example usages follow.
python main.py worm per-arc [ARC] [CHAPTER]
python main.py pact single '#A' '#C'
python main.py twig per-arc '' Chapter:
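To make the two formats and the separators concrete, here is a minimal sketch of how the output files could be laid out. It is not taken from main.py: the function names (arc_filename, render_arc, plan_output) and the input shape (a list of arc names paired with (title, body) chapters) are hypothetical, and placing each separator directly in front of the arc or chapter title is an assumption about what "inserted at the beginning" means.

```python
import os


def arc_filename(arc_number, arc_name):
    """Build a per-arc file name such as 1_Gestation.txt."""
    return "{0}_{1}.txt".format(arc_number, arc_name)


def render_arc(arc_name, chapters, arcsep, chapsep):
    """Render one arc as text; chapters is a list of (title, body) pairs.

    Assumption: each separator is prepended directly to the title line.
    """
    lines = [arcsep + arc_name]
    for title, body in chapters:
        lines.append(chapsep + title)
        lines.append(body)
    return "\n".join(lines)


def plan_output(storyname, fmt, arcs, arcsep="", chapsep=""):
    """Map relative file paths to file contents for the chosen format.

    arcs is a list of (arc_name, chapters) pairs (hypothetical shape).
    """
    rendered = [render_arc(name, chapters, arcsep, chapsep)
                for name, chapters in arcs]
    if fmt == "single":
        # One file holding every arc, inside the <storyname> directory.
        path = os.path.join(storyname, storyname + ".txt")
        return {path: "\n\n".join(rendered)}
    if fmt == "per-arc":
        # One numbered file per arc, numbering from 1.
        return dict(
            (os.path.join(storyname, arc_filename(number, name)), text)
            for number, ((name, _), text) in enumerate(zip(arcs, rendered), 1)
        )
    raise ValueError("unknown format: {0!r}".format(fmt))
```

With a non-empty separator such as [CHAPTER], every chapter title in the resulting file can be reached by searching for that string. Passing '' leaves the titles unmarked.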
This is the first stable version of WildbowCrawler. It successfully passes through the entirety of Worm.
Planned: replace the sys.argv handling with the argparse module.
Contributions to and optimizations of this (small) project are welcome. Kindly alert me if you notice any anomalies in transcription, including disordering, jumbling, skipping and the like -- no, I am not rereading Worm to write this crawler. Also, let me know if (probably unicode-related) bugs arise upon further Twig publication.
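For anyone who wants to pick up the sys.argv to argparse change, the following is one possible starting point rather than the project's actual code. The helper names build_parser and parse_args and the help strings are hypothetical. The four positional arguments and their allowed values come from the usage section of this README.

```python
import argparse

# Allowed values as listed in the usage section of this README.
STORIES = ("worm", "pact", "twig")
FORMATS = ("single", "per-arc")


def build_parser():
    """Create a parser for: main.py <storyname> <format> <arcsep> <chapsep>."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Archive a Wildbow serial as local .txt files.",
    )
    parser.add_argument("storyname", choices=STORIES,
                        help="story to archive")
    parser.add_argument("format", choices=FORMATS,
                        help="file structure for the results")
    parser.add_argument("arcsep",
                        help="string inserted at the start of each arc")
    parser.add_argument("chapsep",
                        help="string inserted at the start of each chapter")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.storyname, args.format, repr(args.arcsep), repr(args.chapsep))
```

With this layout an unknown story or format makes argparse exit with a usage message instead of failing later in the crawl. One caveat: a separator that begins with a hyphen would be read as an option, so it would have to follow a bare -- on the command line.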
This code is written in Python 2.7.3. The non-standard library modules used are requests (for HTTP request handling) and bs4 (for HTML parsing).
Wildbow's personal blog
Worm
Pact
Twig
/r/parahumans (for discussion of all Wildbow's work)
Wildbow's Patreon
In my opinion, Wildbow is a very gifted writer. Support him if Worm gets picked up for publishing!
You can reach the author (Alex Ruble) most easily via GitHub (Calamitizer), email (jaruble@ncsu.edu), or Twitter (@aknifeallblade).
This software has no associated copyrights whatsoever (i.e. an unlicense). See LICENSE.txt for the full description.