This repository contains randomly sampled chapters from novels. The novels themselves are randomly sampled (without replacement) from two populations, the Common Library and the Reprint Canon.
The encoding procedure used is described in Prose Fiction Encoding Instructions.
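A two-stage design like the one described in the first paragraph (novels drawn without replacement from each population, then chapters drawn from each chosen novel) can be illustrated with a minimal Python sketch. It is a generic illustration, not the procedure actually used to build this corpus; the data behind the real samples are in supplementary-materials. The input structure (a mapping from population name to a mapping from title id to chapter count), the function name two_stage_sample, and the choice of one chapter per novel are all hypothetical assumptions.

```python
import random


def two_stage_sample(populations, n_novels, seed=None):
    """Return {population: [(title_id, chapter_number), ...]}.

    `populations` is a hypothetical structure:
    {population name: {title id: number of chapters}}.
    """
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    sample = {}
    for name, novels in populations.items():
        # Stage 1: draw novels without replacement; random.sample never repeats an item.
        # Sorting the ids makes the result depend only on the seed, not on dict order.
        chosen = rng.sample(sorted(novels), n_novels)
        # Stage 2 (assumed: one chapter per novel): chapter number uniform over 1..count.
        sample[name] = [(title_id, rng.randint(1, novels[title_id])) for title_id in chosen]
    return sample


if __name__ == "__main__":
    # Hypothetical ids and chapter counts, for illustration only.
    populations = {
        "Common Library": {"t001": 30, "t002": 12, "t003": 45},
        "Reprint Canon": {"t101": 20, "t102": 58},
    }
    print(two_stage_sample(populations, n_novels=2, seed=1))
```

Asking for more novels than a population contains raises ValueError from random.sample, which is the expected behaviour for sampling without replacement.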
common_library.csv contains metadata about the Common Library titles.
reprint_canon.csv contains metadata about the Reprint Canon titles.
other_novels.csv contains metadata about texts included in the repository incidentally. These are not part of either Canon. These novels survive, but as of the end of 2018 no page scans of their first editions were available.
scripts/quality_control_checks.py checks whether a text has been properly encoded.
supplementary-materials contains the data used to construct the random samples.
texts contains the chapters. Filenames begin with ATCL title ids, and texts are encoded in HTML5.
adding-an-encoded-novel.md describes how to add a novel to this repository.
Curly quotation marks (‘…’ and “…”) are not always entered correctly, so analyses should not require distinguishing between, say, “ and ”. Counts of punctuation marks per sentence or paragraph, however, should be reliable.
All texts in this repository are in the public domain.
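Since the direction of quotation marks is unreliable while their counts are not, one way to follow this advice is to fold curly quotes into straight ones before counting. The Python sketch below uses only the standard library's html.parser and assumes, hypothetically, that paragraphs are encoded as p elements; the Prose Fiction Encoding Instructions are authoritative on the actual markup. The names ParagraphExtractor and punctuation_counts and the set of marks counted are illustrative. Because ’ also serves as the apostrophe, the count for ' includes apostrophes. Per-sentence counts would additionally need a sentence splitter, which is not shown.

```python
from html.parser import HTMLParser

# Fold curly quotes into straight ones, since their direction is not always
# encoded correctly. U+2019 is also the apostrophe, so "'" counts include apostrophes.
QUOTE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element (assumed paragraph markup)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &ldquo; arrive as characters
        self.paragraphs = []
        self._buffer = None  # None while outside a <p> element

    def _flush(self):
        if self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # HTML5 allows </p> to be omitted; a new <p> closes the previous one
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def close(self):
        super().close()  # processes any buffered trailing text
        self._flush()


def punctuation_counts(html, marks=("'", '"', ",", ";", "!", "?")):
    """Return one {mark: count} dict per paragraph, after folding curly quotes."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    counts = []
    for text in parser.paragraphs:
        folded = text.translate(QUOTE_MAP)
        counts.append({mark: folded.count(mark) for mark in marks})
    return counts
```

To apply it to a chapter in the texts directory, read the file as UTF-8 and pass its contents to punctuation_counts.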
Contributors (in order of joining):