Project Gutenberg headers and footers remain in our test corpus

Right now, the quickstart guide (and many of our initial use cases) uses texts downloaded from Project Gutenberg. To distribute these texts, we are legally required to keep the headers and footers in the files.

However, we should strip the headers and footers before performing any actual analysis, so that the license boilerplate isn't counted as part of each text. Maybe this should happen when loading Documents? There's a function in gutenberg_loader that does this, but that file isn't in the master branch at the moment.
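For discussion, here is a rough sketch of what stripping at load time could look like. It is not the function from gutenberg_loader. The name `strip_gutenberg_boilerplate` is hypothetical, and the marker patterns are an assumption based on the `*** START OF THIS PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ...` lines found in many Gutenberg files. Older files use different wording and would need extra patterns.

```python
import re

# Assumed marker lines; many (not all) Gutenberg files use this wording.
_START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
_END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END marker lines.

    Hypothetical helper: if a marker is missing, that side of the
    text is left untouched rather than guessed at.
    """
    start = _START_RE.search(text)
    end = _END_RE.search(text)

    # The body begins after the START line, or at the top if there is none.
    body_start = start.end() if start else 0
    # The body stops before the END line, or at the bottom if there is none.
    if end and end.start() >= body_start:
        body_end = end.start()
    else:
        body_end = len(text)

    return text[body_start:body_end].strip()
```

Calling something like this on the raw file contents while loading a Document would leave the files on disk unchanged, so the headers and footers are still distributed with the corpus.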

dhmit/gender_analysis#109