Right now, the quickstart guide (and a lot of our initial use cases) uses texts taken from Project Gutenberg. To distribute these texts, we are legally required to keep the headers and footers in the files.
However, we should strip the headers and footers before performing any actual analysis, perhaps when loading Documents. There's a function in gutenberg_loader that does this, but that file isn't in the master branch at the moment.
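
As a starting point for discussion, here is a minimal sketch of what the stripping step could look like. It is not the function from gutenberg_loader; the name `strip_gutenberg_boilerplate` is hypothetical, and it assumes the files use the common `*** START OF THIS PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THIS PROJECT GUTENBERG EBOOK ... ***` marker lines. Older Gutenberg files use other wording and would need extra patterns.

```python
import re

# Marker lines found in many Project Gutenberg plain-text files, e.g.
# "*** START OF THIS PROJECT GUTENBERG EBOOK EMMA ***".
# Assumption: only this marker style is handled; older files vary.
_START_MARKER = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
_END_MARKER = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched,
    so files without Gutenberg boilerplate pass through unchanged
    (apart from surrounding whitespace being trimmed).
    """
    start = _START_MARKER.search(text)
    end = _END_MARKER.search(text)

    body_start = start.end() if start else 0
    # Ignore an END marker that appears before the START marker.
    if end and end.start() >= body_start:
        body_end = end.start()
    else:
        body_end = len(text)

    return text[body_start:body_end].strip()
```

Calling this when a Document is loaded would leave the distributed files intact on disk, headers and footers included, while the analysis only ever sees the body text.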