JonathanReeve / chapterize

A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books for computational text analysis.
GNU General Public License v3.0
96 stars 21 forks

Parse short stories #3

Open nateGeorge opened 7 years ago

nateGeorge commented 7 years ago

It would be nice if this could parse short stories, like this: http://www.gutenberg.org/cache/epub/25519/pg25519.txt

Possibly detecting a 'contents' section and getting the titles from there would work, at least for that example.
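
A rough sketch of how that might look, under some assumptions: the contents heading sits on a line by itself, the entries are listed one per line without page numbers, the list ends at two consecutive blank lines, and each story later starts with a line that repeats its title (ignoring case). The function names `find_contents_titles` and `split_on_titles` are made up for illustration and are not part of chapterize.

```python
import re

# Assumption: the heading is a line reading "Contents" or "Table of Contents".
CONTENTS_RE = re.compile(r"^\s*(table of )?contents\.?\s*$", re.IGNORECASE)


def find_contents_titles(lines):
    """Return (titles, index of the line where the contents block ends)."""
    for i, line in enumerate(lines):
        if CONTENTS_RE.match(line):
            break
    else:
        return [], 0  # no contents heading found

    titles, blanks = [], 0
    j = i + 1
    while j < len(lines):
        entry = lines[j].strip()
        if entry:
            blanks = 0
            titles.append(entry)
        else:
            blanks += 1
            # Assumption: two blank lines in a row end the contents list.
            if titles and blanks >= 2:
                break
        j += 1
    return titles, j


def split_on_titles(text):
    """Split text into (title, body) pairs using the contents entries."""
    lines = text.splitlines()
    titles, pos = find_contents_titles(lines)

    # Find each title, in order, as a standalone line after the contents block.
    starts = []
    for title in titles:
        for k in range(pos, len(lines)):
            if lines[k].strip().lower() == title.lower():
                starts.append((title, k))
                pos = k + 1
                break

    stories = []
    for n, (title, k) in enumerate(starts):
        end = starts[n + 1][1] if n + 1 < len(starts) else len(lines)
        stories.append((title, "\n".join(lines[k + 1:end]).strip()))
    return stories
```

Calling `split_on_titles(text)` on the full text of a book would return a list of `(title, body)` pairs, and titles that never reappear as a standalone line are skipped. Real Gutenberg files vary a lot (page numbers, dot leaders, titles wrapped across lines), so the matching would probably need to be fuzzier than this.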

JonathanReeve commented 7 years ago

Really good idea. I've thought about using tables of contents for helping to infer chapter divisions, too. Not sure how that would work, exactly, but it's a neat idea. Feel free to give it a try and submit a PR.

(Sent by email in reply to Nate George's comment of Nov 1, 2017, 16:09.)