domenic / worm-scraper

Scrapes the web serial Worm, its sequel Ward, and the bridge series Glow-worm into an ebook format

Improve presentation of chapter titles and arcs #54

Open domenic opened 2 weeks ago

domenic commented 2 weeks ago

Vision: instead of

We have:

Questions/problems:

wfdewith commented 2 weeks ago

From a technical perspective, since the latest release of the scraper now generates EPUB 3 files, consider creating an NCX-based Table of Contents rather than including it as a separate page. NCX ToCs can be hierarchical. In my opinion, it is the e-reader that should be responsible for rendering the ToC, and it should not be necessary to include it in a specific spot in an e-book.

I do like this proposal, because it allows me to collapse arcs in the ToC, which makes it way less unwieldy when navigating.

I have no strong opinions on your other questions, because I just started reading Worm. Still, I did notice that the interlude titles are fairly inconsistent. For example, bonus interludes are sometimes titled "Interlude x (Bonus)" and sometimes "Interlude x (Bonus Interlude)". Are you planning on editorializing the chapter titles themselves as well?

domenic commented 1 week ago

> From a technical perspective, since the latest release of the scraper now generates EPUB 3 files, consider creating an NCX-based Table of Contents rather than including it as a separate page. NCX ToCs can be hierarchical. In my opinion, it is the e-reader that should be responsible for rendering the ToC, and it should not be necessary to include it in a specific spot in an e-book.

My understanding is that NCX is EPUB 2 format, and the HTML TOC files we're generating now are EPUB 3. HTML TOCs can also be hierarchical.

The HTML TOC file is not included in the spine currently, as you suggest. (Whether it is included, or not, appears to be up to the publisher. I agree with you it seems better not to include it.)
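To illustrate the hierarchical HTML TOC, here is a minimal Python sketch of an EPUB 3 navigation document with arcs as top-level entries and chapters nested beneath them. It is not the scraper's actual code: `build_nav`, the input shape, and the file paths are all hypothetical, and it assumes every arc has at least one chapter.

```python
from html import escape


def build_nav(arcs):
    """Return an EPUB 3 navigation document as a string.

    `arcs` is a list of (arc_title, chapters) pairs, where `chapters` is a
    non-empty list of (chapter_title, href) pairs. This shape is an
    assumption made for the sketch.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:epub="http://www.idpf.org/2007/ops">',
        "<head><title>Table of Contents</title></head>",
        "<body>",
        '<nav epub:type="toc">',
        "<ol>",
    ]
    for arc_title, chapters in arcs:
        if not chapters:
            raise ValueError(f"arc {arc_title!r} has no chapters")
        # Link the arc heading to its first chapter, then nest the
        # chapter list inside the same list item.
        first_href = chapters[0][1]
        lines.append(f'<li><a href="{escape(first_href)}">{escape(arc_title)}</a>')
        lines.append("<ol>")
        for title, href in chapters:
            lines.append(f'<li><a href="{escape(href)}">{escape(title)}</a></li>')
        lines.append("</ol>")
        lines.append("</li>")
    lines += ["</ol>", "</nav>", "</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_nav([
        ("Arc 1: Gestation", [("Gestation 1.1", "chapters/1-1.xhtml")]),
    ]))
```

Because the nesting lives in the nav document itself, leaving the file out of the spine should not affect how the reader's own TOC panel displays it.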

> Still, I did notice that the interlude titles are fairly inconsistent. For example, bonus interludes are sometimes titled "Interlude x (Bonus)" and sometimes "Interlude x (Bonus Interlude)".

Yes, this bugs me too.

> Are you planning on editorializing the chapter titles themselves as well?

I think my plan is as follows. There will be an option --chapter-titles= which can be one of original, simplified, or character-names:

I think the default will be simplified.
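A rough sketch of how such an option could be parsed, with `simplified` as the default. It uses Python's `argparse` as a stand-in for the scraper's real command-line handling. The `normalize_bonus` helper is hypothetical: it only addresses the "(Bonus)" versus "(Bonus Interlude)" inconsistency mentioned above and is not a definition of what `simplified` will actually do.

```python
import argparse
import re

TITLE_MODES = ("original", "simplified", "character-names")


def parse_args(argv):
    """Parse the proposed --chapter-titles option (default: simplified)."""
    parser = argparse.ArgumentParser(prog="worm-scraper")
    parser.add_argument(
        "--chapter-titles",
        choices=TITLE_MODES,
        default="simplified",
        help="how chapter titles are presented in the ebook",
    )
    return parser.parse_args(argv)


# Matches either bonus spelling at the end of a title.
_BONUS_SUFFIX = re.compile(r"\s*\((?:Bonus Interlude|Bonus)\)\s*$")


def normalize_bonus(title):
    """Collapse both bonus spellings to a single ' (Bonus)' suffix."""
    return _BONUS_SUFFIX.sub(" (Bonus)", title)


if __name__ == "__main__":
    args = parse_args(["--chapter-titles=original"])
    print(args.chapter_titles)
    print(normalize_bonus("Interlude 5 (Bonus Interlude)"))
```

Restricting the value with `choices` means a misspelled mode fails at startup instead of silently falling back to the default.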

wfdewith commented 1 week ago

> My understanding is that NCX is EPUB 2 format, and the HTML TOC files we're generating now are EPUB 3. HTML TOCs can also be hierarchical.

> The HTML TOC file is not included in the spine currently, as you suggest. (Whether it is included, or not, appears to be up to the publisher. I agree with you it seems better not to include it.)

You are completely correct on both fronts. I checked the EPUB 3 spec to make sure. I don't know what led me to believe that the ToC was part of the spine, but other than the hierarchical structure, the ToC is perfectly fine as is.

> I think my plan is as follows. [...]

Your plan sounds good to me. For simplified, I'd number the interludes either with Arabic or Roman numerals. You could number them globally instead of per arc, though that is up to you.
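To make the numbering suggestion concrete, here is a small Python sketch that labels interludes either globally or per arc, with Arabic or Roman numerals. The function names, the input shape (one arc number per interlude, in reading order), and the label wording are all assumptions made for illustration.

```python
def to_roman(n):
    """Convert an integer in 1..3999 to a Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError("Roman numerals are only defined here for 1..3999")
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def interlude_labels(interlude_arcs, per_arc=False, roman=False):
    """Label interludes given the arc number of each one, in reading order.

    With per_arc=False the counter runs across the whole book; with
    per_arc=True it restarts in every arc.
    """
    fmt = to_roman if roman else str
    seen_in_arc = {}
    labels = []
    for index, arc in enumerate(interlude_arcs, start=1):
        if per_arc:
            seen_in_arc[arc] = seen_in_arc.get(arc, 0) + 1
            labels.append(f"Arc {arc}, Interlude {fmt(seen_in_arc[arc])}")
        else:
            labels.append(f"Interlude {fmt(index)}")
    return labels


if __name__ == "__main__":
    # Hypothetical layout: one interlude each in arcs 1 and 2, two in arc 3.
    print(interlude_labels([1, 2, 3, 3], roman=True))
```

Global numbering keeps every label unique without mentioning the arc, while per-arc numbering needs the arc in the label to stay unambiguous.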