alexadam / save-as-ebook

Save a web page/selection as an eBook (.epub format) - a Chrome/Firefox/Opera Web Extension
MIT License
1.1k stars, 70 forks

Feature Request: extract H1,H2,etc. tags as chapters #49

Open paul-chambers opened 2 years ago

paul-chambers commented 2 years ago

I'd like to be able to convert a page or selection into multiple chapters based on the location of H1, H2, etc. tags, using the plain text between the opening and closing of each tag as the chapter name.

As an example, I just converted the Lua 5.3 Reference Manual to an eBook. It was a tedious process to manually select each chapter and capture it individually, then go back to edit the chapters to correct their titles. It would have been so much easier to just capture the page and have this extension be able to generate the chapters from the H1 and H2 tags on the captured page.
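To make the request concrete, here is a minimal sketch of the splitting step. It is an illustration only and is not taken from the extension's source: `splitIntoChapters`, `maxLevel`, and `fallbackTitle` are hypothetical names, and the sketch assumes the captured content is available as a flat, ordered list of sibling elements (for example `Array.from(container.children)` in a browser). Headings nested inside wrapper elements would not be found this way.

```javascript
// Hypothetical helper, not part of save-as-ebook.
// `elements` is an ordered list of sibling nodes; each one only needs
// `tagName`, `textContent`, and `outerHTML`, as DOM Elements have.
function splitIntoChapters(elements, maxLevel = 2, fallbackTitle = 'Untitled') {
  const chapters = [];
  let current = null;
  for (const el of elements) {
    const match = /^H([1-6])$/i.exec(el.tagName || '');
    const isChapterHeading = match !== null && Number(match[1]) <= maxLevel;
    if (isChapterHeading) {
      // A heading at level 1..maxLevel starts a new chapter;
      // its plain text becomes the chapter title.
      current = { title: el.textContent.trim(), html: [] };
      chapters.push(current);
    } else {
      if (current === null) {
        // Content that appears before the first heading gets a fallback chapter.
        current = { title: fallbackTitle, html: [] };
        chapters.push(current);
      }
      current.html.push(el.outerHTML);
    }
  }
  return chapters.map((c) => ({ title: c.title, content: c.html.join('\n') }));
}
```

In this sketch the heading element itself is left out of the chapter body, on the assumption that the eBook writer emits the title separately; keeping it would be a one-line change.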

alexadam commented 2 years ago

A good idea, but it will only work if the page is properly formatted with h1, h2, ... tags. I can see a lot of issues from people trying to apply this to a 'modern' page that doesn't use heading tags but looks like one that does. I suppose you are a dev and can take a look at the source. Anyway, it can be disabled if the page does not contain enough heading tags.

paul-chambers commented 2 years ago

Yes, I am an experienced dev, but on embedded Linux. I'm not fluent in JS, sadly.

I agree: if a page was authored purely with CSS style sheets and DIV tags, there's no simple way to recover the structure of the document. Having said that, in many of the docs I want to convert the formatting isn't that fancy (or is just CSS styles applied to the heading tags), so it would be useful to be able to recover the structure in one step. Luckily, it should be easy to spot the difference between a page that uses H1 and H2 repeatedly and one that only uses them once or twice.
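A rough sketch of that check might look like the following. The function name `hasUsableHeadings` and the threshold of three headings are assumptions chosen for illustration, not values from the extension; the right cutoff would need tuning against real pages.

```javascript
// Hypothetical check, not part of save-as-ebook.
// Returns true when the page has enough H1..H<maxLevel> headings to make
// automatic chapter splitting worthwhile. `elements` can be any iterable
// of objects with a `tagName`, e.g. document.querySelectorAll('*').
function hasUsableHeadings(elements, minHeadings = 3, maxLevel = 2) {
  let count = 0;
  for (const el of elements) {
    const match = /^H([1-6])$/i.exec(el.tagName || '');
    if (match !== null && Number(match[1]) <= maxLevel) {
      count += 1;
    }
  }
  return count >= minHeadings;
}
```

In a browser this could be called as `hasUsableHeadings(document.querySelectorAll('h1, h2'))`, with the chapter-splitting option disabled whenever it returns false.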