dteviot / WebToEpub

A simple Chrome (and Firefox) Extension that converts Web Novels (and other web pages) into an EPUB.

gitbook parser request #438

Open gahoo opened 3 years ago

gahoo commented 3 years ago

Gitbook-based online books are gaining popularity. Is it possible to add a Gitbook parser?

Here's an example site: http://bioconductor.org/books/release/OSCA/

dteviot commented 3 years ago

@gahoo It looks like it should be possible. I am starting to get snowed under with requests, so I'm going to suggest you try doing it yourself. How to:

If you get stuck, feel free to send me an email, or add a note to this issue.

Aside:

However, GitBooks is capable of generating an epub, as well as a web site. So, it would be better to ask sites to provide an epub to download. Or even better, add an option to GitBooks to generate an epub as part of the website content, and include a download link on the site. Hmm... looking at their docs https://docs.gitbook.com/features/pdf-export, they're already going that way.

dteviot commented 3 years ago

@gahoo A couple of notes, if you do try to do this.

  1. Because you want to select the parser based on page format, not site host, you'll need to do something like this to register the parser: https://github.com/dteviot/WebToEpub/blob/34afa797fe8ae9b6ba0bbdd8854e7ee0ac9ad668/plugin/js/parsers/MadaraParser.js#L15-L21
  2. WebToEpub doesn't handle the case of multiple entries in the Table of Contents pointing to the same web page. So you'll need to cull the URLs to sub-headings (i.e. the ones with a fragment identifier, or hash '#') in your implementation of getChapterUrls().
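
As a rough illustration of note 2, here is a minimal sketch of the culling step. It is not taken from WebToEpub: the helper name `cullSubHeadingLinks` is hypothetical, the `{sourceUrl, title}` shape of a chapter entry is an assumption, and the example.org URLs are made up. A real getChapterUrls() would build the list from the page's Table of Contents links and then apply a filter like this one.

```javascript
// Hypothetical helper, not part of WebToEpub.
// Assumes each chapter entry looks like { sourceUrl: string, title: string }.
function cullSubHeadingLinks(chapters) {
    const seen = new Set();
    return chapters.filter((chapter) => {
        const url = new URL(chapter.sourceUrl);
        // A non-empty hash means the link targets a sub-heading inside a page.
        if (url.hash !== "") {
            return false;
        }
        // Also drop any later entry that repeats a page already kept.
        if (seen.has(url.href)) {
            return false;
        }
        seen.add(url.href);
        return true;
    });
}

// Made-up Table of Contents entries, for illustration only.
const toc = [
    { sourceUrl: "https://example.org/book/intro.html", title: "Introduction" },
    { sourceUrl: "https://example.org/book/intro.html#setup", title: "Setup" },
    { sourceUrl: "https://example.org/book/intro.html", title: "Introduction (repeat)" },
    { sourceUrl: "https://example.org/book/methods.html", title: "Methods" },
];

const kept = cullSubHeadingLinks(toc);
console.log(kept.map((chapter) => chapter.title)); // [ 'Introduction', 'Methods' ]
```

The filter keeps the first link to each page and discards everything else, so the order of the Table of Contents is preserved. If a site links to a chapter only through fragment URLs, this sketch would drop that chapter entirely; stripping the fragment before de-duplicating would be the alternative in that case.
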
gahoo commented 3 years ago

Thanks for your quick response and patient introduction.

Some Gitbook-based online books were generated by bookdown. Building them locally might require installing related packages and other extra effort, which could be time-consuming. So building an epub directly from the online version is the fastest way.

The default parser works well, except for chapters with multiple levels of subsections: these get broken into too many separate parts, leaving large blank areas on the page. However, after reading "Customizing the Template for a new Web Site", I still don't know how to handle this case.

Here is an example output epub file for your reference; it will expire in 7 days.

gahoo commented 3 years ago

I figured it out myself. Remove the following code from the stylesheet, and everything works perfectly.

h1, h2 {
   text-align: center;
   page-break-before: always;
   margin-bottom: 10%;
   margin-top: 10%;
}
h3, h4, h5, h6 {
   text-align: center;
   margin-bottom: 15%;
   margin-top: 10%;
}