avdi / quarto

Replace Pandoc for EPUB generation #11

Open avdi opened 10 years ago

avdi commented 10 years ago

While Pandoc produces valid EPUB3 files, its architecture makes it fundamentally unworkable for future development of Quarto. Pandoc is built around an internal format which is much less expressive than HTML5 (it is closer to the expressiveness of Markdown). HTML that is fed into Pandoc comes out nearly unrecognizable; in particular, classes and IDs will likely be stripped or changed and many elements will simply be dropped entirely. The Pandoc developer has confirmed that this is part of the fundamental nature of Pandoc, and is not going to change.

Right now the PandocEpub plugin does a massive amount of post-processing on the Pandoc output in order to restore some of the lost information, but it's strictly a stopgap.

The only other potential command-line tool for this task is Calibre, and that has already been rejected because it generates invalid files which cause problems in some readers, and because it has no EPUB3 support (and probably won't anytime soon, according to the developers).

This means we need to start generating EPUB ourselves, either entirely from scratch or using a library. GEPUB looks like a potential fit here.
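
For a rough sense of what "generating EPUB ourselves" involves at the container level, here is a minimal sketch in Python rather than Quarto's own Ruby. It is not GEPUB's API and not what Quarto does. The function name `build_epub`, the `OEBPS/` layout, the file names, the hard-coded `en` language, and the default `modified` timestamp are all assumptions made for illustration. The one firm requirement it demonstrates is that the `mimetype` entry must come first in the ZIP archive and be stored uncompressed. The output has not been run through epubcheck.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Points reading systems at the package document (path is an assumed layout).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(title, identifier, chapters, modified="2014-01-01T00:00:00Z"):
    """Return EPUB bytes (hypothetical helper).

    `chapters` is a list of (file_name, xhtml_string) pairs in reading order.
    The XHTML is written byte-for-byte, so classes and IDs survive untouched.
    """
    items = "\n".join(
        f'    <item id="ch{i}" href="{name}" media-type="application/xhtml+xml"/>'
        for i, (name, _) in enumerate(chapters)
    )
    refs = "\n".join(f'    <itemref idref="ch{i}"/>' for i in range(len(chapters)))
    links = "\n".join(
        f'      <li><a href="{name}">{escape(name)}</a></li>' for name, _ in chapters
    )
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{escape(identifier)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{items}
  </manifest>
  <spine>
{refs}
  </spine>
</package>
"""
    nav = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>{escape(title)}</title></head>
  <body>
    <nav epub:type="toc">
      <ol>
{links}
      </ol>
    </nav>
  </body>
</html>
"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must be the first file and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/nav.xhtml", nav)
        for name, xhtml in chapters:
            zf.writestr(f"OEBPS/{name}", xhtml)
    return buf.getvalue()
```

Because the chapter markup is copied into the archive as-is, nothing downstream has to reconstruct information that a converter dropped. A real implementation would still need cover images, stylesheets, fonts, and a proper table of contents.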

steveklabnik commented 10 years ago

:cry:

avdi commented 10 years ago

Note that this is an issue for Quarto, but YMMV. Quarto is built around the idea of getting everything into XHTML5 as early as possible, performing some number of transformations and munges, and then converting out of XHTML5 as late as possible. If you're OK with writing stuff in Markdown and converting that straight to EPUB, Pandoc is probably just fine.

And I'm still going to use Pandoc for the initial import from Markdown (and probably other formats) to XHTML; I've had no complaints in that department.
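
To illustrate the "XHTML5 in the middle" idea described above, here is a small Python sketch of a transform pipeline over a parsed XHTML tree. It is not Quarto's actual plugin code. The transform names `number_sections` and `mark_code_blocks`, the `data-number` attribute, and the `listing` class are all made up for the example. The point is only that each step sees and returns the full tree, so existing classes and IDs stay in place unless a step changes them deliberately.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def number_sections(root):
    """Hypothetical transform: tag each <section> with its ordinal position."""
    for n, section in enumerate(root.iter(f"{{{XHTML_NS}}}section"), start=1):
        section.set("data-number", str(n))
    return root


def mark_code_blocks(root):
    """Hypothetical transform: add a 'listing' class to <pre>, keeping existing classes."""
    for pre in root.iter(f"{{{XHTML_NS}}}pre"):
        classes = pre.get("class", "").split()
        if "listing" not in classes:
            classes.append("listing")
        pre.set("class", " ".join(classes))
    return root


def run_pipeline(xhtml, transforms):
    """Parse once, apply every transform in order, serialize once at the end."""
    root = ET.fromstring(xhtml)
    for transform in transforms:
        root = transform(root)
    return ET.tostring(root, encoding="unicode")
```

Usage would look something like `run_pipeline(source, [number_sections, mark_code_blocks])`, with the result handed to whatever exporter runs last.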