Investigate PDF as an output format

bep commented 5 years ago

See this tweet:

https://twitter.com/ThatPandaDev/status/1172342347213881344

> i miss downloadable docs :( prepping to write code on the plane and i shouldn't need to clone the whole @semanticui and @GoHugoIO repos, then figure out their doc build process... where'd "download as HTML" or PDF ever go to?

I totally agree with the above Tweet. "download as HTML" should be possible as-is if you're building using relativeURLs=true and have your URLs in order, so to speak. That could be something we also could investigate so we could do a static build as part of the release. But a simple PDF with an index would be nice ...

vielmetti commented 5 years ago

It would be a big dependency, but Pandoc is quite capable of starting from Markdown and ending up at PDF, in its case via LaTeX.
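
For reference, a minimal sketch of what driving Pandoc from a build script could look like. It assumes `pandoc` and a LaTeX engine (`xelatex` here) are installed; the function names and file names are placeholders, and Hugo shortcodes in the sources would not be expanded by Pandoc.

```python
import shutil
import subprocess


def pandoc_pdf_command(sources, output, pdf_engine="xelatex"):
    """Build the argv for a Markdown -> PDF conversion with Pandoc."""
    return [
        "pandoc",
        *[str(source) for source in sources],
        "--from", "markdown",
        "--toc",                        # generate a table of contents
        f"--pdf-engine={pdf_engine}",   # the LaTeX engine Pandoc calls
        "-o", str(output),
    ]


def build_pdf(sources, output):
    """Run Pandoc if it is installed; return True when the command ran."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_pdf_command(sources, output), check=True)
    return True
```

Calling `build_pdf(["content/intro.md", "content/usage.md"], "docs.pdf")` would produce a single PDF from both files, in the order given.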

ghost commented 5 years ago

@vielmetti unless Go has its own LaTeX library, I think that's a bad idea. At least on my system, LaTeX requires over 100 packages and 73 MB, and that's compressed.

ChrisLasar commented 5 years ago

Maybe a CSS-based approach is useful to achieve this: https://www.smashingmagazine.com/2019/06/create-pdf-web-application/

eine commented 4 years ago

> unless Go has its own LaTeX library, I think that's a bad idea. At least on my system, LaTeX requires over 100 packages and 73 MB, and that's compressed.

@cup, AFAIK most (if not all) static generators of this kind that produce PDF output do so through LaTeX. I'm familiar with Sphinx and pandoc.

Sphinx provides very useful cross-ref features that produce nice results with either HTML or LaTeX output. Unfortunately, although some hooks are provided to slightly customize some parts, it is not possible to provide custom templates for the LaTeX output. However, the LaTeX sources are generated in a separate step, so it is possible to tweak them manually. Including Markdown sources in Sphinx projects is possible, but the cross-ref features do not seem to work as well with them.

Pandoc, on the other hand, lets you provide custom templates to generate LaTeX output. Unfortunately, its features for building the document hierarchy (tocs or minitocs) and for cross-refs from Markdown sources are not as rich. Unlike Sphinx, it lets you provide Markdown sources along with a LaTeX template and generate a PDF directly. Nonetheless, I believe it would be possible to do it in separate steps.

Alternatives for MkDocs also seem to involve using pandoc and/or LaTeX (see mkdocs/mkdocs#374).

Overall, although it is possible to overload the print feature of CSS (#6627), IMHO proper PDF support in hugo should involve generating LaTeX output (either with a fixed template, as Sphinx, or supporting custom templates, as pandoc).

This summary is not casual. I have tried to migrate several projects from Sphinx (readthedocs) to hugo, with little success. Not being able to provide a PDF version is a real stopper.

rob-at-airwalk commented 4 years ago

I use WeasyPrint (https://weasyprint.org/start/) with a custom Hugo theme to add pages in the right places, etc. The output is pretty much spot on, but I haven't got it into a pipeline to generate the pages yet.

julientaq commented 4 years ago

Hi there, I’m working on Paged.js, an HTML-to-PDF solution that follows the W3C standards, to print books easily. Incidentally, I’m building our new website with Hugo, and we’ll definitely use Paged.js to render pages of the documentation. And since it’s CSS only, we don’t really need to spend hours learning new tools. We’re also working on a CLI version, fully Node/JavaScript, so we could build the new book anytime there is a release.

If you want to test things out, we do have a repository at https://gitlab.pagedmedia.org/tools/pagedjs and we’ll be happy to help (we definitely need some sort of Paged.js plugin for our website, so we’ll be happy to collaborate on that one).

julientaq commented 4 years ago

Hey folks! just wanted to tell you that our pagedjs.org website is up, all made with hugo and it’s amazing to see it work so fast and so well. You’re doin’ an awesome job.

If you check the articles and docs, there is a button at the top right that launches Paged.js so you can make a book out of the page (it’s an experimental feature right now, but we’re getting there). Once the book is rendered, you can print it as you would print any page. Right now, Chromium is well supported, and we’re working on the Firefox specifics (there is a long story why, but it’s not because we only want to support Chrome, far from it).

I will try to make a theme with just that option (same as the matomo plug-in).

Then you’ll be able to make a book out of any of your web pages.

Then, I’ll see how we can get pagedjs-cli running. It’s a Node package, and I have no idea what the best way to interface it with Hugo would be, nor what we should set up to define which pages should be rendered. Anyway, I’ll keep coming back to you on that if I have questions.

julientaq commented 4 years ago

Hey folks, I made a small implementation of paged.js for www.pagedjs.org so you can have a look at it and tell me how it works for you.

Still pretty experimental, but i’d love to have your thoughts.

https://gitlab.pagedmedia.org/julientaq/pagedjs-hugo

If you follow the README, you’ll get a print button that you can add to any page as part of the template. The next step is to add a configurable option in the front matter to show it or not :)

Jos512 commented 4 years ago

> I made a small implementation of paged.js for www.pagedjs.org so you can have a look at it and tell me how it works for you.

Sorry to say, but for me this doesn't work. I want Hugo to generate a ready PDF that I can then share with my followers or offer as a bonus for email signups.

I don't want to tell website visitors to go to print, choose the right output format, verify the advanced settings, and then save it to PDF themselves, as the docs specify.

I'm afraid that's just too cumbersome for non-technical people. (Plus Firefox doesn't have a 'print to pdf' feature it seems, so it only targets Chrome?)

(That being said, there are of course countless other people for which this approach works fine! 🙂 )

julientaq commented 4 years ago

> (That being said, there are of course countless other people for which this approach works fine! 🙂 )

🙇

> I'm afraid that's just too cumbersome for non-technical people. (Plus Firefox doesn't have a 'print to pdf' feature it seems, so it only targets Chrome?)

It is not a solution for everyone indeed. Firefox has a “Save as PDF” option that is triggered when you hit print, and the same goes for Chrome and Safari (I don’t remember exactly about Edge, but with Edgium being Blink-powered, it will definitely offer the option). I wonder if I could bypass the print dialog and just ask for the download location, to make our users’ lives easier.

> I want Hugo to generate a ready PDF that I can then share with my followers or offer as a bonus for email signups.

Since Paged.js can also run as a command line interface, I need to see what the best way would be to run it when Hugo finishes its build. For now, if you install pagedjs-cli in a Node environment, you can run the script and it will transform some HTML into a PDF ready for download.
My first idea is to see how and if we could set it up the same way PostCSS is used inside Hugo, but I haven’t looked into it enough yet.
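
Until there is a tighter integration, the "run it when Hugo finishes its build" step could be scripted roughly like this. It is only a sketch: it assumes `hugo` and `pagedjs-cli` are both on the PATH, that Hugo writes to its default `public` directory, and that one HTML page is the one to turn into a PDF. The function names, `index.html` and `book.pdf` are placeholders.

```python
import subprocess


def post_build_commands(page="index.html", pdf_name="book.pdf"):
    """Return the two commands to run, in order, from the site root."""
    return [
        # 1. Build the site; Hugo writes to ./public by default.
        ["hugo"],
        # 2. Render one generated page to PDF with the Paged.js CLI.
        ["pagedjs-cli", f"public/{page}", "-o", f"public/{pdf_name}"],
    ]


def build_site_and_pdf(site_dir, **options):
    """Run the build, then the PDF step; stop at the first failure."""
    for argv in post_build_commands(**options):
        subprocess.run(argv, cwd=site_dir, check=True)
```

Writing the PDF into `public` means it gets deployed alongside the HTML, so a page can simply link to it.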

Jos512 commented 4 years ago

> if you install pagedjs-cli in a Node environment, you can run the script and it will transform some HTML into a PDF ready for download.

This is really interesting. Thanks for making this comment.

Good luck with the pagedjs project! 🙂

jwaschkau commented 4 years ago

I get pretty good looking results with this mkdocs plugin https://github.com/comwes/mkpdfs-mkdocs-plugin

septatrix commented 4 years ago

I think this lies beyond Hugo's goals, as I see it as an HTML-focused solution with its strengths exactly there. Instead, I think such problems should be solved by providing proper CSS styles for the print media type (e.g. hiding menus). Afterwards, anyone who wants to print the page can do so from their browser or, to automate it, use something like Puppeteer or Selenium. Including a PDF library or generator like LaTeX would be annoying to maintain, and finding a solution that fits everybody would be very hard. Furthermore, the project linked above also seems like a great alternative.
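
As a rough illustration of the automated route: instead of Puppeteer or Selenium, this sketch uses Chrome's own `--headless` and `--print-to-pdf` command-line flags, which apply the page's print styles. The `chromium` binary name is an assumption (it differs between platforms and installs), and the URL and output path are placeholders.

```python
import subprocess


def print_to_pdf_command(url, output, browser="chromium"):
    """Build the argv that asks a headless Chrome/Chromium to print a page."""
    return [browser, "--headless", f"--print-to-pdf={output}", url]


def print_page(url, output, browser="chromium"):
    """Print one page to PDF; raises if the browser exits with an error."""
    subprocess.run(print_to_pdf_command(url, output, browser), check=True)
```

For example, `print_page("http://localhost:1313/docs/", "docs.pdf")` would print a page served by `hugo server` on its default port.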

Jos512 commented 4 years ago

> I think this lies beyond Hugo's goals as I see it as a HTML focused solution with its strengths exactly there.

That's not Hugo's own perception. From the custom output docs:

> Hugo can output content in multiple formats, including calendar events, e-book formats, Google AMP, and JSON search indexes, or any custom text format.

Hugo's primary output is indeed HTML, but Hugo already moved away from solely focusing on HTML some time ago.

septatrix commented 4 years ago

That is a fair point, though the section you cited also states "or any custom text format". Binary formats like PDF lie outside this scope, as they are not template based but instead need an extra processing step, which is relatively performance heavy in comparison.

paperdigits commented 4 years ago

We're using hugo to make a website for the documentation of an application, and a PDF of the entire site is an absolute must-have. For now, I'm trying to do this as a theme, where the home page index template finds all the pages, then renders them into one single page. I then use weasyprint to make a PDF. You can see the template, which is very much in progress, here: https://gitlab.com/pixlsus/hugo-pdf-theme

I'd love any pointers on how to do this better. As of now, images and links don't work, because they use relative paths.
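
To make the relative-path problem concrete: once a page's HTML is pasted into a single combined document, `diagram.png` no longer resolves against the page's own directory. One possible workaround, sketched here as a post-processing step outside Hugo, is to resolve every `href` and `src` against the page's original URL before merging. The regex approach and the function name are illustrative assumptions, not how the theme above works.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." with either quote style.
_URL_ATTR = re.compile(r'\b(href|src)=(["\'])(.*?)\2')


def absolutize(html, page_url):
    """Resolve relative href/src values against the page they came from."""
    def fix(match):
        attr, quote, value = match.groups()
        if value.startswith("#"):
            return match.group(0)  # leave in-page anchors alone
        return f"{attr}={quote}{urljoin(page_url, value)}{quote}"

    return _URL_ATTR.sub(fix, html)
```

With `page_url="https://example.org/docs/install/"`, `src="diagram.png"` becomes `src="https://example.org/docs/install/diagram.png"`, while absolute URLs are left unchanged. A real implementation would want an HTML parser rather than a regex.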

maelle commented 4 years ago

@paperdigits my experiments in https://github.com/maelle/testbook might be close to what you're doing (the Netlify config does two website builds, one with the "website" configuration and one with a configuration that puts everything on one page). I haven't tried images, but it uses pagedjs-cli, which might deal better with that? cc @julientaq

paperdigits commented 4 years ago

@maelle thanks for sharing, but this is less about the specific PDF formatter (there are several PDF formatters, weasyprint, wkhtmltopdf, pagedjs) and more about the hugo templates. Your code works well if you have a flat page hierarchy, like a blog, but doesn't work if you have hugo bundles and pages that have child pages.

If you take a look at my code, I get the first pages in the hierarchy, ranked by weight, then loop through and check if that page has children, then if the child page has children, and recurse all the way down, keeping the structure. My index file calls the template content_gen.html which then calls content_recursive.html to find all the children.
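
The traversal described here, written out in Python purely to show the logic (the actual theme does this in Hugo templates). The `Page` class and `flatten` function are hypothetical stand-ins, and siblings are sorted by plain ascending weight, which may not match Hugo's own ordering rules exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    weight: int = 0
    children: list[Page] = field(default_factory=list)


def flatten(pages, depth=1):
    """Yield (depth, page) depth-first, with siblings ordered by weight."""
    for page in sorted(pages, key=lambda p: p.weight):
        yield depth, page
        # Recurse before moving to the next sibling so that children stay
        # directly under their parent in the combined document.
        yield from flatten(page.children, depth + 1)
```

The `depth` value can then pick the heading level (`h1`, `h2`, ...) for each page, which keeps the site hierarchy visible in the PDF outline.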

That is what I'm wondering if I can do better.

lucasew commented 3 years ago

What about generating an EPUB?

EPUB is basically a bunch of compressed HTML with a specification.

I already use this lib on a project that I made that gives me value often.

The hardest part is to bundle the assets inside the book and fix the links. Each page can be a chapter and you will need to simplify the stuff to put in the generated book. The required data is basically all available for the book metadata and it's mobile-friendly to read. If you want you can try to convert to PDF using calibre.
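
To illustrate the "compressed HTML with a specification" point, here is a sketch of the container layout using only the Python standard library. It is not a validated EPUB: a conforming book also needs a navigation document and fuller metadata, and bundling assets and fixing links is left out. The `OEBPS` directory name is a common convention, and the function and chapter file names are placeholders.

```python
import uuid
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:{book_id}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
{items}
  </manifest>
  <spine>
{refs}
  </spine>
</package>
"""


def write_epub(target, title, chapters):
    """Write chapters (file name -> XHTML string) into an EPUB-style zip."""
    names = list(chapters)
    items = "\n".join(
        f'    <item id="ch{i}" href="{name}" media-type="application/xhtml+xml"/>'
        for i, name in enumerate(names)
    )
    refs = "\n".join(f'    <itemref idref="ch{i}"/>' for i in range(len(names)))
    opf = OPF_TEMPLATE.format(
        book_id=uuid.uuid4(), title=escape(title), items=items, refs=refs
    )
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as epub:
        # The mimetype entry has to come first and be stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML)
        epub.writestr("OEBPS/content.opf", opf)
        for name, xhtml in chapters.items():
            epub.writestr(f"OEBPS/{name}", xhtml)
```

Each rendered page becomes one entry in `chapters`, and the spine lists them in reading order.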

jmooring commented 3 years ago

@lucasew See https://discourse.gohugo.io/t/generate-hugo-website-as-e-book-epub/29559.

divinerites commented 2 years ago

> Still pretty experimental, but i’d love to have your thoughts. gitlab.pagedmedia.org/julientaq/pagedjs-hugo

Hello @julientaq, it seems your page returns a 404. Do you have a working link? Thanks.

[EDIT] I think I found the page here: https://gitlab.coko.foundation/pagedjs/hugo-pagedjs-plugin

julientaq commented 2 years ago

Hi @divinerites sorry, i just got back from a couple of days out.

Glad you found the repo. Tell me if you need any help setting it up, must be a bit old now.

divinerites commented 2 years ago

> Hi @divinerites sorry, i just got back from a couple of days out.
>
> Glad you found the repo. Tell me if you need any help setting it up, must be a bit old now.

Yes, I opened an issue on your repo. Let's keep talking there.

bc-m commented 1 year ago

Maybe md-to-pdf can help here, which can convert markdown and html to pdf. You can also pipe via stdin and stdout.

UtkarshVerma commented 1 month ago

I needed this for my portfolio and got it to work in a hacky way. I basically rely on the PagedJS CLI and have it run as a resources.PostProcess task that builds the PDF and replaces the HTML page with the PDF.

Would love to see if some neater alternatives exist for this.

gohugoio / hugo

Investigate PDF as an output format #6332