dotnet / docfx

Static site generator for .NET API documentation.
https://dotnet.github.io/docfx/
MIT License

[Feature Request] Allow use of a custom PDF generator #10275

Open merlinschumacher opened 2 days ago

merlinschumacher commented 2 days ago

I'm not really happy with the way browser engines render PDFs from the Markdown files. Code blocks get split across pages, sometimes lines are split in the middle, etc. So I've created a custom LaTeX template for pandoc to do these conversions instead of docfx. Nevertheless, docfx is great and provides a nice solution for .NET project documentation.

I'd love to be able to change the default executor for the PDF conversion from Chromium to a custom command that takes specified arguments and builds a PDF, which docfx then integrates seamlessly.
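Here is an editorial sketch of what such a hook might do. docfx has no such option today, so the template string and the `{input}`/`{output}` placeholders below are hypothetical. The idea is to split the template into arguments *before* substituting the paths. That way a path containing spaces stays a single argument and no shell is involved.

```python
import shlex

# Hypothetical: docfx has no such setting; this only illustrates the request.
PDF_COMMAND_TEMPLATE = "pandoc {input} --template=custom.latex -o {output}"


def expand_pdf_command(template: str, input_path: str, output_path: str) -> list[str]:
    """Turn a command template into an argv list for subprocess.run.

    The template is split into arguments first and the placeholders are
    filled in afterwards, so a path containing spaces stays one argument.
    """
    return [
        arg.format(input=input_path, output=output_path)
        for arg in shlex.split(template)
    ]


argv = expand_pdf_command(PDF_COMMAND_TEMPLATE, "docs/my guide.md", "_site/pdf/my guide.pdf")
```

Passing the resulting list to `subprocess.run` without `shell=True` avoids any quoting problems with user-supplied paths.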

Currently, I've implemented a hacky solution where I explicitly link the PDF files in the TOCs as child items and call pandoc before running docfx. It works, but it's clunky.
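Below is a minimal sketch of that workaround's TOC side. It assumes the PDFs are written to a `pdf/` folder next to the TOC and that each Markdown page gets one PDF child item. The helper name and folder are illustrative and not part of docfx. The helper emits a `toc.yml` fragment using docfx's `name`/`href`/`items` keys.

```python
from pathlib import PurePosixPath


def toc_entry_with_pdf(title: str, md_href: str, pdf_dir: str = "pdf") -> str:
    """Return a toc.yml fragment: the Markdown page plus a child item that
    links the pandoc-built PDF with the same base name.

    Titles are written unquoted, so a title containing ':' or '#' would
    need YAML quoting.
    """
    pdf_name = PurePosixPath(md_href).with_suffix(".pdf").name
    pdf_href = str(PurePosixPath(pdf_dir) / pdf_name)
    return (
        f"- name: {title}\n"
        f"  href: {md_href}\n"
        "  items:\n"
        f"  - name: {title} (PDF)\n"
        f"    href: {pdf_href}\n"
    )
```

Because only the base name is kept, two pages with the same file name in different folders would map to the same PDF. A real setup would need to mirror the folder structure instead.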

yufeih commented 2 days ago

@merlinschumacher, I'm curious which custom PDF generator you prefer. Is it a proprietary tool? Docfx used to use wkhtmltopdf, but it hasn't been actively maintained. Is there an alternative you're using now?

merlinschumacher commented 2 days ago

I use pandoc in combination with a custom LaTeX template. That's essentially all. The LaTeX template is just a slightly modified version of pandoc's default template. There are even popular templates, like Eisvogel, that are built specifically for converting Markdown to PDF and making the result look good.

Pandoc can also receive metadata that is used in the resulting files. That way I've been able to inject information like the build date into the PDFs, using metadata and corresponding placeholders in the template.
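As a rough illustration, the invocation could look like the sketch below. It assumes pandoc's `--template` and `-M` (`--metadata`) options. It also assumes a hypothetical `build-date` variable that the custom template would reference as `$build-date$`. The function name and paths are made up for the example. Without `--pdf-engine`, pandoc falls back to its default LaTeX engine.

```python
from datetime import date


def pandoc_pdf_args(source: str, output: str, template: str, build_date: date) -> list[str]:
    """argv for Markdown -> PDF through a custom LaTeX template.

    'build-date' is a hypothetical metadata key; the template would print it
    with a $build-date$ placeholder.
    """
    return [
        "pandoc", source,
        f"--template={template}",
        "-M", f"build-date={build_date.isoformat()}",
        "-o", output,
    ]


args = pandoc_pdf_args("docs/intro.md", "pdf/intro.pdf", "custom.latex", date.today())
```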

Pandoc is available for all major platforms, and so are its most commonly required dependencies. On Windows it can even be installed via Chocolatey or winget.

There are also pandoc filters for PlantUML and Mermaid, but I haven't gotten around to checking them out yet.

At the moment I use a Python script that calls pandoc and docfx one after another inside a custom-made Docker image. The Docker image is used in a CI/CD pipeline, where I generate the output. My setup relies on Inkscape for pandoc, which pulls in a lot of dependencies, but I believe I can replace it with something smaller like rsvg-convert.
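The script itself isn't shown in the thread. Here is a minimal sketch of the same idea. It assumes one pandoc call per Markdown file, PDFs mirrored into a `pdf/` folder, and docfx invoked afterwards as `docfx docfx.json`. Function names and paths are hypothetical. Building the command list separately from running it keeps the plan easy to inspect in CI.

```python
import subprocess
from pathlib import Path


def build_commands(md_files: list[Path], docs_dir: Path, pdf_dir: Path, template: str) -> list[list[str]]:
    """One pandoc call per Markdown file, followed by a single docfx run."""
    commands = []
    for md in md_files:
        # Mirror the docs folder structure under pdf_dir.
        pdf = pdf_dir / md.relative_to(docs_dir).with_suffix(".pdf")
        commands.append(["pandoc", str(md), f"--template={template}", "-o", str(pdf)])
    # docfx runs last so the PDFs exist before it processes the TOC links.
    commands.append(["docfx", "docfx.json"])
    return commands


def run_all(commands: list[list[str]]) -> None:
    for argv in commands:
        if argv[0] == "pandoc":
            # Make sure the output directory exists before pandoc writes into it.
            Path(argv[-1]).parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)
```

In a CI job this might be driven by something like `run_all(build_commands(sorted(Path("docs").rglob("*.md")), Path("docs"), Path("pdf"), "custom.latex"))`. A failing conversion then stops the pipeline before docfx runs.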

For the conversion from HTML to PDF, pandoc relies on WeasyPrint, which supports CSS as well and reportedly has better support for print-related CSS rules. I haven't tested that one, though.
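Here is an untested sketch of that route. It assumes pandoc's `--to=html`, `--pdf-engine=weasyprint` and `--css` options, plus a hypothetical `print.css` in the working directory. The stylesheet targets the complaint from the original post by asking the renderer not to break inside code blocks. It is worth checking whether the relative CSS path is resolved as expected on your setup.

```python
# Hypothetical print stylesheet: ask the renderer not to split code blocks
# across pages and set an explicit page size and margins.
PRINT_CSS = """\
@page { size: A4; margin: 2cm; }
pre { break-inside: avoid; }
"""


def pandoc_weasyprint_args(source: str, output: str, css: str = "print.css") -> list[str]:
    """argv for Markdown -> HTML -> PDF, with WeasyPrint rendering the HTML."""
    return [
        "pandoc", source,
        "--to=html",
        "--pdf-engine=weasyprint",
        f"--css={css}",
        "-o", output,
    ]
```

`break-inside: avoid` is only a preference. A code block longer than one page will still be split.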