Configure external helpers via output formats

florisdf commented 5 years ago

Hi,

Pandoc support opens up many possible output formats for Hugo. However, it is currently limited to HTML output. I think a great deal of flexibility would be gained if external helpers, like pandoc, could be configured via the output formats configuration, e.g.

[mediaTypes]
  [mediaTypes."text/tex"]
    suffixes = ["tex"]

[outputFormats]
  [outputFormats.TeX]
    mediaType = "text/tex"
    helper = "pandoc"
    helperArgs = ["-t", "latex"]

This would make for a hassle-free way to generate PDF versions of web pages.

Or is this beyond the scope of Hugo?

bep commented 5 years ago

So your general idea is great. We should definitely connect the "content renderer" to the output format (let us worry about the terms, as used in the config struct, later, but "helper" is not a great name).

We currently have a one-to-many mapping from output format to "content renderer", using the extension to detect ad => asciidoc etc. We may need a way to express that here as well.
Currently we always render with a layout (using the output format as one of the lookup keys); in your example I assume you don't want that. So we could add a noLayout flag, meaning just write .Content to disk.

Also, I'm not sure we can/would want to allow passing on arbitrary flags to helpers. We have no way to support everything and it has a lot of unknowns in the security department (pandoc may have a -exec flag for all I know).
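
To make the concern about arbitrary flags concrete, here is a minimal sketch of one possible mitigation: checking configured helper arguments against an allowlist before the helper is run. The names `allowedFlags` and `validateArgs` are hypothetical and not part of Hugo; the flags in the list are real pandoc options, but which ones would be safe to allow is an assumption made only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedFlags is a hypothetical allowlist. The entries are real pandoc
// options, but the choice of which to allow is only an illustration.
var allowedFlags = map[string]bool{
	"--to":         true,
	"--from":       true,
	"--standalone": true,
}

// validateArgs rejects any argument whose flag name is not on the allowlist.
// Only the "--flag" and "--flag=value" forms are accepted, so a value can
// never be mistaken for a separate flag.
func validateArgs(args []string) error {
	for _, arg := range args {
		name, _, _ := strings.Cut(arg, "=")
		if !allowedFlags[name] {
			return fmt.Errorf("helper argument %q is not allowed", arg)
		}
	}
	return nil
}

func main() {
	// An accepted configuration.
	fmt.Println(validateArgs([]string{"--to=latex", "--standalone"}))
	// A rejected one: --lua-filter runs user-supplied code inside pandoc.
	fmt.Println(validateArgs([]string{"--lua-filter=evil.lua"}))
}
```

An allowlist like this trades flexibility for predictability: every new flag has to be reviewed before it can be used, which is part of the maintenance cost raised above.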

@kaushalmodi @regisphilibert and gang may chime in with their view on this topic.

regisphilibert commented 5 years ago

I'll go even though I might be missing something.

Not sure how this would work?

In the given example, will Hugo load the pandoc Go Module itself? If so, how shall we install it? npm? If so, can't we already have Hugo publish the file as is (template file: .Content) and let webpack or something similar handle the PDF conversion?

bep commented 5 years ago

@regis Hugo already supports Pandoc.

florisdf commented 5 years ago

@bep It might be an idea to use the suffix of the destination file to determine the output format for the renderer? In my example, since the destination has the suffix ".tex", Hugo can use pandoc, as it supports TeX output. If needed, one could explicitly configure the renderer in the config file when the default doesn't suit.

Also, I think the current approach of determining the renderer based on the suffix of the content file conflicts with the idea of separating "form" and "content". Determining the renderer/output format based on the output file makes more sense, in my opinion, supposing that all renderers support a similar markdown dialect.
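
As a rough sketch of what suffix-based selection could look like, the following maps the destination file's suffix to a pandoc writer name. The function `writerFor` and the table `writerBySuffix` are hypothetical and not part of Hugo; the writer names (`latex`, `html`, `docx`, `rst`) are existing pandoc output formats.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// writerBySuffix is a hypothetical default table from destination suffix
// to pandoc writer name. The writer names are real pandoc output formats.
var writerBySuffix = map[string]string{
	".tex":  "latex",
	".html": "html",
	".docx": "docx",
	".rst":  "rst",
}

// writerFor returns the default writer for a destination path, or false
// when the suffix is unknown and explicit configuration would be needed.
func writerFor(dest string) (string, bool) {
	w, ok := writerBySuffix[strings.ToLower(filepath.Ext(dest))]
	return w, ok
}

func main() {
	for _, dest := range []string{"public/post/index.tex", "public/post/index.xyz"} {
		if w, ok := writerFor(dest); ok {
			fmt.Printf("%s: pandoc --to=%s\n", dest, w)
		} else {
			fmt.Printf("%s: no default renderer, needs explicit configuration\n", dest)
		}
	}
}
```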

Finally, I think a layout file is still useful for other output formats apart from html. For LaTeX, it could contain something like

\documentclass[12pt, a4paper]{article}
\usepackage[utf8]{inputenc}

\title{ {{ .Title }} }
\date{ {{ .Date }} }

\begin{document}
{{ .Content }}
\end{document}

This would then be saved in a file like layouts/_default/single.tex so the hugo lookup rules would be just the way they are now.

setphen commented 4 years ago

Support for pandoc custom writers would be useful https://pandoc.org/MANUAL.html#custom-writers

Though this probably falls into the same realm of security concerns as

> We have no way to support everything and it has a lot of unknowns in the security department (pandoc may have a -exec flag for all I know).

RMStoica-zivver commented 4 years ago

Worried about pandoc security? How about don't execute pandoc in the security context of hugo, then?

bep commented 4 years ago

> How about don't execute pandoc in the security context of hugo, then?

??

RMStoica-zivver commented 4 years ago

According to the fine manual:

> If your application uses pandoc as a Haskell library (rather than shelling out to the executable), it is possible to use it in a mode that fully isolates pandoc from your file system, by running the pandoc operations in the PandocPure monad.

bep commented 4 years ago

> If your application uses pandoc as a Haskell library (rather than shelling out to the executable)

We don't use pandoc as a Haskell library. I'm guessing your next question would be "why not?"?

RMStoica-zivver commented 4 years ago

I suspect it's because you want to keep this an as-pure-as-possible Go project? You could also feed the pandoc executable from stdin and take its output from stdout if you don't want it touching files directly. Perhaps even sanitize the output somehow.

bep commented 4 years ago

> I suspect it's because you want to keep this an as-pure-as-possible Go project?

That, and the routes you propose all involve a lot of work plus future maintenance work.

RMStoica-zivver commented 4 years ago

There is yet another way which involves next to zero work - put up a big warning sign HERE BE DRAGONS and just let users run helpers with "unsafe" settings if they so wish.

gohugoio / hugo

Configure external helpers via output formats #6089