Witiko / markdown

A package for converting and rendering markdown documents in TeX
http://ctan.org/pkg/markdown
LaTeX Project Public License v1.3c

Integrate with jgm/pandoc #25

Closed Witiko closed 2 years ago

Witiko commented 6 years ago

Currently, a user of the Markdown package is restricted in their choice of syntax extensions to the ones provided by the Lua parser implemented in markdown.lua. To provide experimental ground for implementing new syntax extensions, support for the internal abstract syntax tree (AST) format of the jgm/pandoc converter will be added.

Currently, jgm/pandoc can be used to provide conversion from various input formats to Markdown:

pandoc -f docx -t markdown input.docx -o input.md

The Markdown package can then be used to convert the Markdown document to the TeX abstract syntax tree format (TeX AST) produced by the Markdown package:

texlua /path/to/markdown-cli.lua input.md input.tex

This representation can then be typeset. This is useful, but limited to the Markdown syntax extensions supported by our Lua parser.


The plan is to provide a jgm/pandoc Lua writer (see jgm/pandoc issues 4341 and 1541 for further information) that will directly convert the jgm/pandoc AST to the TeX AST, circumventing the Lua parser altogether:

pandoc -f docx -t /path/to/markdown-pandoc_writer.lua input.docx -o input.tex

Adding initial support for a new syntax extension already supported by jgm/pandoc will then be as easy as adding a new procedure to the writer and defining the corresponding \markdownRenderer… macros. Full support can be added later by extending our Lua parser.
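
To illustrate the kind of mapping such a writer would perform, here is a minimal sketch. It is written in Python against pandoc's JSON serialization of the AST (`pandoc -t json`) purely for illustration; the writer planned in this issue would be a Lua script. The function names `render_inlines`, `render_blocks`, and `convert` are hypothetical, only a handful of node types are handled, and TeX special characters are not escaped. The exact layout of the output produced by the real Markdown package may differ.

```python
import json

# Assumption: each of these pandoc inline node types maps onto a
# one-argument renderer macro of the Markdown package.
INLINE_MACROS = {
    "Emph": r"\markdownRendererEmphasis",
    "Strong": r"\markdownRendererStrongEmphasis",
}


def render_inlines(inlines):
    """Convert a list of pandoc inline nodes into TeX AST text."""
    out = []
    for node in inlines:
        tag = node["t"]
        if tag == "Str":
            # A real writer would escape TeX special characters here.
            out.append(node["c"])
        elif tag == "Space":
            out.append(" ")
        elif tag in INLINE_MACROS:
            out.append(INLINE_MACROS[tag] + "{" + render_inlines(node["c"]) + "}")
        else:
            # Supporting a new syntax extension means adding a branch here
            # and defining the corresponding renderer macro.
            raise NotImplementedError("unsupported inline: " + tag)
    return "".join(out)


def render_blocks(blocks):
    """Convert a list of pandoc block nodes into TeX AST text."""
    rendered = []
    for node in blocks:
        if node["t"] == "Para":
            rendered.append(render_inlines(node["c"]))
        else:
            raise NotImplementedError("unsupported block: " + node["t"])
    return "\\markdownRendererInterblockSeparator\n".join(rendered)


def convert(json_text):
    """Convert a pandoc JSON document (as a string) into TeX AST text."""
    return render_blocks(json.loads(json_text)["blocks"])


sample = json.dumps({
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "Hello"},
            {"t": "Space"},
            {"t": "Emph", "c": [{"t": "Str", "c": "world"}]},
        ]}
    ]
})
print(convert(sample))
```

In this sketch, each pandoc node type corresponds to one branch, which mirrors the idea of one procedure per element in the Lua writer.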


TODOs:

Witiko commented 6 years ago

Adding a jgm/pandoc reader for reconstructing a jgm/pandoc AST from a TeX AST would also be beneficial. This reader would be more of a plumbing tool for restoring a document from the intermediate TeX AST files in cases where the original sources are unavailable. However, there does not seem to be any Lua API for readers, so we would either need to abuse the AST and create a Lua filter, or create a Haskell reader and contribute it to jgm/pandoc, as discussed in https://github.com/jgm/pandoc/issues/1541#issuecomment-894786097. Similarly to a Haskell reader, we could also contribute a Haskell writer that would replace the Lua writer in the long run and would be maintained as a part of jgm/pandoc.

Witiko commented 3 years ago

An exhaustive specification of the elements of Pandoc's AST format is available on Hackage.
The full list of Lua functions reserved for Lua writers is available in jgm/pandoc's src/Text/Pandoc/Writers/Custom.hs.

Witiko commented 3 years ago

[...] create a Haskell reader and contribute it to jgm/pandoc, as discussed in jgm/pandoc#1541 (comment). Similarly to a Haskell reader, we could also contribute a Haskell writer that would replace the Lua writer in the long run and would be maintained as a part of jgm/pandoc.

@drehak I have created a development environment for Pandoc using Docker at witiko/pandoc-devenv. We can use it to develop Haskell readers and writers for Pandoc without littering our base OS with Haskell. Those who want to litter their base OS can take inspiration from our Dockerfile.

Witiko commented 2 years ago

A preliminary analysis for the implementation has been authored by @drehak and published in the CSTUG Bulletin 2021/1-4 (landing page, PDF).

Witiko commented 2 years ago

@drehak We should aim to close this issue before milestone 2.15.0 (due on March 31), since the defense of your student project will likely take place before then and also because we'd like to publicize your proof of concept in a journal article for TUGboat 43:1 (also due on March 31, see #120).

Witiko commented 2 years ago

Tentative roadmap for @xrehak's bachelor's thesis

Future work