Witiko / markdown

A package for converting and rendering markdown documents in TeX
http://ctan.org/pkg/markdown
LaTeX Project Public License v1.3c

Support OpTeX #215

Open Witiko opened 1 year ago

Witiko commented 1 year ago

The Markdown package currently supports plain TeX, LaTeX, and ConTeXt. We should add support for further formats such as Petr @Olsak's OpTeX.

Witiko commented 1 year ago

@olsak I can handle tasks 2–4, but I would appreciate help with task 1 (Define renderers for OpTeX).

In the user manual, there is a list of all renderers in the Markdown package. We have some definitions for plain TeX (see markdown.tex) that will be inherited by OpTeX, but anything more complicated (citations, tables, headings, …) needs to be defined for each format separately (see the definitions for LaTeX in markdown.sty and for ConTeXt in t-markdown.tex).

If I can get a list of definitions such as the following, I can take it from there.

\def\markdownRendererHeadingOnePrototype#1{\tit #1^^M}%
\def\markdownRendererHeadingTwoPrototype#1{\chap #1^^M}%
\def\markdownRendererHeadingThreePrototype#1{\sec #1^^M}%
% ...

With correct definitions, we should be able to typeset the document example.md in OpTeX as follows:

\input markdown

% Options
\def\markdownOptionContentBlocks{true}%
\def\markdownOptionDebugExtensions{true}%
\def\markdownOptionDefinitionLists{true}%
\def\markdownOptionFancyLists{true}%
\def\markdownOptionHashEnumerators{true}%
\def\markdownOptionInlineNotes{true}%
\def\markdownOptionNotes{true}%
\def\markdownOptionPipeTables{true}%
\def\markdownOptionRawAttributes{true}%
\def\markdownOptionSmartEllipses{true}%
\def\markdownOptionStrikeThrough{true}%
\def\markdownOptionSubscripts{true}%
\def\markdownOptionSuperscripts{true}%
\def\markdownOptionTableCaptions{true}%
\def\markdownOptionTaskLists{true}%

% Renderer prototypes
\def\markdownRendererHeadingOnePrototype#1{\tit #1^^M}%
\def\markdownRendererHeadingTwoPrototype#1{\chap #1^^M}%
\def\markdownRendererHeadingThreePrototype#1{\sec #1^^M}%
% ...

% Example document
\fontfam[LMfonts]
\markdownInput{example.md}
\bye

olsak commented 1 year ago

I think that \def\markdownRendererHeadingTwoPrototype#1{\chap #1^^M} will not work, because ^^M is a TeX-unfriendly character. OpTeX scans the line after \chap, \sec, etc. to its end in verbatim mode, i.e. things like "verbatim in title" are read as they are and the parameter is tokenized later, when it is needed. So, the user can write something like

\sec We talk about the `{` character

and it will work in titles, outlines, tables of contents, etc. (Something similar is impossible in LaTeX.) But we can give up this feature of scanning to the end of the line and use internal OpTeX macros for the markdown package, i.e.

\def\markdownRendererHeadingOnePrototype#1{\_printtit{#1}}
\def\markdownRendererHeadingTwoPrototype#1{\_inchap{#1}}
\def\markdownRendererHeadingThreePrototype#1{\_insec{#1}}
\def\markdownRendererHeadingFourPrototype#1{\_insecc{#1}}

I'll take a look at the other prototypes...

olsak commented 1 year ago

I tried to \input markdown in OpTeX, but there are problems. It does \input expl3-generic, and this macro file cannot be loaded in OpTeX without additional tricks. I hope that we do not need to load this macro file in OpTeX, but I can mention the tricks.

  1. expl3-generic loads Lua code that expects luatexbase.new_luafunction to be defined. But OpTeX has its own luatexbase, and this function isn't defined in it. So, we need
\directlua{
   function luatexbase.new_luafunction(name)
      return \string#lua.get_functions_table() + 1
   end }
  2. \foo_ macros are inaccessible in OpTeX by default; you need to run \mathsboff before \input expl3-generic.
  3. expl3-generic expects that \newread and \newwrite are defined as in plain TeX, but that is not the case. We need
    \def\newread  {\_newread}
    \def\newwrite {\_newwrite}
  4. expl3-generic needs the control sequence \e@alloc@ccodetable@count.
    \slet{e@alloc@ccodetable@count}{_catcodetablealloc}

Summary: the following code must be inserted before \input markdown:

\directlua{
   function luatexbase.new_luafunction(name)
      return \string#lua.get_functions_table() + 1
   end }
\mathsboff
\def\newread  {\_newread}
\def\newwrite {\_newwrite}
\slet{e@alloc@ccodetable@count}{_catcodetablealloc}

\input markdown

\markdownBegin
Hello *world*!
\markdownEnd

\bye

Witiko commented 1 year ago

I hope that we do not need to load this macro file in OpTeX, but I can mention the tricks.

The Markdown package exposes a Lua module, so it is definitely possible to make the OpTeX layer sit directly on top of the Lua layer unlike the ConTeXt and LaTeX layers, which sit on top of the Plain TeX + expl3 layer:

[Image: block-diagram-optex]

This would honor the spirit of the OpTeX format (minimalistic, fast, no-nonsense). It would reduce code reuse, but that may be a good thing, because it allows us to reimagine e.g. the option-passing from TeX to Lua. Here is how you could use the Markdown package via Lua in OpTeX:

\directlua{
  local ran_ok, kpse = pcall(require, "kpse")
  if ran_ok then kpse.set_program_name("luatex") end
  local convert = require("markdown").new()

  function markdown(input)
    local output = convert(input)
    tex.print(output)
  end
}

\let\markdownRendererDocumentBegin\relax
\def\markdownRendererEmphasis#1{{\em #1}}
\let\markdownRendererDocumentEnd\relax

\directlua{ markdown"Hello *world*!" }

\bye

Witiko commented 1 year ago

I hope that we do not need to load this macro file in OpTeX, but I can mention the tricks.

This is useful to know and we may want to forward this to https://github.com/latex3/latex3, so that expl3-generic can be loaded in OpTeX without hassle.

olsak commented 1 year ago

It is amazing! If there is a list of all control sequences generated by the convert Lua function (like \markdownRendererEmphasis in your example) and they are documented, then I can prepare macros for each such control sequence. Moreover, if there is a need to set parameters in key=value format, then it can be done at the Lua level or at the macro level using the simple OpTeX macros \kv and \kvscan, see section 2.9 in the OpTeX documentation.
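
As a rough illustration of the macro-level route, here is a minimal sketch. It assumes \readkv, the companion of \kv in OpTeX's key-value support, behaves as described in the OpTeX manual; the name \mdDefaults and the choice of option keys are hypothetical and only serve the example.

```tex
% Sketch only: \mdDefaults is a hypothetical name; \readkv and \kv are
% assumed to behave as described in the OpTeX manual.
\def\mdDefaults{hybrid=false, tightLists=true}
\readkv\mdDefaults      % load the defaults into the key-value dictionary
\readkv{hybrid=true}    % options given by the user override the defaults

hybrid: \kv{hybrid}, tightLists: \kv{tightLists}
\bye
```

The values read this way could then be passed on to the Lua layer, but how exactly that hand-over should look is left open here.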

Witiko commented 1 year ago

All \markdownRenderer... control sequences (also known as renderers) are listed and defined in the user manual. Furthermore, we can also use the internal reflection API exposed by the plain TeX layer of the Markdown package to list all renderers and the number of parameters that they accept:

% tricks needed to load expl3-generic.tex package
\directlua{
   function luatexbase.new_luafunction(name)
      return \string#lua.get_functions_table() + 1
   end }
\mathsboff
\def\newread  {\_newread}
\def\newwrite {\_newwrite}
\slet{e@alloc@ccodetable@count}{_catcodetablealloc}

\input markdown

% list all renderers and the number of parameters that they accept
\begitems
\ExplSyntaxOn
\seq_map_inline:Nn
  \g__markdown_renderers_seq
  {
    \tl_set:Nn
      \l_tmpa_tl
      { #1 }
    \regex_replace_once:nnN
      { ^. }
      { \c { bslash } markdownRenderer \c { str_uppercase:n } \cB\{ \0 \cE\} }
      \l_tmpa_tl
    \prop_get:NnN
      \g__markdown_renderer_arities_prop
      { #1 }
      \l_tmpb_tl
    * \l_tmpa_tl{}~(accepts~\l_tmpb_tl{}~parameters)
  }
\ExplSyntaxOff
\enditems

\bye

Here is the output of running OpTeX on the above source code with the current source code from branch main:

[Image: renderers-4bdde29]

Most renderers have a fixed number of parameters and an obvious default definition, such as \markdownRendererHeadingOne, which has one parameter and corresponds to a top-level heading. However, there are some exceptions:

olsak commented 1 year ago

I made a first attempt at defining OpTeX macros for the markdown package. But there are many things I don't understand; they are marked with % ?? below. For example, why are there UlBeginTight and UlEndTight? I see only a single behavior of lists in the Markdown documentation. Moreover, \markdownRendererLink cannot give the raw URI in #3 if it includes %, for example.

\fontfam[lm]
\hyperlinks\Blue\Blue

\_directlua{
  local ran_ok, kpse = pcall(require, "kpse")
  if ran_ok then kpse.set_program_name("luatex") end
}

\_eoldef \markdownBegin #1{% #1 includes the end of the current line, parameters can be here
   \_def\_markdownParams{#1}%
   \_bgroup \_setverb \_savemathsb \_endlinechar=`\^^J
   \_markdownBeginA
}
\_ea\_def \_ea\_markdownBeginA \_ea#\_ea1\_csstring\\markdownEnd#2^^J{%
   \_restoremathsb \_egroup 
   \_directlua{
      local convert = require("markdown").new({\_markdownParams})
      tex.print(convert("\_luaescapestring{#1}"))}%
}

\_edef \markdownRendererAmpersand   #1{\_csstring\&}
\_edef \markdownRendererBackslash   #1{\_csstring\\}
\_edef \markdownRendererCircumflex  #1{\_csstring\^}
\_edef \markdownRendererDollarSign  #1{\_csstring\$}
\_edef \markdownRendererHash        #1{\_csstring\#}
\_edef \markdownRendererLeftBrace   #1{\_csstring\{}
\_edef \markdownRendererPercentSign #1{\_csstring\%}
\_edef \markdownRendererPipe        #1{|}            % ??
\_edef \markdownRendererRightBrace  #1{\_csstring\}}
\_edef \markdownRendererTilde       #1{\_csstring\~}
\_edef \markdownRendererUnderscore  #1{_}            % ??

\_def\markdownRendererLink  #1#2#3#4{\_ea\_ulink\_ea[\_expanded{#3}]{#1}} % ?? raw URI? doesn't work with hybrid=true

\_def \markdownRendererAttributeIdentifier #1{} % ??
\_def \markdownRendererAttributeClassName  #1{} % ??
\_def \markdownRendererAttributeKeyValue   #1#2{} % ??

\_def \markdownOptionTightLists     {true}
\_def \markdownRendererUlBegin      {\_begitems}
\_def \markdownRendererUlBeginTight {\_begitems \_novspaces} % ??
\_def \markdownRendererUlEnd        {\_enditems}
\_def \markdownRendererUlEndTight   {\_enditems}
\_def \markdownRendererUlItem       {\_startitem} 
\_def \markdownRendererUlItemEnd    {\_par}

\_def \markdownRendererInterblockSeparator {\_par} % ??

\_def \markdownRendererInputVerbatim    #1{\_verbinput (-) {#1} }
\_def \markdownRendererInputFencedCode  #1#2{\_verbinput (-) {#1} } % ??

\_def \markdownRendererCodeSpan         #1{#1} % ??

\_def \markdownOptionContentBlocks             {true} % ??
\_def \markdownRendererContentBlock            #1#2#3#4{This is {\_tt #2}, #4.} % ??
\_def \markdownRendererContentBlockOnlineImage #1#2#3#4{This is the image {\_tt #2}, #4.} % ??
\_def \markdownRendererContentBlockCode        #1#2#3#4#5{% ??
      This is the #2 (\_uppercase{#1}) document {\_tt #3}, #5.%
}

\_let \markdownRendererDocumentBegin   \_relax
\_let \markdownRendererDocumentEnd     \_relax
\_def \markdownRendererBlockQuoteBegin {\_begblock}
\_def \markdownRendererBlockQuoteEnd   {\_endblock}

\_def \markdownRendererEmphasis        #1{{\_em #1}}

Tests:

\markdownBegin hybrid=true
This is a list *without* vertical spaces above and below:

- the first item
  at more lines
- the second item: $\sum_k^n x\_k=b$
- the third item

Next paragraph.

> This is a block of text
> in more lines.

Final paragraph.
\markdownEnd

\bye

Witiko commented 1 year ago

@olsak For example, why are there UlBeginTight and UlEndTight? I see only a single behavior of lists in the Markdown documentation.

See the documentation of option tightLists:

Unordered and ordered lists whose items do not consist of multiple paragraphs will be considered tight. Tight lists will produce tight renderers that may produce different output than lists that are not tight:

- This is
- a tight
- unordered list.

- This is

  not a tight

- unordered list.

See also the documentation of bullet item renderers:

The \markdownRendererUlBegin macro represents the beginning of a bulleted list that contains an item with several paragraphs of text (the list is not tight). The macro receives no arguments.

The \markdownRendererUlBeginTight macro represents the beginning of a bulleted list that contains no item with several paragraphs of text (the list is tight). This macro will only be produced when the tightLists option is enabled. The macro receives no arguments.

See also the CommonMark spec.

Since non-tight lists contain bullet items with multiple paragraphs, it may be a good idea to add vertical spaces not just around the list but also between the individual items. Here is how the above example is rendered in LaTeX by default:

[Image: scrot]
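
To see which renderers the converter emits for each kind of list, something like the following sketch may help. It follows the direct Lua approach shown earlier in this thread and assumes that the tightLists option can be passed in the table given to new(); the two sample lists are made up for the illustration.

```tex
% Sketch only: prints the converter output to the terminal and log
% so that tight and non-tight renderers can be compared.
\directlua{
  local ran_ok, kpse = pcall(require, "kpse")
  if ran_ok then kpse.set_program_name("luatex") end
  local convert = require("markdown").new({tightLists = true})
  % \string\n puts a Lua newline escape into the string
  local tight = "- This is\string\n- a tight\string\n- list.\string\n"
  local loose = "- This is\string\n\string\n- a loose\string\n\string\n- list.\string\n"
  texio.write_nl(convert(tight))
  texio.write_nl(convert(loose))
}
\bye
```

Only TeX comments are used inside \directlua here, because line ends are turned into spaces before Lua sees the code, so a Lua comment would swallow the rest of the chunk.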

Witiko commented 1 year ago

@olsak Moreover, the \markdownRendererLink cannot give the raw URI in #3 if it includes %, for example.

The plain TeX layer changes the catcodes of % and # to other (12) inside \markdownBegin ... \markdownEnd. The main reason is that allowing %-comments with option hybrid=true produces unintuitive results, since the markdown parser does not preserve newlines during conversion, see e.g. Section 2.2 in our TUGboat 42:2 article. Furthermore, both % and # are commonly featured in URLs and relative references, and having them both as category other makes renderer definitions easier.
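
The effect of such a catcode change can be sketched in isolation as follows. This is not the package's actual code, and \exampleURL is a hypothetical name; it only shows that % and # survive as ordinary characters once they have category other.

```tex
% Sketch only: \exampleURL is a hypothetical name, and this is not
% how the Markdown package itself is implemented.
\begingroup
  \catcode`\%=12 \catcode`\#=12
  \gdef\exampleURL{https://example.com/a%20b#frag}
\endgroup
\message{\exampleURL}
\bye
```

After \endgroup, both characters have their usual meaning again, but the tokens stored in \exampleURL keep category 12, so the URL can be handed to a link macro without escaping.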

Witiko commented 1 year ago

@olsak I am planning to tackle OpTeX support in version 2.23.0 of the Markdown package (to be released at the end of April) and discuss it briefly at TUG 2023.

Witiko commented 1 year ago

@olsak In #292, I have just added a minimal demo of using OpTeX with the Markdown package to file examples/optex.tex. Here is the resulting PDF document: optex.pdf

This demo will be included in Markdown 3.0.0, to be released later this month and to be presented at TUG 2023. This is a minimal viable product that mainly includes base markdown elements and not syntax extensions such as tables, tickboxes, or notes.

Support for more syntax extensions can be added as follows:

  1. Enable the extension in the \markdownOptions macro, for example pipeTables=true and tableCaptions=true for tables.
  2. Add an example of the markdown element from examples/example.md between \markdownBegin ... \markdownEnd. For example:

    This is a table:
    
    | Right | Left | Default | Center |
    |------:|:-----|---------|:------:|
    |    12 | 12   | 12      |   12   |
    |   123 | 123  | 123     |   123  |
    |     1 | 1    | 1       |    1   |
    
      : Demonstration of pipe table syntax.
  3. Define the corresponding renderer macros, for example the table renderers.
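
The three steps above can be sketched on a deliberately small case. The sketch below uses the direct Lua approach from earlier in this thread rather than the code in examples/optex.tex, and it picks the smartEllipses option with the \markdownRendererEllipsis renderer as a stand-in for larger extensions such as tables; the definition of the renderer is only one possible choice.

```tex
% Sketch only: not the code from examples/optex.tex.
\directlua{
  local ran_ok, kpse = pcall(require, "kpse")
  if ran_ok then kpse.set_program_name("luatex") end
  % Step 1: enable the option
  local convert = require("markdown").new({smartEllipses = true})
  function markdown(input)
    tex.print(convert(input))
  end
}

\let\markdownRendererDocumentBegin\relax
\let\markdownRendererDocumentEnd\relax
% Step 3: define the corresponding renderer
\def\markdownRendererEllipsis{$\ldots$}

% Step 2: an example of the markdown element
\directlua{ markdown"Wait..." }

\bye
```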

I have scheduled these additions for the August 2023 release. Contributions are appreciated.