pandoc should use: --from markdown-yaml_metadata_block

dlmiles commented 3 weeks ago

This is related to the https://discord.com/channels/1009193568256135208/1302455447269281823 thread about well-formed Markdown docs failing to complete the docs GHA task.

The use of \_ appears to be a TeX format requirement, not an MD format requirement. Pandoc should have converted/escaped the MD document for TeX itself if that is what it needs in order to process it, so there seems to be an internal bug there somewhere.

MD allows use of horizontal rules https://www.markdownguide.org/basic-syntax/#horizontal-rules in the form --- and this appears to be triggering the YAML parser used to parse the top block.

Top of document looks like:

---
documentclass: scrartcl
geometry: "left=2cm,right=2cm,top=2cm,bottom=3cm"
fontsize: 14pt
mainfont: Latin Modern Sans
header-includes:
- \usepackage{hyperref}
- \hypersetup{colorlinks=false,
          allbordercolors={0 0 0},
          pdfborderstyle={/S/U/W 1}}
---

# MULDIV unit (8-bit signed/unsigned)

I tried replacing the 2nd --- with ... to indicate end-of-document to YAML, but it did not make a difference.

Using the option --from markdown-yaml_metadata_block does allow it to be processed without error. I guess this defeats a format auto-detection mechanism which may be causing the problem: pandoc is not exactly sure that the data following the YAML headers is in MD format.
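To check the horizontal-rule hypothesis on a given document, a rough heuristic like the one below can list the `---` lines that might be read as the start of a metadata block. It assumes the rule described in the pandoc manual as I understand it (a block can start anywhere, must be preceded by a blank line unless it is at the very top, and the opening `---` must not be followed by a blank line). The function name `candidate_metadata_openers` is made up for this sketch. It does not track open/close state, so a closing `---` can also be reported.

```python
def candidate_metadata_openers(text):
    """Return 1-based line numbers of `---` lines that could open a YAML block.

    Heuristic only: it approximates the documented rule and does not
    reproduce pandoc's actual parser.
    """
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if line.rstrip() != "---":
            continue
        at_start = i == 0
        after_blank = i > 0 and lines[i - 1].strip() == ""
        # An opening `---` must be directly followed by a non-blank line.
        followed_by_text = i + 1 < len(lines) and lines[i + 1].strip() != ""
        if (at_start or after_blank) and followed_by_text:
            hits.append(i + 1)
    return hits
```

Any line number reported beyond the first is a horizontal rule worth looking at; putting a blank line after it (or writing it as `***`) should keep it from being treated as metadata, if the hypothesis holds.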

dlmiles commented 3 weeks ago

A guess at the line that would receive the option:

project.py:893: pdf_cmd = "pandoc --pdf-engine=xelatex --resource-path=docs -i datasheet.md -o datasheet.pdf"

dlmiles commented 3 weeks ago

After looking at the resulting PDF: --from markdown-yaml_metadata_block means turn OFF the YAML metadata block extension, which ends up rendering that block as inline text, which is not the intention.

I found best results with a sequence like:

## Note use of --strip-comments to help with removal of HTML style comments from MD document `<!---`
pandoc --from gfm --to markdown --strip-comments --resource-path=docs -i docs/info.md -o datasheet.md

## edit datasheet.md to remove any empty HTML code blocks ```{=html}\s+``` which are in the default TT
## template of a markdown file; the use of `--strip-comments` above helped. Maybe this can be done in
## Python when the next stage is done, e.g. with a multiline regex replace?

## prepend the YAML  metadata header info for TeX, maybe it should end with `...` to indicate end of YAML
## document, although I did not find it makes much difference to pandoc.  I think the issue here is the pandoc
## feature of allowing the metadata to be anywhere in the document.

## Then process with the current command:
pandoc --pdf-engine=xelatex --resource-path=docs -i datasheet.md -o datasheet.pdf

The difference is that a gfm (GitHub Flavored Markdown) to markdown (Pandoc Markdown) conversion happens first, which seems to resolve every issue I hit when trying to break it.
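A minimal sketch of how the sequence above might look in Python, in case it helps with wiring it into project.py. It is not the existing code: the function names (`strip_empty_html_blocks`, `prepend_header`, `build_datasheet`) are hypothetical, the header is copied from the block quoted earlier in this issue, and the input file is passed positionally (to my knowledge pandoc's `-i` is the short form of `--incremental`, so it is left out here).

```python
import re
import subprocess
from pathlib import Path

# Metadata block quoted earlier in this issue (assumed to be what project.py prepends).
YAML_HEADER = r"""---
documentclass: scrartcl
geometry: "left=2cm,right=2cm,top=2cm,bottom=3cm"
fontsize: 14pt
mainfont: Latin Modern Sans
header-includes:
- \usepackage{hyperref}
- \hypersetup{colorlinks=false,
          allbordercolors={0 0 0},
          pdfborderstyle={/S/U/W 1}}
---
"""

# Matches a raw HTML block that contains only whitespace, i.e. ```{=html}\s+```
EMPTY_HTML_BLOCK = re.compile(r"^```\{=html\}\s+```[ \t]*\n?", re.MULTILINE)


def strip_empty_html_blocks(markdown):
    """Remove empty ```{=html} blocks; blocks with content are left alone."""
    return EMPTY_HTML_BLOCK.sub("", markdown)


def prepend_header(markdown, header=YAML_HEADER):
    """Put the YAML header first, separated from the body by one blank line."""
    return header.rstrip("\n") + "\n\n" + markdown.lstrip("\n")


def build_datasheet(src="docs/info.md", md="datasheet.md", pdf="datasheet.pdf"):
    """Run the two pandoc stages; requires pandoc and xelatex on PATH."""
    # Stage 1: GitHub Flavored Markdown -> Pandoc Markdown, dropping HTML comments.
    subprocess.run(
        ["pandoc", "--from", "gfm", "--to", "markdown", "--strip-comments",
         "--resource-path=docs", src, "-o", md],
        check=True,
    )
    # Clean up the intermediate file and add the TeX metadata header.
    body = Path(md).read_text(encoding="utf-8")
    Path(md).write_text(prepend_header(strip_empty_html_blocks(body)), encoding="utf-8")
    # Stage 2: the current PDF command, unchanged apart from the argument list form.
    subprocess.run(
        ["pandoc", "--pdf-engine=xelatex", "--resource-path=docs", md, "-o", pdf],
        check=True,
    )
```

Calling `build_datasheet()` from the repository root would run both stages. Passing an argument list instead of a single command string avoids shell quoting problems, but that is a design choice for this sketch and not something the current project.py necessarily does.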

TinyTapeout / tt-support-tools

pandoc should use: --from markdown-yaml_metadata_block #78