LaTeX beamer import (tips & tricks)

This is essentially a "wontfix" issue, but as this is a frequently requested feature, I decided to document some of my findings here so people can find them more easily. Feel free to contribute your experience and tricks!

Many STEM educators will have latex-beamer slides to convert. There are a number of options to convert LaTeX to HTML, but they tend to fail badly for latex-beamer slides, unfortunately. HTML could then probably be further converted with pandoc. Long story short, I ended up using plasTeX to convert several hundred beamer slides to Markdown, so this is possible (but requires scripting, hacking your tex code, etc.). See below for what could be converted and what could not.

Here are some options. Sorry that I do not remember the exact drawbacks of each approach; feel free to contribute.

pandoc: since decker is built upon pandoc, this seems to be the most natural choice, but latex reading support of pandoc is minimal, unfortunately.
tex4ht/htlatex: runs on the latex interpreter itself; it is more about avoiding HTML by writing TeX than about converting semantics
pdf2htmlEX: going via PDF loses semantics and math equations
pdf2svg: give up on retaining editing, make every beamer slide an image...
latex2html: no understanding of beamer?
tth: no understanding of beamer?
LaTeXML: fragile, no beamer support?
plastex: Python parser for tex that can even expand some macros and has stubs for several packages.

plasTeX by default produces HTML, and uses MathJax for equations. But since output is based on templates, we can also output using Markdown templates, so this can even yield a beamer-to-markdown conversion. When running this on my (large, complex) set of slides, I did run into a number of issues and bugs though. I opened several pull requests in the plastex GitHub repository; some of them were merged, some not. Nevertheless, this was the most capable solution that I found.

:star: easy-to-customize jinja2 templates for known commands
:star: simple \newcommand definitions will be expanded
:x: complex packages probably won't work; but you can "stub" them in Python to help the parser
:star: beamer frames will become sections as needed
:star: basic formatting is largely retained (it will even generate inline styling, e.g., [small]{ .small })
:grey_exclamation: colors usually get output as hex codes, so you may want to replace these codes with some named palette color afterwards again
:star: math is retained, simple commands are expanded so you get good mathjax compatibility
:grey_exclamation: it may be a bit too eager to expand math macros for your liking, and the resulting mathjax will have syntactically equivalent but ugly extra whitespace, e.g., \sum _i instead of \sum_i.
:x: \intertext is currently not supported by mathjax
:star: tables can be converted into inline HTML
:x: tabbings do not exist in HTML, we output tables instead
:grey_exclamation: nesting with tables can fail, as Markdown-in-HTML-in-Markdown is not well defined
:grey_exclamation: while \vspace can be done in HTML, spacing and sizing tend not to be close enough for this to be very useful
:star: images (but also, e.g., pseudocode) can be rendered into images to include
:x: image overlays are not preserved, you get a flat version
:star: \pause becomes . . . (but that does not work inside blocks)
:x: beamer overlays do not convert into reveal.js well because they have rather different concepts, you need to redo these (and Markdown syntax for this is a pain)
:grey_exclamation: image sizing needs manual intervention
:x: you will not be able to preserve \if conditionals. E.g., I used \ifStudent to have student and teacher versions of some slides, but plastex will evaluate the if statement, and either follow one branch or the other (maybe you could define ifStudent to not actually be an if statement to convert this functionality, but it does not exist in decker)
:x: \only/\visible/\note are commands, and may not contain environments - latex seems to be rather tolerant to abuse of this, plastex is not. Use \begin{onlyenv}, \begin{visibleenv} instead, that is the correct latex. Unfortunately, there is no noteenv, and while \note{\begin{verbatim} abc \end{verbatim}} may work in latex, it is invalid and breaks plastex.
:x: once tex parsing fails, plastex can fall on its nose completely
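
Two of the cosmetic issues in the list above (the extra whitespace in expanded math, and hex codes instead of named colors) can be cleaned up with a small post-processing pass over the generated Markdown. This is only a minimal sketch: the palette entries are made-up placeholders, and the exact way hex codes appear in your output is an assumption you should check first.

```python
import re

# Hypothetical palette: map the hex codes found in your output to the names you prefer.
PALETTE = {"#1f77b4": "blue", "#d62728": "red"}


def tidy_math_spacing(text: str) -> str:
    """Drop the spaces left between a control word and a following _ or ^."""
    return re.sub(r"(\\[A-Za-z]+) +(?=[_^])", r"\1", text)


def name_colors(text: str, palette=PALETTE) -> str:
    """Replace known six-digit hex colors (case-insensitive) with palette names."""
    def swap(match):
        # Unknown codes are returned unchanged.
        return palette.get(match.group(0).lower(), match.group(0))
    return re.sub(r"#[0-9A-Fa-f]{6}\b", swap, text)
```

Note that `tidy_math_spacing` is applied to the whole text, not only to math; a control word followed by a space and an underscore is unlikely in regular Markdown, but review the diff before committing.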

Here is my markdown output branch: https://github.com/kno10/plastex/tree/markdown

Here are some hints I shared with colleagues before:

My branch with the markdown code (do not use main, check out the markdown branch).

https://github.com/kno10/plastex/tree/markdown

Instructions:

set up and activate a Python virtual environment
python3 setup.py install

start conversion from your tex folder:

plastex --renderer Markdown -d output-folder \
--filename 'index [$id.md, sect$num(4)]' \
--image-filenames 'index [$id/$id-$num(4), images/img-$num(4)]' \
--xml --save-image-file \
--image-scale-factor 1.0 --image-resolution 300 \
plastex.tex

I suggest first getting it to run without images (to have some initial success; the images add another layer of complexity). If there are parsing issues, I found it useful to inspect the (invalid) XML to pinpoint which command breaks the parser. The image-scale-factor and image-resolution might need increasing to get "optimal" automatic sizing in HTML (my exports are all a little small right now, and I have to manually add sizing to all images in the Markdown, as I first generated them for web). I think I ended up using inkscape (automatically via command line) and/or pdftocairo for the PDF-to-SVG conversion, IIRC because cropping and scaling worked best here.

The templates used are in the folder plasTeX/Renderers/Markdown/ (I rerun the install every time I change them; I never bothered to look up how to use a local template, as I also had to change a lot of the .py files). The algorithm2e.jinja2s is an example that converts the (difficult) environment to an image.

Some changes I had to do to my source (I copied my main tex to plastex.tex, otherwise tried to keep changes minimal):

set a blank beamer theme for image export
disable some imports that were too difficult
add a tex if for alternatives:

\newif\ifplastex\plastexfalse % always true in plastex, for switching

Then you can do \ifplastex simpler version \else regular version \fi, this also works for "commenting" some problematic slides from export.

commands wrapped in \plastexpassthrough will end up in the .tex for generating the images, but not affect the markdown - I had to use this for example for my color definitions.

in my plastex.tex, I override some commands I use that did not work well, e.g.,

\def\newrefsection{} % biblatex only, not supported by plastex currently
\providecommand{\diagbox}[1]{#1} % not compatible with plastex
\renewcommand<>{\highlight}[2][structure]{{\textcolor#3{#1}{#2}}}
\let\smallunderbrace\underbrace
\renewcommand{\conclusion}{\textcolor{structure}{➜}\space}
\renewcommand{\ExternalLink}{🔗}
\renewcommand{\tmark}[1]{}% remember position, will not work for plastex
\newcommand{\TD}{\ensuremath{\operatorname{TD}}}
\def\fakeitem{\textcolor{structure}{➜}\space}
\def\given{\vert}

You can see I used utf-8 characters lightly in the plastex.tex, as they work in plastex fine, instead of having to get the special characters differently for regular pdflatex.

I had to move some locally defined colors, newcommands, and graphicspath settings into the plastex.tex file so they end up being available in the image generation step. Fortunately, I did not have conflicting definitions/renewcommands, as these might be harder to get working. The image generation flow is roughly to collect the tex commands up to the begin document as a preamble, then one page per image to be converted.
I added a \label{} to every section, in order to control the file names, or the files will all be sect0123.md etc.
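
Adding these labels by hand gets tedious for a large deck. As a rough sketch (not the script I used), something like the following could add a label derived from the title to every \section that does not have one yet. The helper names and the slug scheme are made up for illustration, and it only handles titles without nested braces.

```python
import re

# \section, optional *, optional [short title], {title without nested braces},
# not already followed by a \label.
SECTION = re.compile(r"(\\section\*?(?:\[[^\]]*\])?\{([^{}]*)\})(?!\s*\\label)")


def slugify(title: str) -> str:
    """Turn a section title into a file-name-friendly slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def add_section_labels(tex: str) -> str:
    """Append \\label{<slug>} to every unlabeled \\section{...}."""
    seen = {}

    def repl(match):
        slug = slugify(match.group(2))
        seen[slug] = seen.get(slug, 0) + 1
        if seen[slug] > 1:
            # Repeated titles get a numeric suffix so labels stay unique.
            slug = f"{slug}-{seen[slug]}"
        return f"{match.group(1)}\\label{{{slug}}}"

    return SECTION.sub(repl, tex)
```

Sections whose titles contain macros or nested braces are skipped by the pattern and still need a manual label.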

For image export, I made the slides much larger to have excess whitespace to help SVG cropping via plastex.tex, while trying to preserve the inner dimensions to not break \textwidth and similar sizing too much. This fragment is a mess, but eventually I was okay with the results.

% Increase page size, to improve image extraction
\plastexpassthrough
\geometry{papersize={27.78cm,20cm}, left=0cm, right=10cm, bottom=10cm, top=0cm}
\makeatletter
%% Margin in my layout: 0.65*7mm = 4.55mm
\setlength\beamer@paperwidth {505.824pt} % 17.78 cm, 16:9
\setlength\beamer@paperheight {10cm}
\hsize=17.78cm\relax\vsize=10cm\relax
\hoffset=5cm\relax\voffset=5cm\relax
\def\pgfsys@thepageheight{10cm}%\the\vsize}
\def\pgfsys@thepagewidth{17.78cm}%\the\hsize}
\makeatother
\setbeamersize{text margin left=0.455cm, text margin right=10.455cm}
\setbeamertemplate{headline}{\rule{0pt}{16.55mm}} % 7mm logo + 2*4.55mm? + 5cm
\setbeamertemplate{navigation symbols}{\rule{0pt}{10mm}} % line height + 4.55mm? + 5cm
\setbeamertemplate{background canvas}{}% transparent PDFs, for cropping
\endplastexpassthrough

You will likely need some more fiddling, unfortunately. But it is usually just a few corner cases. Sometimes it may be easiest to just skip the slide and convert it manually to save the effort.

Some additional hints:

aim for 90-10. Convert 90%, redo the remaining 10%, but better.
I doubt you will be able to automate this completely and use tex as your primary source. Instead I suggest you only consider migration to decker/reveal.js, not for parallel use. Embrace the benefits of decker: add interactive slides, enjoy the accessibility features (and compare this to making accessible PDFs with latex...)
because of this, I doubt this will ever be a "packaged import tool"
rather than converting all slides at once, it may be a good idea to work with smaller sections at a time to see some progress. Skip sections with difficult macros initially.
when parsing of macros/packages fails, try replacing them with no-ops
you can write Python classes to replace entire latex packages. E.g., the empty Packages/lmodern.py file instructs plasTeX to simply ignore the lmodern package completely, the Packages/algorithm2e.py primarily defines an algorithm environment that is simply a verbatim (i.e., not parsed by plastex).
plastex.xml is not valid XML, but can be useful to find where the parser breaks
my most common issue was my repeated abuse of \only instead of using an onlyenv environment. Fixing the latex code helps.
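
To find the places where \only, \visible, or \note wrap an environment before plastex trips over them, a small brace-matching scan over the tex source can help. This is a sketch under simplifying assumptions: it does not skip TeX comments or verbatim content, and the function name is made up.

```python
import re

COMMANDS = ("only", "visible", "note")  # the commands named in the notes above
COMMAND = re.compile(r"\\(%s)(?![A-Za-z])" % "|".join(COMMANDS))
SPEC = re.compile(r"\s*(<[^>]*>|\[[^\]]*\])")  # overlay spec <2-> or optional [..]
SPACE = re.compile(r"\s*")


def find_env_in_commands(tex: str):
    """Yield (line_number, command) where the {argument} contains \\begin{...}."""
    for match in COMMAND.finditer(tex):
        pos = match.end()
        # Skip any overlay specifications and optional arguments.
        spec = SPEC.match(tex, pos)
        while spec:
            pos = spec.end()
            spec = SPEC.match(tex, pos)
        pos = SPACE.match(tex, pos).end()
        if pos >= len(tex) or tex[pos] != "{":
            continue
        # Walk to the brace that closes the argument.
        depth, end = 0, pos
        while end < len(tex):
            char = tex[end]
            if char == "\\":
                end += 2  # skip the escaped character, e.g. \{ or \}
                continue
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:
                    break
            end += 1
        if "\\begin{" in tex[pos:end]:
            yield tex.count("\n", 0, match.start()) + 1, match.group(1)
```

For example, `for line, cmd in find_env_in_commands(open("plastex.tex").read()): print(line, cmd)` lists the candidates to rewrite as onlyenv/visibleenv (or to simplify, for \note).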

The --save-image-file option will produce a temp folder with a big .tex file with one page for every "externalized" image. I don't think I still have the script, but I was able to split this file into one .tex chunk per image. I now have one big folder of standalone .tex sources for every tikz image, and a Makefile to generate SVG/PNG/webp images from them, this is great. If I ever decided to change fonts or colors of my template, I can regenerate all my images easily. I can still use tikz for my images. For tikz with beamer overlays, I use this to generate one image per overlay, which allows me to somewhat even keep my old animations. For some new images, I directly created new standalone tikz for my decker slides, without beamer.
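
Since I no longer have the splitting script, here is a rough reconstruction of the idea rather than the original. It assumes the saved file consists of a preamble, \begin{document}, and then one wrapper environment per image. The default environment name `plasTeXimage` is an assumption, as is the wrapper being defined in the preamble; check your own file and adjust.

```python
import re


def split_images(tex: str, env: str = "plasTeXimage"):
    """Return one standalone LaTeX document (as a string) per `env` block.

    `env` is an assumed wrapper name; inspect the saved images file first.
    """
    preamble, found, body = tex.partition("\\begin{document}")
    if not found:
        raise ValueError("no \\begin{document} found")
    name = re.escape(env)
    block = re.compile(r"\\begin\{%s\}.*?\\end\{%s\}" % (name, name), re.S)
    # Every chunk reuses the shared preamble and keeps its wrapper environment.
    return [
        f"{preamble}\\begin{{document}}\n{match.group(0)}\n\\end{{document}}\n"
        for match in block.finditer(body)
    ]
```

Writing each returned string to its own numbered .tex file then gives the folder of standalone sources that a Makefile can compile and convert.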
