decker-edu / decker

A markdown based tool for slide deck creation.
GNU General Public License v3.0

LaTeX beamer import (tips & tricks) #108

Open kno10 opened 4 months ago

kno10 commented 4 months ago

This is essentially a "wontfix" issue, but since this is a frequently requested feature, I decided to document some of my findings here so people can find them more easily. Feel free to contribute your experience and tricks!

Many STEM educators will have latex-beamer slides to convert. There are a number of options for converting LaTeX to HTML (which could then probably be converted further with pandoc), but unfortunately they tend to fail badly on latex-beamer slides. Long story short, I ended up using plasTeX to convert several hundred beamer slides to Markdown, so this is possible, but it requires scripting, hacking your tex code, etc. See below for what could be converted and what could not.

Some options, sorry that I do not remember the exact drawbacks of each approach; feel free to contribute.

plasTeX produces HTML by default and uses MathJax for equations. But since the output is based on templates, we can also render with Markdown templates, which yields a beamer-to-Markdown conversion. When running this on my (large, complex) set of slides, I did run into a number of issues and bugs, though. I opened several pull requests in the plasTeX GitHub repository; some of them were merged, some not. Nevertheless, this was the most capable solution that I found.

Here is my markdown output branch: https://github.com/kno10/plastex/tree/markdown

Here are some hints I shared with colleagues before:


My branch with the markdown code (do not use main, check out the markdown branch).

https://github.com/kno10/plastex/tree/markdown

Instructions:

  1. set up and activate a Python virtual environment
  2. python3 setup.py install
  3. start conversion from your tex folder:
    plastex --renderer Markdown -d output-folder \
    --filename 'index [$id.md, sect$num(4)]' \
    --image-filenames 'index [$id/$id-$num(4), images/img-$num(4)]' \
    --xml --save-image-file \
    --image-scale-factor 1.0 --image-resolution 300 \
    plastex.tex

I suggest first getting it to run without images (to have some initial success; the images add another layer of complexity). If there are parsing issues, I found it useful to inspect the (invalid) XML to pinpoint which command breaks the parser. The image-scale-factor and image-resolution might need to be increased to get "optimal" automatic sizing in HTML (my exports are all a little small right now, and I have to manually add sizing to all images in the Markdown, as I first generated them for web). I think I ended up using inkscape (automatically via the command line) and/or pdftocairo for the PDF-to-SVG conversion, IIRC because cropping and scaling worked best there.
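To make the "inspect the invalid XML" step a bit less manual, here is a minimal sketch that reports where the XML dump first stops being well-formed. It uses only Python's standard `xml.parsers.expat` module; the helper names `first_xml_error` and `show_context` are made up for illustration, and the location and name of the XML file written by `--xml` is something you need to check in your own run.

```python
import xml.parsers.expat


def first_xml_error(xml_text):
    """Return (line, column, message) of the first well-formedness error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


def show_context(xml_text, line, radius=2):
    """Return the lines around `line` (1-based) so the offending element is visible."""
    lines = xml_text.splitlines()
    start = max(0, line - 1 - radius)
    return lines[start:line + radius]
```

Read the dump into a string, call `first_xml_error` on it, and pass the reported line to `show_context`. The element names around that position usually point at the LaTeX command that confused the parser.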

The templates used are in the folder plasTeX/Renderers/Markdown/ (I rerun the install every time I change them; I never bothered to look up how to use a local template, as I also had to change a lot of the .py files). The algorithm2e.jinja2s template is an example that converts the (difficult) environment to an image.

Some changes I had to do to my source (I copied my main tex to plastex.tex, otherwise tried to keep changes minimal):

  1. set a blank beamer theme for image export

  2. disable some imports that were too difficult

  3. add a tex if for alternatives:

\newif\ifplastex\plastexfalse % always true in plastex, for switching

Then you can write \ifplastex simpler version \else regular version \fi; this also works for "commenting out" problematic slides from the export.

  4. commands wrapped in \plastexpassthrough will end up in the .tex for generating the images, but will not affect the Markdown. I had to use this, for example, for my color definitions.

  5. in my plastex.tex, I override some commands I use that did not work well, e.g.,

    \def\newrefsection{} % biblatex only, not supported by plastex currently
    \providecommand{\diagbox}[1]{#1} % not compatible with plastex
    \renewcommand<>{\highlight}[2][structure]{{\textcolor#3{#1}{#2}}}
    \let\smallunderbrace\underbrace
    \renewcommand{\conclusion}{\textcolor{structure}{➜}\space}
    \renewcommand{\ExternalLink}{🔗}
    \renewcommand{\tmark}[1]{}% remember position, will not work for plastex
    \newcommand{\TD}{\ensuremath{\operatorname{TD}}}
    \def\fakeitem{\textcolor{structure}{➜}\space}
    \def\given{\vert}

You can see that I used UTF-8 characters directly in plastex.tex, as they work fine in plasTeX, instead of having to produce the special characters differently for regular pdflatex.

  6. I had to move some locally defined colors, \newcommands, and the \graphicspath into the plastex.tex file so that they are available in the image generation step. Fortunately, I did not have conflicting definitions/renewcommands, as these might be harder to get working. The image generation flow is roughly: collect the tex commands up to the \begin{document} as a preamble, then emit one page per image to be converted.

  7. I added a \label{} to every section in order to control the file names; otherwise the files will all be named sect0123.md etc.

  8. For image export, I made the slides much larger via plastex.tex, to have excess whitespace that helps SVG cropping, while trying to preserve the inner dimensions so as not to break \textwidth and similar sizing too much. This fragment is a mess, but eventually I was okay with the results.

    % Increase page size, to improve image extraction
    \plastexpassthrough
    \geometry{papersize={27.78cm,20cm}, left=0cm, right=10cm, bottom=10cm, top=0cm}
    \makeatletter
    %% Margin in my layout: 0.65*7mm = 4.55mm
    \setlength\beamer@paperwidth {505.824pt} % 17.78 cm, 16:9
    \setlength\beamer@paperheight {10cm}
    \hsize=17.78cm\relax\vsize=10cm\relax
    \hoffset=5cm\relax\voffset=5cm\relax
    \def\pgfsys@thepageheight{10cm}%\the\vsize}
    \def\pgfsys@thepagewidth{17.78cm}%\the\hsize}
    \makeatother
    \setbeamersize{text margin left=0.455cm, text margin right=10.455cm}
    \setbeamertemplate{headline}{\rule{0pt}{16.55mm}} % 7mm logo + 2*4.55mm? + 5cm
    \setbeamertemplate{navigation symbols}{\rule{0pt}{10mm}} % line height + 4.55mm? + 5cm
    \setbeamertemplate{background canvas}{}% transparent PDFs, for cropping
    \endplastexpassthrough

You will likely need some more fiddling, unfortunately, but usually only for a few corner cases. Sometimes it may be easiest to just skip a slide and convert it manually to save the effort.
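Adding a \label{} to every section (see the list above) gets tedious for a large deck, so here is a rough sketch of how it could be scripted. This is not what I actually ran: the `sec-` prefix, the slug rule, and the function names are all assumptions for illustration. It only handles \section commands that sit on one line with a brace-free title, and it only notices an existing \label if it is on that same line.

```python
import re

# Matches \section{Title}, \section*{Title} and \section[Short]{Title} on one line.
SECTION_RE = re.compile(r"^(\\section\*?(?:\[[^\]]*\])?\{([^{}]*)\})(.*)$")


def slugify(title):
    """Turn a section title into a lowercase, filename-friendly identifier."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def add_section_labels(tex, prefix="sec-"):
    """Append \\label{...} to every \\section line that has no label on that line."""
    out = []
    used = set()
    for line in tex.splitlines():
        match = SECTION_RE.match(line)
        if match and "\\label" not in line:
            slug = slugify(match.group(2))
            label = prefix + slug
            counter = 2
            while label in used:  # keep generated labels unique
                label = f"{prefix}{slug}-{counter}"
                counter += 1
            used.add(label)
            line = f"{match.group(1)}\\label{{{label}}}{match.group(3)}"
        out.append(line)
    result = "\n".join(out)
    return result + "\n" if tex.endswith("\n") else result
```

Run it on a copy of your sources and diff the result before committing to it. Labels you have already written by hand are left alone, but they are not checked against the generated ones for clashes.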


Some additional hints:

The --save-image-file option will produce a temp folder with a big .tex file containing one page for every "externalized" image. I don't think I still have the script, but I was able to split this file into one .tex chunk per image. I now have one big folder of standalone .tex sources, one for every tikz image, and a Makefile to generate SVG/PNG/webp images from them, which is great. If I ever decide to change the fonts or colors of my template, I can regenerate all my images easily, and I can still use tikz for my images. For tikz with beamer overlays, I use this to generate one image per overlay, which allows me to keep even my old animations to some extent. For some new images, I directly created new standalone tikz for my decker slides, without beamer.
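Since the original splitting script is lost, here is a rough reconstruction of the idea, not the actual code. The big assumption is that every image in the generated file is wrapped as `\begin{plasTeXimage}{name} ... \end{plasTeXimage}` after the `\begin{document}`; please check your own file and adjust the `env` parameter if it looks different. The function names are made up, and anything plasTeX writes between two images (counter resets and the like) is dropped by this sketch.

```python
import re


def split_image_tex(source, env="plasTeXimage"):
    """Split the images .tex into its preamble and one (name, chunk) pair per image.

    The wrapper environment name is an assumption; verify it in your own file.
    """
    preamble, _, rest = source.partition("\\begin{document}")
    name = re.escape(env)
    pattern = re.compile(
        r"\\begin\{%s\}\{([^}]*)\}.*?\\end\{%s\}" % (name, name),
        re.DOTALL,
    )
    # Keep the whole environment so each chunk compiles like its page in the big file.
    images = [(m.group(1), m.group(0)) for m in pattern.finditer(rest)]
    return preamble, images


def standalone_document(preamble, chunk):
    """Wrap one image chunk with the shared preamble into a compilable document."""
    return f"{preamble}\\begin{{document}}\n{chunk}\n\\end{{document}}\n"
```

Writing each `standalone_document(...)` result to its own .tex file gives you the folder of per-image sources, which a Makefile can then compile and convert (e.g. with pdftocairo or inkscape).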

cnroessl commented 4 months ago

Thanks for sharing!