Witiko / markdown

A package for converting and rendering markdown documents in TeX
http://ctan.org/pkg/markdown
LaTeX Project Public License v1.3c

Improve the speed of the Markdown package #474

Closed (Witiko closed 3 weeks ago)

Witiko commented 1 month ago

The current version of the Markdown package for TeX takes multiple seconds to initialize and process a markdown text:

$ docker run --rm -i witiko/markdown bash -c 'time markdown-cli <<< foo'
\markdownRendererDocumentBegin
foo\markdownRendererDocumentEnd

real    0m1.645s
user    0m1.430s
sys     0m0.215s

In a recent experiment, I processed a short text with historical versions of the Markdown package and compared their speed with that of the current version. The results show a more than 5× slow-down in version 3.4.3 of the Markdown package:

[Image: results of the experiment across versions of the Markdown package]

A PR that closes this ticket should take the following steps:

  1. Determine which of the eight PRs merged in version 3.4.3 caused the slow-down.
  2. Determine the exact cause of the slow-down and eliminate the slow-down.
  3. Test that processing a short text in the CI takes less than 1 second.
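
Step 3 could be sketched as a small timing check. This is only an illustration in Python, not the project's actual CI code: the helper names `time_command` and `assert_fast_enough` are hypothetical, and the only thing taken from this ticket is the `markdown-cli` command and the 1-second budget.

```python
import subprocess
import time


def time_command(argv, stdin_text, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Feed the text on standard input, like `markdown-cli <<< foo`.
        subprocess.run(argv, input=stdin_text, text=True,
                       capture_output=True, check=True)
        best = min(best, time.perf_counter() - start)
    return best


def assert_fast_enough(argv, stdin_text, budget_seconds=1.0):
    """Raise AssertionError if the command does not finish within the budget."""
    elapsed = time_command(argv, stdin_text)
    if elapsed >= budget_seconds:
        raise AssertionError(
            f"{argv[0]} took {elapsed:.3f}s, budget is {budget_seconds}s")
    return elapsed
```

In the CI, this might be called as `assert_fast_enough(["markdown-cli"], "foo\n")`. Taking the best of several runs is a design choice that reduces noise from a busy CI runner; a stricter check could use the mean or the worst run instead.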

Witiko commented 1 month ago

In version 3.4.3, we upgraded to TeX Live 2024, which may be related to the slow-down. I will now run the same experiment using TeX Live 2022 for all versions of the Markdown package to control for this effect. With TeX Live, the potential sources of the slow-down would be the KPathSea library, which we use to locate external resources, and other Lua libraries that we use, which may have become slower in TeX Live 2024.

In commit efeaecbe3f584b9052061fc2949aee66572bfa07, I repeated the experiment using TeX Live 2022 for all versions of the Markdown package to control for this effect. The results show that the version of TeX Live is not a major factor (for markdown.lua) and version 3.4.3 still seems more than 5× slower than version 3.4.2.

Witiko commented 4 weeks ago

  1. Determine which of the eight PRs merged in version 3.4.3 caused the slow-down.

As discussed in https://github.com/Witiko/markdown/issues/458#issuecomment-2286231522, the issue is likely (also) with PRs https://github.com/Witiko/markdown/pull/416 and https://github.com/Witiko/markdown/pull/432, which started loading UnicodeData.txt and constructing a parser that recognizes all Unicode punctuation.

If this is the case, which is still to be determined, then pre-reading the file UnicodeData.txt in the CI and distributing a pre-compiled parser together with the rest of the Markdown package as a separate Lua file markdown-punctuation.lua would likely improve the speed and also make us independent of UnicodeData.txt. Furthermore, using a prefix tree to optimize the parser would further improve the speed and might close #458.
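
To make the pre-reading idea concrete, here is a rough sketch in Python (the package itself is written in Lua, so this is an analogy, not the project's code). It relies on the documented format of UnicodeData.txt, where fields are separated by semicolons and the third field is the general category. Treating "punctuation" as every category starting with `P` is an assumption, the function names are hypothetical, and the output is a plain Lua table of code points rather than a compiled parser.

```python
def punctuation_code_points(lines):
    """Collect code points whose general category starts with 'P'."""
    code_points = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2].startswith("P"):
            code_points.append(int(fields[0], 16))
    return code_points


def render_lua_table(code_points):
    """Render the code points as the source of a Lua file returning a table."""
    body = ", ".join(f"0x{cp:04X}" for cp in code_points)
    return "return {" + body + "}\n"


# A few lines in the format of UnicodeData.txt, used as sample input.
SAMPLE = [
    "0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "00BF;INVERTED QUESTION MARK;Po;0;ON;;;;;N;;;;;",
    "2014;EM DASH;Pd;0;ON;;;;;N;;;;;",
]

print(render_lua_table(punctuation_code_points(SAMPLE)), end="")
# prints: return {0x0021, 0x00BF, 0x2014}
```

Running such a step once in the CI would move the cost of reading UnicodeData.txt from every document run to build time.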

However, we may still wish to check if there is a more up-to-date version of UnicodeData.txt at runtime and, if there is, create a file markdown-punctuation.lua in the current working directory at runtime to override the outdated distribution file markdown-punctuation.lua.
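
One possible shape for that runtime check, again as a Python sketch rather than the package's Lua code: compare file modification times and regenerate only when UnicodeData.txt is newer. Using modification time as a proxy for "more up-to-date" is an assumption, and `needs_regeneration` is a hypothetical helper.

```python
import os


def needs_regeneration(unicode_data_path, generated_path):
    """Return True if the generated file is missing or older than UnicodeData.txt."""
    if not os.path.exists(unicode_data_path):
        # Nothing newer to build from: keep using the distributed file.
        return False
    if not os.path.exists(generated_path):
        return True
    return os.path.getmtime(unicode_data_path) > os.path.getmtime(generated_path)
```

The check costs only a couple of file-system calls per run, so it should not reintroduce the slow-down that the distributed file is meant to avoid.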

Witiko commented 3 weeks ago

I continued the experiment to determine which of the eight PRs merged in version 3.4.3 caused the slow-down:

[Image: results of the experiment for the PRs merged in version 3.4.3]

As assumed in https://github.com/Witiko/markdown/issues/474#issuecomment-2286251419, the more than 5× slow-down is caused by PR https://github.com/Witiko/markdown/pull/416, which started loading UnicodeData.txt and constructing a parser that recognizes all Unicode punctuation.

The solution is to use a prefix tree to optimize the parser, as described in https://github.com/Witiko/markdown/issues/474#issuecomment-2286251419. Precompiling the parser may bring a further, though likely smaller, improvement and help us close ticket https://github.com/Witiko/markdown/issues/458.

Witiko commented 3 weeks ago

In PR #482, the speed of the Markdown package has been significantly improved:

[Image: results of the experiment after PR #482]

The speed improvement was achieved by using a prefix tree to construct a more efficient PEG parser of Unicode punctuation.
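
For readers unfamiliar with the technique, the following Python sketch shows the general idea of a prefix tree over the UTF-8 encodings of punctuation characters. It is not the code from PR #482, which builds a PEG parser in Lua. The function names are hypothetical, and the punctuation set comes from Python's `unicodedata` module (categories starting with `P`) as a stand-in for UnicodeData.txt.

```python
import unicodedata


def build_trie(code_points):
    """Build a prefix tree over the UTF-8 encodings of the code points."""
    root = {}
    for cp in code_points:
        node = root
        for byte in chr(cp).encode("utf-8"):
            node = node.setdefault(byte, {})
        node[None] = True  # marks the end of a complete character
    return root


def match_punctuation(trie, data, pos=0):
    """Return the byte length of the punctuation character at `pos`, or 0."""
    node = trie
    i = pos
    while i < len(data) and data[i] in node:
        node = node[data[i]]
        i += 1
        if None in node:
            return i - pos
    return 0


# Assumption: "punctuation" means every general category starting with 'P'.
PUNCTUATION = [cp for cp in range(0x110000)
               if unicodedata.category(chr(cp)).startswith("P")]
TRIE = build_trie(PUNCTUATION)
```

A flat ordered choice over every punctuation character may have to try hundreds of alternatives before failing on an ordinary letter. With the prefix tree, matching takes at most one lookup per byte, and input that does not start with a shared lead byte is rejected after the first lookup.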

Many thanks to the contributor @Yggdrasil128 for their help with the fix!

Witiko commented 2 weeks ago

@Yggdrasil128: If you'd like, we have a Discord server and a space at Matrix.org. Discussing the development of the Markdown package there can be faster than on GitHub.