Witiko / markdown

:notebook_with_decorative_cover: A package for converting and rendering markdown documents in TeX
http://ctan.org/pkg/markdown
LaTeX Project Public License v1.3c
331 stars 31 forks source link

Improve the speed of parsing markdown input five times #482

Closed Witiko closed 2 months ago

Witiko commented 3 months ago

This PR makes the following changes:

This change will close #474.

Following a successful CI run, this PR should repeat the experiment from https://github.com/Witiko/markdown/issues/474 to see if this has sufficiently improved speed the speed of the Markdown package. If so, then we should also make the following changes:

If not, this PR should continue by making the following changes:

These changes will close https://github.com/Witiko/markdown/issues/458 and continue https://github.com/Witiko/markdown/issues/402.

Yggdrasil128 commented 3 months ago

I took a closer look at the depth_first_search function, and while I can't pinpoint exactly where it goes wrong, I wrote a simpler, recursive alternative that seems to work for me.

local function depth_first_search(node, path, visit, leave)
  visit(node, path)
  for label, child in pairs(node) do
    if type(child) == "table" then
      depth_first_search(child, path .. label, visit, leave)
    else
      visit(child, path)
      leave(child, path)
    end
  end
  leave(node, path)
end

It needs to be called with an empty string as its second parameter: depth_first_search(prefix_tree, "", function ...

Since I can't edit this PR directly, feel free to commit this change, and let's see what the CI tests have to say about it.

Witiko commented 3 months ago

Thanks, simpler is better. I will test out your code on Monday at the latest. Perhaps we can get rid of the leave() call for leaf nodes as well, since we don't need it.

Witiko commented 3 months ago

Since I can't edit this PR directly, feel free to commit this change, and let's see what the CI tests have to say about it.

As far as I know, you can create your own fork of this repository and open a second-order PR that would merge into the branch feat/improve-speed from this PR.

However, you don't need to: With your permission, I will use your code from https://github.com/Witiko/markdown/pull/482#issuecomment-2293428688 and commit it with a by-line that states that it is your contribution. This way, GitHub will register the commit as your contribution as well.

Witiko commented 2 months ago

Following a successful CI run, this PR should repeat the experiment from #474 to see if this has sufficiently improved speed the speed of the Markdown package. If so, then we should also make the following changes:

  • Test that the markdown-cli command finishes under 1 second in the CI.

I verified on my Dell G5 15 notebook that after commit 6eff522 from this PR, the markdown-cli command finishes under 1 second:

$ docker run --rm -i ghcr.io/witiko/markdown:3.7.0-dev.1-7-g6eff5224-latest-minimal-no_docs bash -c 'time markdown-cli <<< foo'
\markdownRendererDocumentBegin
foo\markdownRendererDocumentEnd

real    0m0.258s
user    0m0.204s
sys 0m0.053s

In commit 3e8fe8ad, I added a test that the markdown-cli command finishes under 1 second to the CI, so that we can catch similar speed regressions in the future.

If not, this PR should continue by making the following changes:

  • Precompile parsers.punctuation to a separate Lua file.
  • Remove dependency on UnicodeData.txt.

These changes will close https://github.com/Witiko/markdown/issues/458 and continue https://github.com/Witiko/markdown/issues/402.

Since updating the parser has been sufficient for fixing the speed issue, I postponed the removal of the dependency on UnicodeData.txt to the next release, see also ticket https://github.com/Witiko/markdown/issues/486.