decaporg / decap-cms

A Git-based CMS for Static Site Generators
https://decapcms.org
MIT License
17.96k stars 3.05k forks source link

Current code in remarkShortcodes.js misunderstands how remark's block tokenizers are supposed to work #7315

Open manulari opened 3 weeks ago

manulari commented 3 weeks ago

Describe the bug

remarkShortcodes.js uses the blockTokenizers extension mechanism of remark-parse. The documentation for the last version of remark-parse which supported this feature can be found here. Quoting from there (emphasis mine):

Tokenizers test whether a document starts with a certain syntactic entity.

I understand this to mean that any registered block tokenizer (like the one registered by remarkShortcodes.js) is tried at the beginning of each "block" in a markdown document. (Probably roughly corresponding to blocks of lines separated by two consecutive newlines, but I couldn't find clear documentation on that.)

The current version of remarkShortcodes.js is a bit complex, but I think essentially it looks for matches to the shortcode patterns at all paragraphs in the remaining document passed in by remark and then chooses the earliest of those matches. This means that the bevaviour is both incorrect, and runtime is quadratic in the length of the source document (or at least the number of paragraphs). A previous version of remarkShortcodes.js had code that was more correct.

This currently shows up as the "Sent invalid data to remark"-warning message, which has been reported a couple of times. At least 1, 2, 3 are related, probably more.

This warning message is emitted by a try-catch block in remarkShortcodes.js that actually papers over the logic bug in remarkShortcodes.js. Specifically, this warning message will be emitted whenever there is a shortcode that matches somewhere in the document, but not at the current parsing position. The eat(match[0]) then causes remark to throw an error.

To Reproduce

Look at the source code for remarkShortcodes.js and the documentation for remark block tokenizers.

Expected behavior

I believe a more correct behaviour would be achieved in the following way:

Applicable Versions:

main branch