gettalong / kramdown

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.
http://kramdown.gettalong.org
Other
1.72k stars 271 forks source link

Kramdown freezes with specific input. #755

Open EnotPoloskun opened 2 years ago

EnotPoloskun commented 2 years ago

Hello. We have noticed that kramdown has hard time parsing some string.

Example:

text = "___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe"

Kramdown::Document.new(text)

takes around 30 second on my machine. And if I double string with same pattern - it never finishes. Is there a workaround or any option which would help processing this markdown?

Kramdown version - 2.4.0

gettalong commented 2 years ago

The reason for this is backtracking. For the first three underscores kramdown tries to find the matching pair. When encountering the second three underscores, it sees that those are not the matching ones. So it starts with those three to find their matching counterpart. And on, and on, and on it goes... until the end of the text where we see that nothing matches. So all nested matching attempts are abandoned and other things tried. Until we come to the first three underscores again. Then we see that they should be handled as normal text. After that we find the second three underscores and everything starts again...

I will have a look at the code.

EnotPoloskun commented 2 years ago
When encountering the second three underscores, it sees that those are not the matching ones

Why second three underscores are considered not "matching"? Is it intentional?

I have tried https://markdowntohtml.com/ for this text, and it considers them as matching if I understand correctly.

So If I understand correctly

___1 ___QWE ___2 ___

In this example 1 and 2 should be formatted.

gettalong commented 2 years ago

Underscores only match at word boundaries and the stopping delimiter must not be preceded by a space (see https://kramdown.gettalong.org/syntax.html#emphasis). So yes, that is intentional.