commonmark / cmark

CommonMark parsing and rendering library and program in C
Other
1.62k stars 539 forks source link

Quadratic behavior when parsing inlines #373

Closed nwellnhof closed 3 years ago

nwellnhof commented 3 years ago

Found by OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=28805

Reduced test case:

python3 -c 'print("- "*7000+"s"+" -"*7000)' |build/src/cmark --smart >/dev/null
jgm commented 3 years ago

My tests:

N = time size of output
7000 0.715 168000
14000 2.63 336000
21000 6.32 504000

Confirming, then, that output size is linear with N, time is quadratic.

jgm commented 3 years ago

Interestingly, my Haskell commonmark library (which I had thought was implementing the same algorithm as cmark) is much faster for this case and has linear performance.

jgm commented 3 years ago

For reference, the pattern in the input is:

- - - - - - - s - - - - - - -

(for N = 7)

jgm commented 3 years ago

Puzzling. One would think this is due to nested list parsing, because if the pattern is changed by the addition of an s at the front, causing it to be a paragraph, parsing is nearly instant even for large N. But if the problem is list parsing, why on earth is --smart needed to trigger the slowdown?

nwellnhof commented 3 years ago

This can also be triggered without --smart. For example:

python3 -c 'print(">"*20000+"<"*20000)' |build/src/cmark >/dev/null