fletcher / peg-multimarkdown

An implementation of MultiMarkdown in C, using a PEG grammar - a fork of jgm's peg-markdown. No longer under active development - see MMD 5.

Simple text copied from Terminal window causes MultiMarkdown to take minutes to run #116

Closed bobgilmore closed 12 years ago

bobgilmore commented 12 years ago

Found on OS X Lion 10.7.3, 2.66 GHz Intel Core 2 Duo MBP. Originally reported against nvALT 2.2b (94), then reproduced (at Brett's suggestion) with the "raw" multimarkdown binary:

When copying several lines of interest from Terminal into nvALT with the preview window open, I found that having more than ~7 lines of the following format takes several minutes to parse in multimarkdown:

[(master *$=)] [(master *$=)] [(master *$=)] [(master *$=)] ...

I know that this is a pathological case, with lots of nested markdown, but hanging the process for several minutes makes me sad.

I've attached some timing measurements for various file lengths below...

For a foo.txt containing six such lines:

before="$(date +%s)"; multimarkdown foo.txt; after="$(date +%s)"; echo "$(expr $after - $before)"

fletcher commented 12 years ago

I'm not even sure what the proper output for this would be. Do the '*' cause every other line to be partially output in italics? Does the parser keep looking for some reference named '(master *$=)'?

Ideally, MMD would be able to ignore "garbage" files, but the problem is that it has no way of knowing in advance when a file is garbage. So it has to attempt to process the document to determine whether it's valid.

You can always indent the lines with a tab so that these lines are treated as a code block. Then processing is ok.
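A minimal sketch of that workaround as a small C filter, assuming the pasted Terminal text is available as a stream. The function name `indent_with_tab` is hypothetical and is not part of multimarkdown; it only prefixes every line with a tab so the text is read as a code block.

```c
#include <stdio.h>

/* Hypothetical helper, not part of multimarkdown.
   Copies `in` to `out`, writing a tab before the first character of every
   line so that Markdown treats the text as a code block.
   Returns 0 on success, -1 on an I/O error. */
int indent_with_tab(FILE *in, FILE *out)
{
    int c;
    int at_line_start = 1;

    while ((c = fgetc(in)) != EOF) {
        /* Emit the tab only when a new line is about to start. */
        if (at_line_start && fputc('\t', out) == EOF)
            return -1;
        if (fputc(c, out) == EOF)
            return -1;
        at_line_start = (c == '\n');
    }
    return ferror(in) ? -1 : 0;
}
```

Calling `indent_with_tab(stdin, stdout)` from a `main` turns this into a pipe filter. Note that empty lines also receive a tab, which keeps the whole paste inside one code block.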

Otherwise, I'm not sure there is a way to fix this other than to abort processing.

bobgilmore commented 12 years ago

Hi Fletcher, I assumed that the current output for a small number of such lines (which completes in a short time) was good enough, or at least not clearly incorrect. For a (small) file foo.txt with N such lines, multimarkdown currently emits

<p>N copies of the line</p>.

For example, for a file with three entries:

[bob@bobmac Documents]$ multimarkdown terminalprompts.txt 
<p>[(master *$=)][(master *$=)]
[(master *$=)]</p>

It wasn't the "quality" of the output that concerned me, but rather the fact that trivial files (say, 10 lines) can hang multimarkdown entirely.

But, as they say, garbage in, garbage out, so it's hard to fault the executable for choking on that input!

fletcher commented 12 years ago

If you carefully time files while increasing the number of lines by one each time, you'll see that the processing time grows in some sort of geometric ratio: each additional line adds a larger increment than the one before.
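One way to take such measurements with sub-second resolution is sketched below. It is only a sketch: the helper names (`write_test_file`, `time_command`, `print_timings`) are hypothetical, the test line is copied from the sample output earlier in this thread, and the code assumes a POSIX system for `clock_gettime`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Write n_lines copies of the prompt-like line from this thread to path.
   Returns 0 on success, -1 on failure. */
int write_test_file(const char *path, int n_lines)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (int i = 0; i < n_lines; i++) {
        if (fputs("[(master *$=)]\n", f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Run cmd through the shell and return the elapsed wall-clock time in
   seconds, or -1.0 if the clock or the shell could not be used. */
double time_command(const char *cmd)
{
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    if (system(cmd) == -1)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* For 1..max_lines lines, rewrite the file, time cmd, and print the time
   together with its ratio to the previous measurement. */
void print_timings(const char *path, const char *cmd, int max_lines)
{
    double prev = -1.0;

    for (int n = 1; n <= max_lines; n++) {
        if (write_test_file(path, n) != 0)
            return;
        double t = time_command(cmd);
        printf("%d lines: %.3f s", n, t);
        if (prev > 0.0 && t >= 0.0)
            printf(" (x%.2f)", t / prev);
        putchar('\n');
        prev = t;
    }
}
```

A call such as `print_timings("foo.txt", "multimarkdown foo.txt > /dev/null", 8)` would print one line per file length. A roughly constant ratio between consecutive times would be consistent with geometric growth. Keep `max_lines` small, since files of around seven lines were reported above to take minutes.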

The only "fix" will be to try to detect long processing times and abort the command if this occurs. I want to add this, but I'm not sure how to do it in straight C yet.
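One possible approach, sketched here under the assumption that POSIX calls are acceptable (this is not ISO C, and it is not how multimarkdown itself works): run the parsing step in a child process and let `alarm()` deliver `SIGALRM`, whose default action terminates the child once the limit is reached. All names below are hypothetical; `finish_quickly` and `spin_forever` are stand-ins for a well-behaved and a pathological parse.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a parse that completes immediately. */
void finish_quickly(void *unused)
{
    (void)unused;
}

/* Hypothetical stand-in for a parse that never completes. */
void spin_forever(void *unused)
{
    volatile unsigned long n = 0;
    (void)unused;
    for (;;)
        n++;
}

/* Run work(arg) in a child process with a wall-clock limit in seconds.
   Returns 0 if the work finished in time, 1 if the limit was hit,
   and -1 on any other failure. A limit of 0 means no limit, because
   alarm(0) cancels the alarm. */
int run_with_time_limit(void (*work)(void *), void *arg, unsigned seconds)
{
    pid_t pid;
    int status;

    fflush(NULL);                  /* avoid duplicating buffered output */
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        signal(SIGALRM, SIG_DFL);  /* default action: terminate the process */
        alarm(seconds);
        work(arg);
        fflush(NULL);              /* _exit() does not flush stdio buffers */
        _exit(0);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```

With these stand-ins, `run_with_time_limit(spin_forever, NULL, 1)` returns 1 after about a second, while `run_with_time_limit(finish_quickly, NULL, 5)` returns 0 right away. The trade-off of using a child process is that any output produced before the limit may be partial, so the caller would need to decide whether to discard it.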