attached text will hang Markdown().convert(contents)

GoogleCodeExporter commented 8 years ago

from markdown2 import *
Markdown().convert(open('minimal').read())

seems to hang for a long time.

(it's not actually minimized yet, sorry)

Original issue reported on code.google.com by david.as...@gmail.com on 1 Mar 2011 at 11:13

Attachments:

minimal

GoogleCodeExporter commented 8 years ago

confirmed the hang.

Original comment by tre...@gmail.com on 1 Mar 2011 at 11:40

Changed state: Accepted
Added labels: Priority-High
Removed labels: Priority-Medium

GoogleCodeExporter commented 8 years ago

Core problem is a pathologically slow regex looking for a possible "<hr>":

        re.compile(r"^[ ]{0,2}([ ]?\-[ ]?){3,}[ \t]*$", re.M)

With the "issue52_hang.text" input (recently committed, on GitHub) and 
David's input file, this regex ends up attempting to match against a string like the one below.

{{{
import re
r = re.compile(r'^[ ]{0,2}([ ]?\-[ ]?){3,}[ \t]*$', re.M)
# String as it appears in the input file:
text = '- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +\n\nPrivacy Policy:  http://www.PetitionOnline.org/privacy-pets.html\n\n'
# Shorter string that still shows the slowdown (this is the one actually searched):
text = '- - - - - - - - - - - - - - - - - - - - - - - +\n\nfoo\n\n'
print(r.search(text))
}}}

This takes a looooong time... and increases exponentially? geometrically? badly 
when the number of "- " segments increases.
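To put rough numbers on that growth, here is a small timing sketch. It uses the same regex and the same shape of input as above, but with fewer "- " segments so it finishes quickly. The `time_search` helper and the segment counts are made up for illustration, and the timings will vary by machine and Python version.

```python
import re
import time

# Same pattern as in markdown2's "<hr>" check quoted above.
HR_RE = re.compile(r"^[ ]{0,2}([ ]?\-[ ]?){3,}[ \t]*$", re.M)


def time_search(n_segments):
    """Search a line of n "- " segments followed by "+" and time it.

    Hypothetical helper, not part of markdown2.
    """
    text = "- " * n_segments + "+\n\nfoo\n\n"
    start = time.perf_counter()
    match = HR_RE.search(text)
    return match, time.perf_counter() - start


# Kept small on purpose: the 23-segment string above is already very slow.
results = {n: time_search(n) for n in (12, 14, 16, 18)}
for n, (match, elapsed) in results.items():
    print(f"{n:2d} segments: match={match!r} in {elapsed:.4f}s")
```

The likely cause is that each space between two dashes can be consumed either as the trailing `[ ]?` of one group or the leading `[ ]?` of the next, so when the line ends in "+" the engine may retry every combination before giving up.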

A possible secondary problem is that "+ - - - - - - ..." is being parsed as a 
list item inside a list item inside a list item, etc. That seems unnecessary.

TODO:
- separate issue for the list item inside a list item thingy
- tighten-up test case for '<hr>' speed
- speed up '<hr>' match
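For the "speed up" item, one possible direction is to rewrite the pattern so each space can only be consumed in one place. The sketch below is an assumption for illustration and not necessarily the change that was eventually committed. `UNAMBIGUOUS_HR_RE` is a hypothetical name, and the sample list is only a spot check, not a proof that the two patterns are equivalent.

```python
import re

# Original pattern from markdown2, as quoted earlier in this thread.
ORIGINAL_HR_RE = re.compile(r"^[ ]{0,2}([ ]?\-[ ]?){3,}[ \t]*$", re.M)

# Hypothetical rewrite: up to 3 leading spaces, a dash, then at least two
# more dashes each preceded by 0-2 spaces. Spaces are only ever consumed
# *before* a dash, so there is a single way to match any given line.
UNAMBIGUOUS_HR_RE = re.compile(r"^[ ]{0,3}-(?:[ ]{0,2}-){2,}[ \t]*$", re.M)

# Spot-check that both patterns agree on a few short lines.
samples = ["- - -", "---", "   - - -", "-  -  -  ",
           "- -", "    ---", "-   -   -", "- - - +"]
for s in samples:
    assert bool(ORIGINAL_HR_RE.search(s)) == bool(UNAMBIGUOUS_HR_RE.search(s)), s

# The long input only goes to the rewritten pattern; the original would stall.
slow_input = "- " * 33 + "+\n\nfoo\n\n"
print(UNAMBIGUOUS_HR_RE.search(slow_input))
```

Any real fix would still need to pass the project's own test suite, since the samples here cover only a handful of cases.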

Original comment by tre...@gmail.com on 7 Mar 2011 at 7:22

GoogleCodeExporter commented 8 years ago

Fixed in:

    [master 9e99850] Fix issue 52. Tweak silly nest li matching. See CHANGES.txt

on github.com/trentm/python-markdown2. See "slow_hr", "not_quite_a_list" and 
"hr_spaces" tests added around this.

Original comment by tre...@gmail.com on 10 Mar 2011 at 4:36

Changed state: Fixed
