Closed RobSchoenaker closed 4 years ago
I've fed the text into ParserTracer
project in the repo. It seems that parser attempt to match (?i)</(li)(\s*)>
for 2245601 times.
And there are 196608 backtraces at the end of the following line. I think the parser is looking for the closing <li>
tag.
<li> [[Daniel Anrig|Daniel Rolf Anrig]] - SG (2008-2015) <-- here
I think this is related to missing </li>
tags in your given wikitext.
This is a pathological case. MW seems to complete the closing </li>
tag for you when render wikitext, and perhaps we need to take a similar behavior during parsing.
Parsed 892 characters in 8.316203 sec.
Regex Matching
Count Elpsd ms Average ms Expression
1 0.2494 0.2494 /?>|[\s=]
1 0.0018 0.0018 [>"]|/>
296303 297.6392 0.001 (?i)</(ol)(\s*)>
2245601 1798.4757 0.0008 (?i)</(li)(\s*)>
78288 42.6093 0.0005 \||\n|\[\[|\]\]
41424 23.3913 0.0005 \n|\[\[|\]\]
2838211 1127.1949 0.0003 \n
Fallbacks: 1499693
Count Position Text
196608 818 - SG (2008-2015)\r\n<li> [[Chri
196608 841 [[Christoph Graf]] - LU (vana
196608 860 - LU (vanaf 2015)\r\n</li></ol>
114688 779 <li> [[Daniel Anrig|Daniel Rol
98304 783 [[Daniel Anrig|Daniel Rolf An
98304 837 <li> [[Christoph Graf]] - LU (
65536 784 [[Daniel Anrig|Daniel Rolf Anr
65536 842 [[Christoph Graf]] - LU (vanaf
57344 736 <li> Elmar Theodor M?der - SG
49152 722 (1998-2002)\r\n<li> Elmar Theod
49152 740 Elmar Theodor M?der - SG (200
32768 786 Daniel Anrig|Daniel Rolf Anrig
32768 799 Daniel Rolf Anrig]] - SG (2008
32768 844 Christoph Graf]] - LU (vanaf 2
32767 885 </ol>\r\n
28672 672 <li> Pius Segmüller - [[Sankt
24576 653 - LU (1998-1998)\r\n<li> Pius S
24576 676 Pius Segmüller - [[Sankt Gall
16384 694 [[Sankt Gallen (kanton)|SG]] (
14336 629 <li> [[Alois Estermann]] - LU
12288 633 [[Alois Estermann]] - LU (199
8192 634 [[Alois Estermann]] - LU (1998
8192 696 Sankt Gallen (kanton)|SG]] (19
8192 718 SG]] (1998-2002)\r\n<li> Elmar T
7168 577 <li> Franz Pfyffer von Altisho
6144 581 Franz Pfyffer von Altishofen
4096 636 Alois Estermann]] - LU (1998-1
3584 539 <li> Robert Nünlist - LU (1957
3072 543 Robert Nünlist - LU (1957-197
1792 484 <li> Heinrich Pfyffer von Alti
1536 470 (1935-1942)\r\n<li> Heinrich Pf
1536 488 Heinrich Pfyffer von Altishof
896 411 <li> Georg von Sury d’Aspremon
768 397 (1921-1935)\r\n<li> Georg von S
768 415 Georg von Sury d’Aspremont -
512 445 [[Solothurn (kanton)|SO]] (193
448 356 <li> Alois Hirschbühl - [[Grau
384 342 (1910-1921)\r\n<li> Alois Hirsc
384 360 Alois Hirschbühl - [[Graubünd
256 380 [[Graubünden|GR]] (1921-1935)\r
256 447 Solothurn (kanton)|SO]] (1935-
256 466 SO]] (1935-1942)\r\n<li> Heinric
224 298 <li> Jules Repond - [[Fribourg
192 302 Jules Repond - [[Fribourg (ka
128 318 [[Fribourg (kanton)|FR]] (1910
128 382 Graubünden|GR]] (1921-1935)\r\n<
128 393 GR]] (1921-1935)\r\n<li> Georg v
112 246 <li> Leopold Meyer von Schauen
96 232 (1878-1901)\r\n<li> Leopold Mey
96 250 Leopold Meyer von Schauensee
Parsed tree
.Wikitext [<ol start="18">\r\n<li]
..Paragraph [<ol start="18">\r\n<li]
...HtmlTag [<ol start="18">\r\n<li]
....TagAttribute [ start="18"]
.....Run [start]
......PlainText [start]
.....Wikitext [18]
......Paragraph [18]
.......PlainText [18]
....Wikitext [\r\n<li> Martin Pfyffe]
.....Paragraph [\r\n<li> Martin Pfyffe]
......PlainText [\r\n<li> Martin Pfyffe]
......WikiLink [[[Wallis (kanton)|VS]
.......Run [Wallis (kanton)]
........PlainText [Wallis (kanton)]
.......Run [VS]
........PlainText [VS]
......PlainText [ (1878-1901)\r\n<li> L]
......WikiLink [[[Fribourg (kanton)|]
.......Run [Fribourg (kanton)]
........PlainText [Fribourg (kanton)]
.......Run [FR]
........PlainText [FR]
......PlainText [ (1910-1921)\r\n<li> A]
......WikiLink [[[Graubünden|GR]]]
.......Run [Graubünden]
........PlainText [Graubünden]
.......Run [GR]
........PlainText [GR]
......PlainText [ (1921-1935)\r\n<li> G]
......WikiLink [[[Solothurn (kanton)]
.......Run [Solothurn (kanton)]
........PlainText [Solothurn (kanton)]
.......Run [SO]
........PlainText [SO]
......PlainText [ (1935-1942)\r\n<li> H]
......WikiLink [[[Alois Estermann]]]
.......Run [Alois Estermann]
........PlainText [Alois Estermann]
......PlainText [ - LU (1998-1998)\r\n<]
......WikiLink [[[Sankt Gallen (kant]
.......Run [Sankt Gallen (kanton]
........PlainText [Sankt Gallen (kanton]
.......Run [SG]
........PlainText [SG]
......PlainText [ (1998-2002)\r\n<li> E]
......WikiLink [[[Daniel Anrig|Danie]
.......Run [Daniel Anrig]
........PlainText [Daniel Anrig]
.......Run [Daniel Rolf Anrig]
........PlainText [Daniel Rolf Anrig]
......PlainText [ - SG (2008-2015)\r\n]
......HtmlTag [<li> [[Christoph Gra]
.......Wikitext [ [[Christoph Graf]] ]
........Paragraph [ [[Christoph Graf]] ]
.........PlainText [ ]
.........WikiLink [[[Christoph Graf]]]
..........Run [Christoph Graf]
...........PlainText [Christoph Graf]
.........PlainText [ - LU (vanaf 2015)\r\n]
...PlainText [\r\n]
That's exactly what I thought. This is from an existing Wiki article. I understand it's not a normal case, but preventing CPU hogging would be OK her I suppose. Can be fixed?
There are actually two issues here
<li>
tag has special termination behavior. It can be terminated by </li>
, \n
<li>
opening tag.the parser will do a simple look-ahead before actually start parsing from an HTML tag. This will reduce the chances of falling back.
which means parser will look forward for
</li>
tag before parser consumes the first<li>
tag. However, this quick check fails to reduce backtracks in your given input, because there actually is a</li>
tag ahead.
So, what I can do here are
<li>
tag. (Not sure if there are other similar cases. I've found nothing valuable in Sanitizer.php or Parser.php.)<div>test
will be parsed as an HtmlTag
instead of PlainText
.Btw, I did a check on mwparserfromhell 0.5.4, it treats unclosed HTML tags as plaintext. So we probably can beat them by resolving this bug 😂
That looks like a very legitimate fix to me 👍
Fixed and published in v0.3.0-int.1
.
ParserTracer output
Parsed 887 characters in 0.0337111 sec.
Regex Matching
Count Elpsd ms Average ms Expression
1 0.2493 0.2493 /?>|[\s=]
39 0.1768 0.0045 (?i)</(ol)(\s*)>
6 0.0111 0.0018 \n|\[\[|\]\]
38 0.0491 0.0012 </li\s*>|<li(\s*>|\s+)
8 0.0101 0.0012 \||\n|\[\[|\]\]
1 0.0011 0.0011 [>"]|/>
120 0.0912 0.0007 \n
Fallbacks: 219
Count Position Text
6 4 start="18">\r\n<li> Martin Pfyff
6 11 18">\r\n<li> Martin Pfyffer von
6 15 \r\n<li> Martin Pfyffer von Alti
6 21 Martin Pfyffer von Altishofen
6 74 Franz Xaver Leopold Meyer von
6 138 Alfred von Sonnenberg - LU (1
6 183 Louis-Martin de Courten - [[W
6 232 (1878-1901)\r\n<li> Leopold Mey
6 250 Leopold Meyer von Schauensee
6 302 Jules Repond - [[Fribourg (ka
6 342 (1910-1921)\r\n<li> Alois Hirsc
6 360 Alois Hirschbühl - [[Graubünd
6 397 (1921-1935)\r\n<li> Georg von S
6 415 Georg von Sury d’Aspremont -
6 470 (1935-1942)\r\n<li> Heinrich Pf
6 488 Heinrich Pfyffer von Altishof
6 543 Robert Nünlist - LU (1957-197
6 581 Franz Pfyffer von Altishofen
6 633 [[Alois Estermann]] - LU (199
6 653 - LU (1998-1998)\r\n<li> Pius S
6 676 Pius Segmüller - [[Sankt Gall
6 722 (1998-2002)\r\n<li> Elmar Theod
6 740 Elmar Theodor M?der - SG (200
6 783 [[Daniel Anrig|Daniel Rolf An
6 818 - SG (2008-2015)\r\n<li> [[Chri
6 841 [[Christoph Graf]] - LU (vana
6 860 - LU (vanaf 2015)\r\n</ol>\r\n
6 885 \r\n
3 17 <li> Martin Pfyffer von Altish
2 0 <ol start="18">\r\n<li> Martin P
2 210 [[Wallis (kanton)|VS]] (1878-1
2 318 [[Fribourg (kanton)|FR]] (1910
2 380 [[Graubünden|GR]] (1921-1935)\r
2 445 [[Solothurn (kanton)|SO]] (193
2 634 [[Alois Estermann]] - LU (1998
2 694 [[Sankt Gallen (kanton)|SG]] (
2 784 [[Daniel Anrig|Daniel Rolf Anr
2 842 [[Christoph Graf]] - LU (vanaf
1 10 "18">\r\n<li> Martin Pfyffer von
1 70 <li> Franz Xaver Leopold Meyer
1 134 <li> Alfred von Sonnenberg - L
1 179 <li> Louis-Martin de Courten -
1 212 Wallis (kanton)|VS]] (1878-190
1 228 VS]] (1878-1901)\r\n<li> Leopold
1 246 <li> Leopold Meyer von Schauen
1 298 <li> Jules Repond - [[Fribourg
1 320 Fribourg (kanton)|FR]] (1910-1
1 338 FR]] (1910-1921)\r\n<li> Alois H
1 356 <li> Alois Hirschbühl - [[Grau
1 382 Graubünden|GR]] (1921-1935)\r\n<
Parsed tree
.Wikitext [<ol start="18">\r\n<li]
..Paragraph [<ol start="18">\r\n<li]
...HtmlTag [<ol start="18">\r\n<li]
....TagAttribute [ start="18"]
.....Run [start]
......PlainText [start]
.....Wikitext [18]
......Paragraph [18]
.......PlainText [18]
....Wikitext [\r\n<li> Martin Pfyffe]
.....Paragraph [\r\n<li> Martin Pfyffe]
......PlainText [\r\n]
......HtmlTag [<li> Martin Pfyffer ]
.......Wikitext [ Martin Pfyffer von ]
........Paragraph [ Martin Pfyffer von ]
.........PlainText [ Martin Pfyffer von ]
......HtmlTag [<li> Franz Xaver Leo]
.......Wikitext [ Franz Xaver Leopold]
........Paragraph [ Franz Xaver Leopold]
.........PlainText [ Franz Xaver Leopold]
......HtmlTag [<li> Alfred von Sonn]
.......Wikitext [ Alfred von Sonnenbe]
........Paragraph [ Alfred von Sonnenbe]
.........PlainText [ Alfred von Sonnenbe]
......HtmlTag [<li> Louis-Martin de]
.......Wikitext [ Louis-Martin de Cou]
........Paragraph [ Louis-Martin de Cou]
.........PlainText [ Louis-Martin de Cou]
.........WikiLink [[[Wallis (kanton)|VS]
..........Run [Wallis (kanton)]
...........PlainText [Wallis (kanton)]
..........Run [VS]
...........PlainText [VS]
.........PlainText [ (1878-1901)\r\n]
......HtmlTag [<li> Leopold Meyer v]
.......Wikitext [ Leopold Meyer von S]
........Paragraph [ Leopold Meyer von S]
.........PlainText [ Leopold Meyer von S]
......HtmlTag [<li> Jules Repond - ]
.......Wikitext [ Jules Repond - [[Fr]
........Paragraph [ Jules Repond - [[Fr]
.........PlainText [ Jules Repond - ]
.........WikiLink [[[Fribourg (kanton)|]
..........Run [Fribourg (kanton)]
...........PlainText [Fribourg (kanton)]
..........Run [FR]
...........PlainText [FR]
.........PlainText [ (1910-1921)\r\n]
......HtmlTag [<li> Alois Hirschbüh]
.......Wikitext [ Alois Hirschbühl - ]
........Paragraph [ Alois Hirschbühl - ]
.........PlainText [ Alois Hirschbühl - ]
.........WikiLink [[[Graubünden|GR]]]
..........Run [Graubünden]
...........PlainText [Graubünden]
..........Run [GR]
...........PlainText [GR]
.........PlainText [ (1921-1935)\r\n]
......HtmlTag [<li> Georg von Sury ]
.......Wikitext [ Georg von Sury d’As]
........Paragraph [ Georg von Sury d’As]
.........PlainText [ Georg von Sury d’As]
.........WikiLink [[[Solothurn (kanton)]
..........Run [Solothurn (kanton)]
...........PlainText [Solothurn (kanton)]
..........Run [SO]
...........PlainText [SO]
.........PlainText [ (1935-1942)\r\n]
......HtmlTag [<li> Heinrich Pfyffe]
.......Wikitext [ Heinrich Pfyffer vo]
........Paragraph [ Heinrich Pfyffer vo]
.........PlainText [ Heinrich Pfyffer vo]
......HtmlTag [<li> Robert Nünlist ]
.......Wikitext [ Robert Nünlist - LU]
........Paragraph [ Robert Nünlist - LU]
.........PlainText [ Robert Nünlist - LU]
......HtmlTag [<li> Franz Pfyffer v]
.......Wikitext [ Franz Pfyffer von A]
........Paragraph [ Franz Pfyffer von A]
.........PlainText [ Franz Pfyffer von A]
......HtmlTag [<li> [[Alois Esterma]
.......Wikitext [ [[Alois Estermann]]]
........Paragraph [ [[Alois Estermann]]]
.........PlainText [ ]
.........WikiLink [[[Alois Estermann]]]
..........Run [Alois Estermann]
...........PlainText [Alois Estermann]
.........PlainText [ - LU (1998-1998)\r\n]
......HtmlTag [<li> Pius Segmüller ]
.......Wikitext [ Pius Segmüller - [[]
........Paragraph [ Pius Segmüller - [[]
.........PlainText [ Pius Segmüller - ]
.........WikiLink [[[Sankt Gallen (kant]
..........Run [Sankt Gallen (kanton]
...........PlainText [Sankt Gallen (kanton]
..........Run [SG]
...........PlainText [SG]
.........PlainText [ (1998-2002)\r\n]
......HtmlTag [<li> Elmar Theodor M]
.......Wikitext [ Elmar Theodor M?der]
........Paragraph [ Elmar Theodor M?der]
.........PlainText [ Elmar Theodor M?der]
......HtmlTag [<li> [[Daniel Anrig|]
.......Wikitext [ [[Daniel Anrig|Dani]
........Paragraph [ [[Daniel Anrig|Dani]
.........PlainText [ ]
.........WikiLink [[[Daniel Anrig|Danie]
..........Run [Daniel Anrig]
...........PlainText [Daniel Anrig]
..........Run [Daniel Rolf Anrig]
...........PlainText [Daniel Rolf Anrig]
.........PlainText [ - SG (2008-2015)\r\n]
......HtmlTag [<li> [[Christoph Gra]
.......Wikitext [ [[Christoph Graf]] ]
........Paragraph [ [[Christoph Graf]] ]
.........PlainText [ ]
.........WikiLink [[[Christoph Graf]]]
..........Run [Christoph Graf]
...........PlainText [Christoph Graf]
.........PlainText [ - LU (vanaf 2015)\r\n]
...PlainText [\r\n]
WOW! That's quite an improvement. Will test tonight.
<ol start="18"> <li> Martin Pfyffer von Altishofen - LU (1835-1847) <li> Franz Xaver Leopold Meyer von Schauensee - LU (1848-1860) <li> Alfred von Sonnenberg - LU (1860-1878) <li> Louis-Martin de Courten - [[Wallis (kanton)|VS]] (1878-1901) <li> Leopold Meyer von Schauensee - LU (1901-1910) <li> Jules Repond - [[Fribourg (kanton)|FR]] (1910-1921) <li> Alois Hirschbühl - [[Graubünden|GR]] (1921-1935) <li> Georg von Sury d’Aspremont - [[Solothurn (kanton)|SO]] (1935-1942) <li> Heinrich Pfyffer von Altishofen - LU (1942-1957) <li> Robert Nünlist - LU (1957-1972) <li> Franz Pfyffer von Altishofen - LU (1972-1982) <li> [[Alois Estermann]] - LU (1998-1998) <li> Pius Segmüller - [[Sankt Gallen (kanton)|SG]] (1998-2002) <li> Elmar Theodor Mäder - SG (2002-2008) <li> [[Daniel Anrig|Daniel Rolf Anrig]] - SG (2008-2015) <li> [[Christoph Graf]] - LU (vanaf 2015) </li></ol>
This is an excerpt from a dutch Wikipedia entry. On my machine, this takes about 10 seconds to parse. It seems to be related to the number of <li> items. When I remove one, it's fast(er), but when I add one, it's time for coffee...