CXuesong / MwParserFromScratch

A basic .NET Library for parsing wikitext into AST.
Apache License 2.0

Assertion failure when parsing a pathological ref tag #4

CXuesong closed 7 years ago

CXuesong commented 7 years ago

The following wikitext, taken from a revision found on Warriors Wiki, triggers an assertion failure in the parser:

:[[Molewhisker (TC)|Molewhisker]]:<ref name="Vicky's Facebook"/ {{Status|Molewhisker (TC)}}
:[[Featherkit (TPB)|Featherkit]]:{{r|os5|81}} {{Status|Featherkit (TPB)}}
:[[Cricketkit]]:{{r|os5|81}}<ref name=dapple>Revealed on [https://www.facebook.com/permalink.php?story_fbid=10154881098302454&id=29566467453 Vicky's Facebook]</ref> {{Status|Cricketkit}}

Failed assertion here

CXuesong commented 7 years ago

The case can be reduced to

<ref aaaaaaaaa<ref>abcdef</ref>

MediaWiki renders it as if it were

<ref>abcdef</ref>

We want to keep as much information as we can, so the parser should simply fail the tag-parsing attempt at the first <ref and keep <ref aaaaaaaaa as plain text. The second <ref can then be parsed as a tag in the usual way.
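The fail-and-fall-back idea above can be sketched as follows. This is a minimal Python illustration, not the library's actual C# implementation: the names `try_parse_tag` and `parse`, the regular expressions, and the rule that an attribute name may not contain `<` are all assumptions made for this example.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical grammar for this sketch only (not taken from the library):
# an attribute NAME may not contain '<', but an unquoted VALUE may.
TAG_NAME = re.compile(r"<(\w+)")
ATTRIBUTE = re.compile(r"""\s+([^\s=<>/]+)(?:=("[^"]*"|'[^']*'|[^\s>]*))?""")
TAG_END = re.compile(r"\s*(/?)>")


def try_parse_tag(text: str, pos: int) -> Optional[int]:
    """Return the end offset of a tag starting at pos, or None on failure."""
    name = TAG_NAME.match(text, pos)
    if name is None:
        return None
    i = name.end()
    # Consume as many well-formed attributes as possible.
    while True:
        attr = ATTRIBUTE.match(text, i)
        if attr is None:
            break
        i = attr.end()
    end = TAG_END.match(text, i)
    if end is None:
        return None  # e.g. a stray '<' before '>': give up on this tag
    if end.group(1):
        return end.end()  # self-closing tag
    closing = "</" + name.group(1) + ">"
    close_at = text.find(closing, end.end())
    if close_at < 0:
        return None
    return close_at + len(closing)


def parse(text: str) -> List[Tuple[str, str]]:
    """Split text into PlainText and ParserTag nodes, never raising."""
    nodes: List[Tuple[str, str]] = []
    plain_start = 0
    i = 0
    while i < len(text):
        if text[i] == "<":
            tag_end = try_parse_tag(text, i)
            if tag_end is not None:
                if plain_start < i:
                    nodes.append(("PlainText", text[plain_start:i]))
                nodes.append(("ParserTag", text[i:tag_end]))
                i = plain_start = tag_end
                continue
        i += 1  # failed attempt: the '<' stays part of the plain text
    if plain_start < len(text):
        nodes.append(("PlainText", text[plain_start:]))
    return nodes


print(parse("<ref aaaaaaaaa<ref>abcdef</ref>"))
```

The key design choice is that a failed attempt returns `None` instead of asserting, so the caller can leave the unmatched `<ref aaaaaaaaa` in the plain-text run and retry at the next `<`.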

CXuesong commented 7 years ago

Note that for

<ref name=<abc>abcdef</abc>>text</ref>

MediaWiki renders it as if it were

<ref name="<abc">abcdef</abc>>text</ref>

The point is that > always terminates an opening tag, even when it appears inside what looks like a nested tag in an attribute value.
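To make the rule concrete, here is a small Python sketch that cuts an opening tag at the first `>`. The helper name `split_opening_tag` is hypothetical and only illustrates the rule stated above; it is not code from MediaWiki or from this library.

```python
from typing import Tuple


def split_opening_tag(text: str) -> Tuple[str, str]:
    """Split text into (opening tag, remainder) at the first '>'."""
    end = text.find(">")
    if not text.startswith("<") or end < 0:
        raise ValueError("text does not start with a terminated opening tag")
    return text[: end + 1], text[end + 1 :]


opening, rest = split_opening_tag("<ref name=<abc>abcdef</abc>>text</ref>")
print(opening)  # <ref name=<abc>
print(rest)     # abcdef</abc>>text</ref>
```

The opening tag ends right after `<abc`, so `<abc` becomes the value of `name` and everything from `abcdef` onward is tag content.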

CXuesong commented 7 years ago

Okay, now the simplified test case yields the following AST:

Wikitext             [<ref aaaaaaaaa<ref>a]
.Paragraph           [<ref aaaaaaaaa<ref>a]
..PlainText          [<ref aaaaaaaaa]
..ParserTag          [<ref>abcdef</ref>]
..PlainText          [\r\n]
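For readers unfamiliar with this dump format: each line shows the node type prefixed with one dot per nesting level, followed by the first 20 characters of the node's source text in brackets, with line breaks escaped. A minimal Python sketch that produces the same layout is shown below. The `Node` class and `dump` function are hypothetical stand-ins, not the library's API, and the column width of 21 is inferred from the output above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a type name, its source text, and children."""
    kind: str
    text: str
    children: List["Node"] = field(default_factory=list)


def dump(node: Node, depth: int = 0) -> List[str]:
    """Render a node and its descendants, one line per node."""
    # Truncate to 20 characters, then make line breaks visible.
    snippet = node.text[:20].replace("\r", "\\r").replace("\n", "\\n")
    label = "." * depth + node.kind
    lines = [label.ljust(21) + "[" + snippet + "]"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


source = "<ref aaaaaaaaa<ref>abcdef</ref>\r\n"
tree = Node("Wikitext", source, [
    Node("Paragraph", source, [
        Node("PlainText", "<ref aaaaaaaaa"),
        Node("ParserTag", "<ref>abcdef</ref>"),
        Node("PlainText", "\r\n"),
    ]),
])
print("\n".join(dump(tree)))
```

Because of the 20-character truncation, the bracketed text is only a preview: the Wikitext and Paragraph nodes above cover the whole input even though the dump shows just its beginning.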

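And the original wikitext is parsed as follows. Note that the malformed <ref name="Vicky's Facebook"/ in the first list item is kept as plain text: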

Wikitext             [:[[Molewhisker (TC)|]
.ListItem            [:[[Molewhisker (TC)|]
..WikiLink           [[[Molewhisker (TC)|M]
...Run               [Molewhisker (TC)]
....PlainText        [Molewhisker (TC)]
...Run               [Molewhisker]
....PlainText        [Molewhisker]
..PlainText          [:<ref name="Vicky's ]
..Template           [{{Status|Molewhisker]
...Run               [Status]
....PlainText        [Status]
...TemplateArgument  [Molewhisker (TC)]
....Wikitext         [Molewhisker (TC)]
.....Paragraph       [Molewhisker (TC)]
......PlainText      [Molewhisker (TC)]
..PlainText          [\r]
.ListItem            [:[[Featherkit (TPB)|]
..WikiLink           [[[Featherkit (TPB)|F]
...Run               [Featherkit (TPB)]
....PlainText        [Featherkit (TPB)]
...Run               [Featherkit]
....PlainText        [Featherkit]
..PlainText          [:]
..Template           [{{r|os5|81}}]
...Run               [r]
....PlainText        [r]
...TemplateArgument  [os5]
....Wikitext         [os5]
.....Paragraph       [os5]
......PlainText      [os5]
...TemplateArgument  [81]
....Wikitext         [81]
.....Paragraph       [81]
......PlainText      [81]
..PlainText          [ ]
..Template           [{{Status|Featherkit ]
...Run               [Status]
....PlainText        [Status]
...TemplateArgument  [Featherkit (TPB)]
....Wikitext         [Featherkit (TPB)]
.....Paragraph       [Featherkit (TPB)]
......PlainText      [Featherkit (TPB)]
..PlainText          [\r]
.ListItem            [:[[Cricketkit]]:{{r|]
..WikiLink           [[[Cricketkit]]]
...Run               [Cricketkit]
....PlainText        [Cricketkit]
..PlainText          [:]
..Template           [{{r|os5|81}}]
...Run               [r]
....PlainText        [r]
...TemplateArgument  [os5]
....Wikitext         [os5]
.....Paragraph       [os5]
......PlainText      [os5]
...TemplateArgument  [81]
....Wikitext         [81]
.....Paragraph       [81]
......PlainText      [81]
..ParserTag          [<ref name=dapple>Rev]
...TagAttribute      [ name=dapple]
....Run              [name]
.....PlainText       [name]
....Wikitext         [dapple]
.....Paragraph       [dapple]
......PlainText      [dapple]
..PlainText          [ ]
..Template           [{{Status|Cricketkit}]
...Run               [Status]
....PlainText        [Status]
...TemplateArgument  [Cricketkit]
....Wikitext         [Cricketkit]
.....Paragraph       [Cricketkit]
......PlainText      [Cricketkit]
..PlainText          [\r]
.Paragraph           []