earwig / mwparserfromhell

A Python parser for MediaWiki wikicode
https://mwparserfromhell.readthedocs.io/
MIT License

Template rejected by parser #24

Closed: jfolz closed this issue 11 years ago

jfolz commented 11 years ago

I've come across this little gem here:

{{Infobox Platz
| Name=Strausberger Platz
| Alternativnamen=
| Stadtwappen=Coat of arms of Berlin.svg
| Kategorie=Platz in Berlin
| Bild=Strausberger Platz Berlin April 2006 109.jpg|miniatur
| Bild zeigt=Der Platz in Richtung Westen gesehen
| Ort=Berlin
| Ortsteil=[[Berlin-Friedrichshain]]
| Angelegt=1967
| Neugestaltet=
| Straßen=<br />Lichtenberger Straße, [[Karl-Marx-Allee]]
| Bauwerke=„Haus Berlin“
| Nutzergruppen=[[Fußgänger]], [[Radfahrer]], [[Auto]]
| Platzgestaltung=
| Baukosten=
}}

It's from here: http://de.wikipedia.org/w/index.php?title=Strausberger_Platz&oldid=112496475

As soon as the line Bild=Strausberger Platz Berlin April 2006 109.jpg|miniatur is added to the template, the parser rejects the whole template and creates a Text node instead. Even though this markup doesn't make sense, I would expect a Template node with a value-less parameter named "miniatur" rather than a Text node.
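To make the expected result concrete, here is a minimal sketch of how a template body can be split on top-level "|" characters while ignoring pipes inside [[...]] links and nested {{...}} templates. The function split_template is hypothetical and is not mwparserfromhell's tokenizer. It simply shows that "|miniatur" should become one more parameter, here an unnamed (positional) one, which is how MediaWiki itself treats a bare "|value", instead of causing the whole template to be discarded.

```python
from typing import Optional


def split_template(text: str) -> tuple[str, list[tuple[Optional[str], str]]]:
    """Hypothetical helper: split '{{Name|k=v|positional}}' into
    (name, [(key or None, value), ...]). Not mwparserfromhell's API."""
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template")
    body = text[2:-2]
    pieces: list[str] = []
    current: list[str] = []
    depth = 0  # combined nesting depth of [[...]] and {{...}}
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            # A top-level pipe starts a new parameter.
            pieces.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    pieces.append("".join(current))

    name = pieces[0].strip()
    params: list[tuple[Optional[str], str]] = []
    for piece in pieces[1:]:
        key, sep, value = piece.partition("=")
        # Whitespace is trimmed for readability; MediaWiki itself only
        # trims named parameters, not unnamed ones.
        if sep:
            params.append((key.strip(), value.strip()))
        else:
            params.append((None, piece.strip()))  # unnamed parameter
    return name, params


SAMPLE = """{{Infobox Platz
| Name=Strausberger Platz
| Bild=Strausberger Platz Berlin April 2006 109.jpg|miniatur
| Ortsteil=[[Berlin-Friedrichshain]]
| Nutzergruppen=[[Fußgänger]], [[Radfahrer]], [[Auto]]
}}"""

if __name__ == "__main__":
    name, params = split_template(SAMPLE)
    print(name)
    for key, value in params:
        print(key, "=>", value)
```

The pipes inside the [[...]] links stay part of their values, while the stray "|miniatur" only adds one extra entry to the parameter list.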

earwig commented 11 years ago

That, uh, shouldn't be happening at all. I'll look.

earwig commented 11 years ago

As expected, the template parses correctly with the Python tokenizer. Looking into what's up with the C tokenizer.
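One way to narrow down a discrepancy like this is differential testing: feed the same input to both tokenizers and report where their token streams first diverge. The sketch below assumes each tokenizer is a plain callable returning a sequence of comparable tokens; first_divergence and the toy tokenizers in the usage are hypothetical stand-ins, not mwparserfromhell's own Tokenizer classes.

```python
from typing import Callable, Optional, Sequence

# Hypothetical signature: any callable mapping wikicode text to a token list.
Tokenize = Callable[[str], Sequence[object]]


def first_divergence(text: str, reference: Tokenize, candidate: Tokenize) -> Optional[int]:
    """Return the index of the first token where the two tokenizers disagree,
    or None if both produce identical token streams."""
    ref = list(reference(text))
    cand = list(candidate(text))
    for i, (a, b) in enumerate(zip(ref, cand)):
        if a != b:
            return i
    if len(ref) != len(cand):
        # One stream is a strict prefix of the other.
        return min(len(ref), len(cand))
    return None


if __name__ == "__main__":
    # Toy stand-ins: a "correct" splitter and a "buggy" one that gives up
    # and returns the whole input as a single text token.
    good = lambda s: s.split("|")
    buggy = lambda s: [s]
    print(first_divergence("Bild=x.jpg|miniatur", good, good))
    print(first_divergence("Bild=x.jpg|miniatur", good, buggy))
```

Reporting the first differing index, rather than just "mismatch", points directly at the construct (here, the stray pipe) that the candidate tokenizer mishandles.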

Edit: Think I know what's going on. Edit 2: Solution ready; seems to work. Testing further.

earwig commented 11 years ago

Should be resolved now. As usual, the develop branch contains a work-in-progress C tokenizer that may be buggy and prone to these issues. Please report them whenever they come up!