earwig / mwparserfromhell

A Python parser for MediaWiki wikicode
https://mwparserfromhell.readthedocs.io/
MIT License

Incorrect parsing of table and definition list #271

Open hauntsaninja opened 3 years ago

hauntsaninja commented 3 years ago

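Thanks for venturing into hell to bring us mwparser! I encountered an issue with tables nested in definition lists. Hopefully the following makes it clear: the first input is parsed as a table, but the same table prefixed with `:` is parsed as a single `dd` tag with no table tags at all.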

>>> import mwparserfromhell
>>> [x.tag for x in mwparserfromhell.parse('''{|
|-
| A
| B
|-
| C
| D
|}
''').filter_tags()]
['table', 'tr', 'td', 'td', 'tr', 'td', 'td']
>>> [x.tag for x in mwparserfromhell.parse(''':{|
|-
| A
| B
|-
| C
| D
|}
''').filter_tags()]
['dd']

You can see an example of this on Wikipedia at https://en.wikipedia.org/wiki/Eulerian_number (the second table in the "Basic properties" section).
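
Until this is handled in the parser, one possible stopgap is to remove the `:` indentation in front of table openers before parsing. The sketch below uses only the standard library. `strip_table_indent` is a hypothetical helper, not part of mwparserfromhell, and it is lossy: the indentation is dropped, so the text no longer round-trips to the original.

```python
import re

# Hypothetical helper, not part of mwparserfromhell.
# Matches a run of ':' at the start of a line that is directly
# followed by the table opener '{|'.
_INDENTED_TABLE = re.compile(r"^:+(?=\{\|)", flags=re.MULTILINE)


def strip_table_indent(wikitext: str) -> str:
    """Drop ':' indentation in front of table openers (lossy)."""
    return _INDENTED_TABLE.sub("", wikitext)


# The two inputs from the report above.
plain = "{|\n|-\n| A\n| B\n|-\n| C\n| D\n|}\n"
indented = ":" + plain

print(strip_table_indent(indented) == plain)  # → True
```

After stripping, the second input is identical to the first, which is parsed as `['table', 'tr', 'td', 'td', 'tr', 'td', 'td']` in the session above. Ordinary definition-list lines such as `: text` are left untouched, since the pattern only matches when `{|` follows the colons immediately.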