5j9 / wikitextparser

A Python library to parse MediaWiki WikiText
GNU General Public License v3.0

Tag-finding is too permissive #121

Closed: KennyChenBasis closed this issue 1 year ago

KennyChenBasis commented 1 year ago

Using version 0.54.0, here's a minimal reproducible example:

import wikitextparser as wtp

text = """
<ref[oanda.com, March 9, 2022]/ref>
<ref name=cp/>
<ref>a</ref>
"""

parsed = wtp.parse(text)
for reference in parsed.get_tags('ref'):
    del reference[:]

(Text like this occurs in real articles, e.g. https://en.wikipedia.org/wiki/Economy_of_Tajikistan.) It raises an error at

wikitextparser/_wikitext.py", line 157, in __add__
    raise DeadIndexError(
wikitextparser._wikitext.DeadIndexError: this usually means that the object has died (overwritten or deleted) and cannot be mutated

This happens because the entire text is treated as a single reference (resulting in nested references), even though <ref[oanda.com, March 9, 2022]/ref> is not a proper start tag. It should be left as plain text instead.
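
To illustrate the expected behavior, here is a minimal sketch of a stricter start-tag check using only the standard library. The pattern and the `find_start_tags` helper are hypothetical and simplified; they are not wikitextparser's actual implementation. The assumption is that a tag name must be followed by whitespace, `/`, or `>`, so `<ref[` would not count as a start tag.

```python
import re

# Hypothetical, simplified start-tag pattern (NOT wikitextparser's real one).
# The lookahead requires whitespace, "/" or ">" right after the tag name,
# so "<ref[oanda.com ..." is rejected and stays plain text.
START_TAG = re.compile(r"<(?P<name>[A-Za-z][A-Za-z0-9]*)(?=[\s/>])[^<>]*>")


def find_start_tags(text, name):
    """Return the raw start (or self-closing) tags called `name` in `text`."""
    return [
        m.group(0)
        for m in START_TAG.finditer(text)
        if m.group("name").lower() == name.lower()
    ]


text = """
<ref[oanda.com, March 9, 2022]/ref>
<ref name=cp/>
<ref>a</ref>
"""

tags = find_start_tags(text, "ref")
print(tags)
```

With a rule like this, only the second and third lines would yield start tags, so the first line would not open a reference that swallows the ones after it.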

KennyChenBasis commented 1 year ago

Thanks for the quick fixes and release! I'll let you know if I find more things.

5j9 commented 1 year ago

Thank you for the bug reports! The minimal examples really helped.