I tried wikitextparser on a recent Wikinews dump: plain_text() fails on a few pages, while mwparserfromhell's strip_code() works on all of them.
Here is an example to reproduce the error:
import pywikibot
import wikitextparser

en_wikinews = pywikibot.Site('en', 'wikinews')
text = pywikibot.Page(en_wikinews, "NASCAR's Earnhardt Jr Signs 5-year Contract with Hendrick Motorsports").get()
print(wikitextparser.parse(text).plain_text())
With pywikibot 5.6.0 and wikitextparser 0.47.0, I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/venv/lib/python3.7/site-packages/wikitextparser/_wikitext.py", line 607, in plain_text
    for i in parsed.get_bolds_and_italics():
  File "~/venv/lib/python3.7/site-packages/wikitextparser/_wikitext.py", line 987, in get_bolds_and_italics
    self._bolds_italics_recurse(result, filter_cls)
  File "~/venv/lib/python3.7/site-packages/wikitextparser/_wikitext.py", line 945, in _bolds_italics_recurse
    filter_cls=filter_cls, recursive=False):
  File "~/venv/lib/python3.7/site-packages/wikitextparser/_wikitext.py", line 970, in get_bolds_and_italics
    rs, re = self._relative_contents_end
  File "~/venv/lib/python3.7/site-packages/wikitextparser/_tag.py", line 208, in _relative_contents_end
    return self._match.span('contents')
AttributeError: 'NoneType' object has no attribute 'span'
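For anyone hitting the same error while processing a full dump, a stopgap might be to fall back to mwparserfromhell whenever plain_text() raises. The sketch below is only an illustration, not a fix for the underlying bug: the helper names (first_successful, wtp_plain_text, mwpfh_strip_code) are made up for this example, and the two libraries generally produce somewhat different plain text, so the output will not be uniform across pages.

```python
from typing import Callable, Iterable, Tuple


def first_successful(
    text: str,
    converters: Iterable[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Hypothetical helper: try each (name, converter) pair in order and
    return (name, result) for the first converter that does not raise."""
    errors = []
    for name, convert in converters:
        try:
            return name, convert(text)
        except Exception as exc:  # e.g. the AttributeError in the traceback above
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all converters failed: " + "; ".join(errors))


def wtp_plain_text(text: str) -> str:
    # Imported lazily so this sketch can be loaded without the package installed.
    import wikitextparser
    return wikitextparser.parse(text).plain_text()


def mwpfh_strip_code(text: str) -> str:
    # Imported lazily for the same reason.
    import mwparserfromhell
    return mwparserfromhell.parse(text).strip_code()


# Order matters: wikitextparser first, mwparserfromhell only as a fallback.
CONVERTERS = [
    ("wikitextparser", wtp_plain_text),
    ("mwparserfromhell", mwpfh_strip_code),
]
```

With the text fetched as in the example above, `first_successful(text, CONVERTERS)` returns the name of the library that produced the result together with the plain text, which makes it easy to log which pages needed the fallback.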