5j9 / wikitextparser

A Python library to parse MediaWiki WikiText
GNU General Public License v3.0

{{snd}} renders as empty string in plain_text() method #103

Closed. FTHistorian closed this issue 2 years ago.

FTHistorian commented 2 years ago

https://en.wikipedia.org/wiki/Template:Spaced_en_dash indicates this should render as " – " (a spaced en dash, without the quotes).
Is there a way to add this so that the plain_text() method replaces the template with " – "?

Many thanks

Tim

5j9 commented 2 years ago

I'm afraid I can't currently think of any easy way to do this. So, one has to either manually replace the templates in the string (using a regex or str.replace) or loop over templates and modify them.
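
As a rough illustration of the first workaround (replacing the templates in the string before parsing), here is a minimal sketch that uses only the standard library. The helper name replace_dash_templates and the alias tuple are hypothetical, the alias list is just the set of names mentioned in this thread, and the pattern only handles parameterless uses such as {{snd}}. Matching is case-insensitive for simplicity, which is slightly more permissive than MediaWiki itself.

```python
import re

# Aliases of Template:Spaced en dash mentioned in this thread
# (an assumed, non-exhaustive list; extend as needed).
DASH_TEMPLATE_NAMES = ("dash", "snd", "spnd", "sndash", "spndash", "spaced en dash")

# Matches parameterless uses such as "{{snd}}" or "{{ Spaced en dash }}".
_DASH_TEMPLATE_RE = re.compile(
    r"\{\{\s*(?:"
    + "|".join(re.escape(name) for name in DASH_TEMPLATE_NAMES)
    + r")\s*\}\}",
    flags=re.IGNORECASE,
)


def replace_dash_templates(wikitext: str) -> str:
    """Replace spaced-en-dash templates with a space and an en dash."""
    return _DASH_TEMPLATE_RE.sub(" \u2013", wikitext)


text = replace_dash_templates("[[Salt]]{{snd}} [[Pepper]]")
print(text)  # [[Salt]] – [[Pepper]]
```

The cleaned string can then be passed to parse(...).plain_text() as usual; any other templates are left untouched by this step.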

I could make the replace_templates parameter of the plain_text method accept a function that takes a template object and returns the replacement string. Would that be useful? I'm also open to other ideas.

5j9 commented 2 years ago

OK, this feature looked simple enough to implement, so I went ahead and added this to v0.49.0.

Here is a working sample if you decide to use this method:

from wikitextparser import parse, Template

def template_mapper(template: Template):
    if template.normal_name() in {'dash', 'snd', 'spnd', 'sndash', 'spndash', 'spaced en dash'}:
        return ' –'  # a space followed by an en dash (U+2013)
    return ''  # remove other templates

wikitext = '[[Salt]]{{spaced en dash}} [[Pepper]]'

plain_text = parse(wikitext).plain_text(replace_templates=template_mapper)
print(plain_text)  # "Salt – Pepper"

FTHistorian commented 2 years ago

Perfect! That is just the job and does exactly what I was after. A nice, elegant solution that is easy to customise if required. I noticed that some pages use {{spaced ndash}}, which wasn't mentioned in the link I put in my comment, but your solution made it very simple to add it to the list.

Many thanks!

Tim
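
For anyone extending the mapper in the same way, the customisation described above might look roughly like the sketch below. It is not the library's code: DASH_ALIASES is a hypothetical name, the alias set is only what was mentioned in this thread plus 'spaced ndash', and lowercasing the first letter of the name (so that {{Snd}} also matches) is an assumption about how capitalised template names should be treated. The function relies only on the template object having a normal_name() method, as in the sample earlier in the thread.

```python
# Alias names taken from this thread, plus 'spaced ndash'
# (assumed to be incomplete; add further redirects as you meet them).
DASH_ALIASES = frozenset({
    "dash", "snd", "spnd", "sndash", "spndash",
    "spaced en dash", "spaced ndash",
})


def template_mapper(template) -> str:
    """Return the replacement text for a template object.

    `template` is expected to provide normal_name(), as
    wikitextparser.Template does in the sample above.
    """
    name = template.normal_name()
    # Assumption: only the first letter of a template name is
    # case-insensitive, so normalise just that letter.
    key = name[:1].lower() + name[1:]
    if key in DASH_ALIASES:
        return " \u2013"  # a space followed by an en dash
    return ""  # remove other templates


# Intended use, following the sample earlier in the thread:
# plain_text = parse(wikitext).plain_text(replace_templates=template_mapper)
```

Keeping the aliases in one set means a newly discovered redirect only needs one more string, and the mapper itself stays unchanged.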