earwig / mwparserfromhell

A Python parser for MediaWiki wikicode
https://mwparserfromhell.readthedocs.io/
MIT License

Possible way to delete a node? #268

Closed RheingoldRiver closed 3 years ago

RheingoldRiver commented 3 years ago

Hi, one thing that comes up a lot is wanting to delete a template, a section, etc. completely from a page. What I would like is a method called .delete() that essentially replaces the entire contents of the node with an empty string. Does functionality for this already exist that I'm missing? Or would it be possible to add (either to the base node class or to Template)?

Right now I remove every param, change the name to something like @@@DELETE@@@, and then, when I'm done, replace all occurrences of {{@@@DELETE@@@}} in the entire page with an empty string, but this is pretty cumbersome.

earwig commented 3 years ago

code.remove(node)

E.g.:

>>> import mwparserfromhell
>>> code = mwparserfromhell.parse("foo {{bar}} baz")
>>> node = code.filter_templates()[0]
>>> code.remove(node)
>>> code
'foo  baz'

earwig commented 3 years ago

BTW: Nodes don't know what they're contained in, so adding the method to the node itself isn't currently possible. This will change at some point in the future; then we could add a method to the node itself.

RheingoldRiver commented 3 years ago

ahhh perfect, thank you!

RheingoldRiver commented 3 years ago

Actually, I tested it, and it seems that code.remove(node) doesn't take its own line with it if the node was the only thing on that line. Would that be possible to do? (I can open a separate issue for that, since it's a different feature request, if you think it's reasonable.)

earwig commented 3 years ago

It's related to #55, #265, #266... There's a general problem here: the parser always treats whitespace as part of a text node and will never strip it for you. I think it's a reasonable feature request.
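
Until the parser handles this, one possible workaround is to keep the marker idea from earlier in the thread and strip the marker in a post-processing pass that also drops any line the marker occupied alone. This is only a sketch of that idea, not library behaviour. The function name strip_marked is hypothetical, and the marker string is the placeholder used above:

```python
import re

# Placeholder from earlier in the thread; any string that cannot
# occur in real page text would do.
MARKER = "{{@@@DELETE@@@}}"


def strip_marked(text, marker=MARKER):
    """Remove every `marker` from `text`.

    If a marker is the only non-blank content on its line, the whole
    line (including its newline) is removed as well.
    """
    escaped = re.escape(marker)
    # Marker alone on a line, optionally padded with spaces or tabs.
    alone = rf"^[ \t]*{escaped}[ \t]*(?:\n|\Z)"
    text = re.sub(alone, "", text, flags=re.MULTILINE)
    # Any remaining markers sit inline; remove just the marker.
    return text.replace(marker, "")
```

With mwparserfromhell, the marker could be put in place with code.replace(node, MARKER) instead of code.remove(node), and strip_marked(str(code)) applied once all edits are done.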