earwig / mwparserfromhell

A Python parser for MediaWiki wikicode
https://mwparserfromhell.readthedocs.io/
MIT License

Remove whitespace as well when removing a node #269

Open RheingoldRiver opened 3 years ago

RheingoldRiver commented 3 years ago

Currently, when using code.remove(node), some whitespace (specifically newlines) is left behind. This can cause display errors on the rendered page and leaves extra whitespace in the source for editors to deal with.

Related: #268, #55, #265, #266

Here is how I think it should behave (and how I've implemented it in my own tools with a workaround, see #268):

In the case of:

text here

{{MyTemplate}}

text here

i.e. \n\n{{MyTemplate}}\n\n, after MyTemplate is removed, only \n\n should be left behind;

in the case of:

text here
{{MyTemplate}}
text here

i.e. \n{{MyTemplate}}\n, after MyTemplate is removed, only one \n should be left behind;

finally, at the beginning of a document,

{{MyTemplate}}
text here

the \n that follows the template should be removed;

and otherwise no whitespace is stripped.
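
To make the rules above concrete, here is a minimal sketch that applies them to a plain string rather than to a Wikicode object. It is not mwparserfromhell's API and not the workaround from #268; the function names remove_span and remove_first are hypothetical, and cases the rules don't cover (for example a blank line on only one side) are handled by an assumption noted in the comments.

```python
def remove_span(text: str, start: int, end: int) -> str:
    """Remove text[start:end] and tidy the newlines around it.

    Hypothetical helper illustrating the proposed rules on plain strings.
    """
    before, after = text[:start], text[end:]

    # Beginning of the document: drop the newline that followed the node.
    if before == "":
        return after[1:] if after.startswith("\n") else after

    # Blank line on both sides (\n\n ... \n\n): keep exactly one \n\n.
    if before.endswith("\n\n") and after.startswith("\n\n"):
        return before + after[2:]

    # Newline on both sides: keep one of them.
    # Assumption: mixed cases such as \n\n ... \n also land here, so the
    # blank line before the node is preserved.
    if before.endswith("\n") and after.startswith("\n"):
        return before + after[1:]

    # Otherwise no whitespace is stripped.
    return before + after


def remove_first(text: str, needle: str) -> str:
    """Remove the first occurrence of needle using remove_span."""
    start = text.find(needle)
    if start == -1:
        return text
    return remove_span(text, start, start + len(needle))


if __name__ == "__main__":
    sample = "text here\n\n{{MyTemplate}}\n\ntext here"
    print(repr(remove_first(sample, "{{MyTemplate}}")))
```

A real implementation would need to inspect the Text nodes next to the removed node instead of slicing a string, but the branch order would be the same.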

lahwaacz commented 3 years ago

I already have this implemented: https://github.com/lahwaacz/wiki-scripts/blob/e459de07f303e87c28ef224c70d6bb02274cc9ca/ws/parser_helpers/wikicode.py#L60-L116

Relevant tests: https://github.com/lahwaacz/wiki-scripts/blob/e459de07f303e87c28ef224c70d6bb02274cc9ca/tests/parser_helpers/test_wikicode.py#L53-L176