5j9 / wikitextparser

A Python library to parse MediaWiki WikiText
GNU General Public License v3.0

Handling newlines in the front and back of template arguments #111

Closed: lihaohong6 closed this issue 1 year ago

lihaohong6 commented 2 years ago

Running the following code snippet yields something unexpected:

import wikitextparser

p = wikitextparser.parse("{{A|a=\n\nx\n\n|b=xxx}}")
print(p)
t = p.templates[0]
# Set argument "a" to its own current value
t.set_arg("a", t.get_arg("a").value)
print(p)

Since the second-to-last line sets an argument to its own value, I expected p to be identical in both print statements. However, in the second output "x" is padded with 4 newlines instead of 2, both before and after it. I'm wondering if this is the expected behavior or a bug.

The workaround is quite simple: calling t.get_arg("a").value.strip() before passing the value to set_arg gets rid of the extra lines. Still, I'd like to know whether this can be fixed in the library. Thank you!

One concern is that a fix might break existing programs that rely on wikitextparser doubling the newlines.

lihaohong6 commented 2 years ago

Also, I noticed Persian content in 5j9's other repositories, leading to the assumption that they currently reside in Iran, which explains 5j9's inactivity. If that is the case, hopefully 5j9 can stay safe and will be able to access GitHub soon.

5j9 commented 2 years ago

Template.set_arg has a preserve_spacing argument with a default value of True. It is most useful when editing a multi-line template/infobox because the user won't have to worry about alignment of parameters. Example:

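import wikitextparser

p = wikitextparser.parse("{{A\n| a  =  a\n| b  =  b \n}}")
print(p)
t = p.templates[0]
t.set_arg("a", 'x')  # default preserve_spacing=True keeps the alignment
print(p)
t.set_arg("a", 'x', preserve_spacing=False)  # replaces the value verbatim
print(p)

Output:
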
{{A
| a  =  a
| b  =  b 
}}
{{A
| a  =  x
| b  =  b 
}}
{{A
| a  =x| b  =  b 
}}

However, I do see how this feature led to an unexpected result here. Perhaps it would be better to set the default value of preserve_spacing to False. To avoid a breaking change without deprecation, I'm considering deprecating this parameter and renaming it to preserve_whitespace or something else. In the meantime, you can pass preserve_spacing=False to set_arg.
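To illustrate why the newlines double, here is a simplified pure-Python model of spacing preservation. It is only a sketch of the behavior described in this thread, not wikitextparser's actual implementation; `set_value` is a hypothetical helper, and its handling of whitespace-only values is an assumption.

```python
def set_value(old: str, new: str, preserve_spacing: bool = True) -> str:
    """Hypothetical model of replacing a template argument's value."""
    if not preserve_spacing:
        # The new value replaces the old one verbatim.
        return new
    if not old.strip():
        # Assumption: nothing to align against, so use the new value as-is.
        return new
    # Keep the old value's leading and trailing whitespace around the new value.
    lead = old[: len(old) - len(old.lstrip())]
    trail = old[len(old.rstrip()):]
    return lead + new + trail


old = "\n\nx\n\n"

# Passing the unstripped value back adds its newlines to the preserved ones.
doubled = set_value(old, old)
# Stripping first (the workaround) round-trips the value.
stripped = set_value(old, old.strip())
# Disabling spacing preservation also round-trips the value.
verbatim = set_value(old, old, preserve_spacing=False)

print(repr(doubled))   # '\n\n\n\nx\n\n\n\n'
print(stripped == old)  # True
print(verbatim == old)  # True
```

In this model, preserving spacing is what keeps `| a  =  x` aligned in the infobox example, and it is also what pads "x" with 4 newlines when the new value already carries its own whitespace.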

(Sorry for the delay. Yup, internet access has been pretty much intermittent in the past few weeks... I'll try to take a look at the other issues soon.)

lihaohong6 commented 2 years ago

Thank you! I was not aware that preserve_spacing exists. Please stay safe and I hope everything gets better!