earwig / mwparserfromhell

A Python parser for MediaWiki wikicode
https://mwparserfromhell.readthedocs.io/
MIT License
757 stars, 75 forks

Option to remove comments from template values #231

Open RheingoldRiver opened 4 years ago

RheingoldRiver commented 4 years ago

I'm not sure of the best way to implement this, but in the past I've run into issues where a template value has a comment in it, and I want to compare just the value, ignoring the comment. My solution has been to do the replacement myself to ignore comments, but it would be really nice if the library handled this.

Maybe an option relating to comments on the initial parse? It could strip out all comments, attempt to ignore them but leave them in place, or not treat comments specially. The first would be useful when there are documenting comments in template preloads that can be deleted once data has been added. The second is probably what I'd default to: it attempts to preserve comments the way it attempts to preserve whitespace. The fallback of not treating them specially could be used when a setup is complicated enough that the middle option can't work properly.

Would something like this be possible? Thanks!
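
For illustration, the manual workaround described above (removing comments yourself before comparing values) might look like the following. This is only a sketch using the standard library: the names `COMMENT_RE`, `strip_comments`, and `values_equal` are hypothetical, and the regular expression is a naive approximation that does not account for unterminated comments or comments inside `<nowiki>` blocks.

```python
import re

# Naive pattern for HTML-style comments; DOTALL lets a comment span lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(value: str) -> str:
    """Return value with comments removed and outer whitespace trimmed."""
    return COMMENT_RE.sub("", value).strip()


def values_equal(a: str, b: str) -> bool:
    """Compare two template values while ignoring comments and padding."""
    return strip_comments(a) == strip_comments(b)
```

With this helper, `values_equal("Foo<!-- note -->", " Foo ")` compares only the visible text of each value.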

lahwaacz commented 4 years ago

There is a more general issue with the abstract tree traversal and replacement (with an empty string in your case) - see #195.

earwig commented 4 years ago

I don't generally like adding options to the initial parse (there is only one right now, and it's there as a bug workaround), so if the existing behavior is insufficient, my preference would be for an easier way to express this in the wikicode object. Two ideas come to mind:

RheingoldRiver commented 4 years ago

Hmm, yeah I think having matches ignore comments and work on param names/values as well would be sufficient to fix every issue I had - would that work?

earwig commented 4 years ago

Hold on, did you mean Wikicode.matches or am I getting mixed up? That should already ignore comments.

RheingoldRiver commented 4 years ago

Oh, I didn't realize matches already ignores comments. In that case, when I want to ignore comments I'll use the matches method. Can you add a note to the docstring that it ignores comments?

earwig commented 4 years ago

Thanks, I'll update it. (I hoped it was made clear by "Specifically, whitespace and markup is stripped" but I can understand how that is ambiguous.)