earwig / mwparserfromhell

A Python parser for MediaWiki wikicode
https://mwparserfromhell.readthedocs.io/
MIT License

template.name.matches for Scribunto modules #287

Status: Open. BryghtShadow opened this issue 2 years ago.

BryghtShadow commented 2 years ago

Would it be possible to add support for matching Scribunto modules that are invoked on a page with {{#invoke:}}?

import mwparserfromhell

code = mwparserfromhell.parse('{{Module:Foo}}')
template = code.filter_templates()[0]
print(template.name)  # 'Module:Foo'
name = str(template.name)
assert template.name.get(0) == name  # ok
assert template.name.matches(name)  # ok

code = mwparserfromhell.parse('{{#invoke:Foo|bar}}')
module = code.filter_templates()[0]
print(module.name)  # '#invoke:Foo'
name = str(module.name)
assert module.name.get(0) == name  # ok
assert module.name.matches(name)  # fail :(

My current workarounds are either to copy.deepcopy the name and replace the #invoke: prefix in the copy (option 1), or to rename the name in place and undo the rename afterwards (option 2):

from copy import deepcopy
import mwparserfromhell

wikitext = '{{#invoke:Foo|bar}}'

# option 1 ("safe")
code = mwparserfromhell.parse(wikitext)
module = code.filter_templates()[0]
if module.name.startswith('#invoke:'):
    name = deepcopy(module.name)
    name.replace('#invoke:', 'Module:')  # rename the copy
    has_match = name.matches('Module:Foo')
    print(has_match)  # True
    assert str(module.name) == '#invoke:Foo'  # ok
    assert module.name.get(0) == '#invoke:Foo'  # ok, name is ['#invoke:Foo']

# option 2 ("destructive")
code = mwparserfromhell.parse(wikitext)
module = code.filter_templates()[0]
if module.name.startswith('#invoke:'):
    print(module.name.get(0))  # '#invoke:Foo'
    module.name.set(0, 'Module:Foo')  # temp rename (safe)
    has_match = module.name.matches('Module:Foo')
    module.name.replace('Module:', '#invoke:')  # undo (destructive, see assertion below)
    print(has_match)  # True
    assert str(module.name) == '#invoke:Foo'  # ok
    assert module.name.get(0) == '#invoke:Foo'  # fail, name is ['#', 'invoke:Foo']
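Both workarounds can be sidestepped by deriving the module's page title as a plain string, so the parsed name is never copied or mutated. This is a minimal sketch under stated assumptions: `module_title_from_name` and `invokes_module` are hypothetical helpers, the invocation is assumed to be spelled exactly `#invoke:`, and first-letter capitalization of page titles (the MediaWiki default) is assumed. As a side note on option 2, it presumably undoes the rename with `replace()` because `Wikicode.set()` raises `ValueError` when the new value parses into more than one node, which `'#invoke:Foo'` appears to do.

```python
from typing import Optional

INVOKE_PREFIX = "#invoke:"  # assumption: exact spelling used in the wikitext


def module_title_from_name(name: str) -> Optional[str]:
    """Hypothetical helper: map '#invoke:Foo' to 'Module:Foo'.

    Returns None when the name is not a module invocation.
    """
    name = name.strip()
    if not name.startswith(INVOKE_PREFIX):
        return None
    module = name[len(INVOKE_PREFIX):].strip()
    if not module:
        return None
    # Assumes first-letter capitalization of page titles.
    return "Module:" + module[0].upper() + module[1:]


def _normalize(title: str) -> str:
    # Loose normalization: first letter's case and '_' vs ' ' are ignored.
    title = title.strip()
    return (title[:1].upper() + title[1:]).replace("_", " ")


def invokes_module(name: str, target: str) -> bool:
    """True if template name `name` invokes the module titled `target`."""
    title = module_title_from_name(name)
    return title is not None and _normalize(title) == _normalize(target)


# With mwparserfromhell, pass str(module.name) as `name`.
print(invokes_module("#invoke:Foo", "Module:Foo"))          # True
print(invokes_module("#invoke:foo_bar", "Module:Foo bar"))  # True
print(invokes_module("Module:Foo", "Module:Foo"))           # False: a transclusion
```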
earwig commented 2 years ago

Thanks for the bug report! (At the very least we should add a distinct node type for module invocations, in the same vein as parser functions; cf. #10.)

To clarify: is the expectation that both of these return True, or only the second one?

code = mwparserfromhell.parse("{{#invoke:Foo}}")
t = code.filter_templates()[0]
t.name.matches("#invoke:Foo")
t.name.matches("Module:Foo")
BryghtShadow commented 2 years ago

The distinct node type would allow differentiation between {{Module:Foo}} and {{#invoke:Foo}}, I presume?

I'd expect only the second one to return True. That is consistent with valid namespaces (#invoke: is not a namespace, whereas Module: is) and with orthodox module usage (transcluding a module as {{Module:Foo}} is unorthodox, and the distinct node type should hopefully cover such use cases).
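The semantics described above could be modelled by a dedicated node for module invocations, kept separate from `Template`. The sketch below is purely hypothetical: `ModuleInvocation`, its fields, its `matches()`, and the toy `parse_invoke` are not part of mwparserfromhell and are not an agreed design from this thread. They only illustrate the expected behaviour: `matches("Module:Foo")` is true, `matches("#invoke:Foo")` is false, and `{{Module:Foo}}` is not treated as an invocation. The toy parser assumes a single, unnested, lowercase `{{#invoke:...}}` call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def _normalize(title: str) -> str:
    # Loose title comparison: ignore surrounding whitespace,
    # the first letter's case, and '_' versus ' '.
    title = title.strip()
    return (title[:1].upper() + title[1:]).replace("_", " ")


@dataclass
class ModuleInvocation:
    """Hypothetical node for {{#invoke:Module|function|args...}}."""

    module: str                     # e.g. "Foo" (no namespace prefix)
    function: Optional[str] = None  # e.g. "bar"; None if omitted
    args: List[str] = field(default_factory=list)

    @property
    def title(self) -> str:
        # The Lua code lives on a page in the Module: namespace
        # (first-letter capitalization assumed).
        name = self.module.strip()
        return "Module:" + name[:1].upper() + name[1:]

    def matches(self, other: str) -> bool:
        # Only real page titles match; "#invoke:" is not a namespace.
        return _normalize(self.title) == _normalize(other)


def parse_invoke(wikitext: str) -> Optional[ModuleInvocation]:
    """Toy parser for a single {{#invoke:...}} call without nesting."""
    prefix = "{{#invoke:"
    text = wikitext.strip()
    if not (text.startswith(prefix) and text.endswith("}}")):
        return None
    module, *rest = text[len(prefix):-2].split("|")
    function = rest[0] if rest else None
    return ModuleInvocation(module, function, rest[1:])


node = parse_invoke("{{#invoke:Foo}}")
print(node.matches("Module:Foo"))      # True
print(node.matches("#invoke:Foo"))     # False
print(parse_invoke("{{Module:Foo}}"))  # None: a transclusion, not an invocation
```

Keeping the invocation as its own node type means a plain `Template` named `Module:Foo` and an invocation of the same module stay distinguishable, while both can still be compared against the same `Module:` title.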