CXuesong / MwParserFromScratch

A basic .NET Library for parsing wikitext into AST.
Apache License 2.0

Parsing Variables and Parser Functions #5

Closed CXuesong closed 7 years ago

CXuesong commented 7 years ago

See: https://www.mediawiki.org/wiki/Help:Magic_words.

In this issue we will discuss and address the parsing of Variables and Parser Functions.

For now, Behavior switches are treated as plain text.

CXuesong commented 7 years ago

Variables and Parser Functions share the same basic traits as Templates, except that they use a colon to mark the beginning of the first argument, for example:

{{#if:expr | truecase | falsecase}}

Given this similarity, we might treat Variables and Parser Functions as a special case of the Template node. To achieve this, a new property (e.g. IsMagicWord) should be introduced to distinguish the node type.

We might treat Variables and Parser Functions as the same kind of node, because we are only interested in whether the first argument starts with a colon rather than a pipe.
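
To make the idea concrete, here is a minimal sketch of such a node, written in Python for brevity. The `Template` class, its fields, and `to_wikitext` are all hypothetical names chosen for illustration; they are not the library's actual API. The only point is that one flag decides whether the first argument is introduced by a colon or a pipe.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Template:
    """Hypothetical node type for illustration; not the MwParserFromScratch class."""
    name: str
    arguments: List[str] = field(default_factory=list)
    is_magic_word: bool = False  # True for Variables and Parser Functions

    def to_wikitext(self) -> str:
        if not self.arguments:
            return "{{" + self.name + "}}"
        # A magic word introduces its first argument with a colon,
        # an ordinary template with a pipe. Later arguments always use pipes.
        first_separator = ":" if self.is_magic_word else "|"
        return "{{" + self.name + first_separator + "|".join(self.arguments) + "}}"
```

Under these assumptions, a node named `#if` with `is_magic_word=True` serializes with a colon before its first argument, while the same node with the flag unset serializes like any other template.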

CXuesong commented 7 years ago

For WikitextParser, we will parse the Magic Words optimistically.

If the template name of a "template" formation starts with #, the parser will try to parse it as a Parser Function, because something like {{ #blah blah blah}} just cannot be a template transclusion.

If the template name matches any of the well-known variable names, the parser will try to parse the formation as a Variable.

Note that in the following case, the formation should be treated either as an ordinary template or as plain text:

{{#if|xyz}}

And for the following case… Uhmm… Relax, because there's nothing we can do…

{{ {{{something|#}}}if : expr | truecase}}

It will be parsed as an ordinary Template node, even though it actually is not one.

Btw, it seems that Parser Functions are case-insensitive, while Variables are not.
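
The optimistic rules above can be sketched roughly as follows, in Python for brevity. This is only an illustration under stated assumptions: `classify` and `KNOWN_VARIABLES` are hypothetical names, the variable list is a tiny sample, and the sketch assumes a Parser Function needs a colon after its name. It works on the raw text of a single formation rather than on a real token stream.

```python
import re

# Illustrative subset only; the real list of MediaWiki variables is much longer.
KNOWN_VARIABLES = {"NUMBEROFUSERS", "PAGENAME", "CURRENTYEAR"}


def classify(formation: str) -> str:
    """Return 'parser function', 'variable', or 'template' for a '{{...}}' formation."""
    inner = formation.strip()[2:-2]
    # The name runs up to the first colon or pipe, whichever comes first.
    match = re.match(r"([^:|]*)([:|]?)", inner)
    name, separator = match.group(1).strip(), match.group(2)
    # Assumption: a leading '#' only counts when the name is followed by a colon.
    if name.startswith("#") and separator == ":":
        return "parser function"
    # Variable names are compared case-sensitively.
    if name in KNOWN_VARIABLES and separator != "|":
        return "variable"
    return "template"
```

With this sketch, `{{#if:expr | truecase | falsecase}}` is classified as a parser function, `{{#if|xyz}}` falls back to a template, and the `{{ {{{something|#}}}if : expr | truecase}}` case also ends up as a template, because the name cannot be resolved without expanding the inner argument.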

CXuesong commented 7 years ago

An interesting fact: https://www.mediawiki.org/wiki/Manual:Parser_functions#Named_parameters

Parser functions do not support named parameters the way templates and tag extensions do, but it is occasionally useful to fake it. Users are often accustomed to using vertical bars ( | ) to separate arguments, so it's nice to be able to do that in the parser function context, too.

Actually, the source code shows that "variables" such as {{NUMBEROFUSERS}} are simply case-sensitive parser functions without the hash (#) prefix.