Closed TrueBrain closed 1 year ago
Merging #78 into master will not change coverage. The diff coverage is
100.00%
.
@@ Coverage Diff @@
## master #78 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 31 31
Lines 4260 4263 +3
=========================================
+ Hits 4260 4263 +3
Impacted Files | Coverage Δ | |
---|---|---|
tests/test_parser_function.py | 100.00% <100.00%> (ø) |
|
wikitextparser/_spans.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update a595b29...f2ccb33. Read the comment docs.
Well done tracking this one down.
This is what they call "Reverse pipe trick"1 and it is only valid in input wikitext. How Mediawiki will expands the [[|alt]]
syntax depends on the title of the page. For example if the title of the page is alt (something)
, then [[|alt]]
will be replaced with [[alt (something)|alt]]
. Since wikitextparser
currently does not know about the context, it cannot tell how exactly that syntax should be treated. However it still might be a good idea to detect it as a wikilink and let the user decide what to do with it.
A minor issue: now [[ |Alt]]
(with is a space between [[
and |
) will be detected as a wikilink, but mediawiki does not do that. Actually the [[{{{empty}}}|Alt]]
example that you gave above is of the same kind. Apparently mediawiki does the reverse pipe trick as a preprocessing step, before expanding other objects like templates or even comments. Therefore even [[<!-- -->|alt]]
is not detected as a wikilink. Maybe this issue is not that important... I'll need to think more about it and its use cases.
One thing I learnt in the last week or so, that I wish I never took the time to look into how meediawiki parses wikitext, and how much the specs are more "what-ever our code does" is :D
I found this page which contains a rant that says exactly how I feel: https://docs.rs/parse_wiki_text/0.1.5/parse_wiki_text/
And this information you just give, adds perfectly in that picture :) I am not even surprised :P
Honestly, the main problem for me is not so much that it is not a valid wikilink, but the fact the |
is seen as part of the parent it is in. In our wiki this is part of nearly every page:
{{#if:{{{en|}}}|<div (..) > (..) <br/>[[{{{en}}}|EN]]</div>|}}
(the (..)
are by me; it is a lengthy line). If I split it out a bit more for readability:
{{
#if:
{{{en|}}}
|
<div (..) > (..) <br/>[[{{{en}}}|EN]]</div>
|
}}
The problem is, that without acknowledging [[ ]]
as a wikilink, when en
is empty, |EN]]</div>
is seen as the else
part of the if
:
{{
#if:
|
<div (..) > (..) <br/>[[
|
EN]]</div>
|
}}
Which is something mediawiki does not do. It does seem to skip the pipe in the [[ ]]
even so it is not a valid wikilink (if that makes any sense?). If you know of another way to solve this problem, I would also be very happy. This was an means to an end, basically :) On the other hand, I am also perfectly fine if you close this PR unmerged, as it really only happens if you process wikitext, and feed that back to the parser again.
After some more consideration, and with the info given with you, I think we are better off closing this Pull Request. I now have added a simple piece of code after I process parameters:
if "[[|" in wikitext.string:
wikitext.string = re.sub(r'\[\[\|([^\]]*)\]\]', r'%5B%5B%7C\1%5D%5D', wikitext.string)
Which resolves my problem. It is a bit dirty, but it works out just fine :)
I leave it up to you if you want this Pull Request or not :)
I ended up merging this pull request in a02b9e96e . I decided that false positives are tolerable and can easily be ignored, but the given #if
example is much worse and can cause the parser function to be interpreted in a wrong manner.
Thank you so much for creating this pull request!
"[[|alt]]" is a valid wikilink, but WIKILINK_PARAMFINDITER did not consider it valid. In result, it was seens as a wikilink, but not marked as for, as example, a parser function to handle it. This meant a parser function saw "alt ]]" as another argument, where it really was not.
This was a very interesting one to track down.
VALID_TITLE_CHARS
is also used byPF_TL_FINDITER
, which now also has its behaviour changed. As I am not fluent in reading regex (is anyone really?), I am not sure what the side-effect of that is.Alternative, the
++
and*+
marker could be moved outside ofVALID_TITLE_CHARS
, keepingPF_TL_FINDITER
the same in behaviour. Let me know if you prefer this.Ironically, mediawiki has a bug here, where the given test-case works fine, but if the wikilink is
[[{{{empty}}}|Alt]]
it does fail.wikitextparser
, with this PR, does the right thing here too now.