desb42 closed this 5 years ago
Thanks for the breakdown. This looks like a relatively simple change wherein I have to URL-decode the link inside the [url caption]. Let me take a look at it this weekend.
Not as simple as expected, but still a simple change. In short, I updated the mw.uri.lua file, which had been changed a while back to handle uri.decode for wiki links.
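A minimal Java sketch of the decoding step, assuming a well-formed "[url caption]" where the URL ends at the first space. The class ExternalLinkDecoder and its decode method are hypothetical illustrations, not the actual mw.uri.lua change; URLDecoder.decode(String, Charset) needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not XOWA code): split a bracketed external link
// "[url caption]" and percent-decode the URL part as UTF-8.
final class ExternalLinkDecoder {

    /** Returns {decodedUrl, caption}; caption is "" when the link has none. */
    static String[] decode(String bracketed) {
        // Assumes the input starts with '[' and ends with ']'.
        String inner = bracketed.substring(1, bracketed.length() - 1);
        // The URL ends at the first space; everything after it is the caption.
        int space = inner.indexOf(' ');
        String url = space < 0 ? inner : inner.substring(0, space);
        String caption = space < 0 ? "" : inner.substring(space + 1);
        // Caveat: URLDecoder follows form-encoding rules, so '+' becomes ' '.
        String decoded = URLDecoder.decode(url, StandardCharsets.UTF_8);
        return new String[] {decoded, caption};
    }

    public static void main(String[] args) {
        String link = "[https%3A%2F%2Fen.wikisource.org%2Fwiki%2F1911_Encyclop%C3%A6dia_Britannica%2FChristmas \"Christmas\"]";
        String[] parts = decode(link);
        System.out.println(parts[0]); // https://en.wikisource.org/wiki/1911_Encyclopædia_Britannica/Christmas
        System.out.println(parts[1]); // "Christmas"
    }
}
```

Splitting before decoding matters: an encoded %20 inside the URL must not be mistaken for the separator between URL and caption.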
Tested with the wikitext sample below:
{{cite encyclopedia
|HIDE_PARAMETER1=
|HIDE_PARAMETER2=
|HIDE_PARAMETER3=
|HIDE_PARAMETER4=
|HIDE_PARAMETER5=
|HIDE_PARAMETER6=
|HIDE_PARAMETER7=
|HIDE_PARAMETER8=
|HIDE_PARAMETER9=
|HIDE_PARAMETER10=
|HIDE_PARAMETER11=
|HIDE_PARAMETER12=
|HIDE_PARAMETER13=
|HIDE_PARAMETER14a=
|HIDE_PARAMETER14ab=
|HIDE_PARAMETER14b=
|HIDE_PARAMETER14bb=
|HIDE_PARAMETER14c=
|HIDE_PARAMETER14cb=
|display-authors=
|HIDE_PARAMETER15chapter=
|editor-first=
|editor-last=
|encyclopedia=[[Encyclopædia Britannica Eleventh Edition|Encyclopædia Britannica]]
|title=[[Wikisource:1911 Encyclopædia Britannica/Christmas|Christmas]]
|url=
|accessdate=
|edition=11th
|year=1911
|publisher=
|volume=6
|page=
|pages=293–294
|quote=
|ref=
|postscript=
|separator=
|mode=
|HIDE_PARAMETER20=
}}
Great piece of work, almost. Taking a look at the source of the link (I have added extra line breaks):
<li>
<img id="xoimg_41" alt="Wikisource" src="/fsys/file/commons.wikimedia.org/thumb/4/c/7/1/Wikisource-logo.svg/12px.png" width="12" height="13" /> 
<cite class="citation encyclopaedia">
<span class="cs1-ws-icon" title="Wikisource:1911 Encyclopædia Britannica/Christmas">
<a href="/en.wikisource.org/wiki/1911_Encyclop%EF%EFdia_Britannica/Christmas">"Christmas" </a>
</span>. <i><a href="/en.wikipedia.org/wiki/Encyclop%C3%A6dia_Britannica_Eleventh_Edition" id="xolnki_987" title="Encyclopædia Britannica Eleventh Edition">Encyclopædia Britannica</a></i>.
<b>6</b> (11th ed.). 1911. pp. 293–294.</cite>
<span title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.atitle=Christmas&rft.btitle=Encyclop%EF%EFdia+Britannica&rft.pages=293-294&rft.edition=11th&rft.date=1911&rfr_id=info%3Asid%2Fen.wikipedia.org%3AChristmas" class="Z3988"></span></li>
Note that there are several occurrences of Encyclop%EF%EFdia. I think this should be Encyclop%C3%A6dia, the UTF-8 percent-encoding of æ.
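The expected value follows from UTF-8: æ (U+00E6) encodes as the two bytes 0xC3 0xA6, and percent-encoding each byte gives %C3%A6. A simplified Java sketch; the class name is hypothetical, and unlike a real URI encoder it escapes everything except ASCII letters and digits:

```java
import java.nio.charset.StandardCharsets;

// Simplified sketch: percent-encode every UTF-8 byte that is not an
// ASCII letter or digit. Real URI encoders keep more characters unescaped.
final class Utf8PercentEncode {
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF; // Java bytes are signed; mask to 0..255
            boolean plain = (u >= 'A' && u <= 'Z') || (u >= 'a' && u <= 'z') || (u >= '0' && u <= '9');
            if (plain) {
                out.append((char) u);
            } else {
                out.append(String.format("%%%02X", u)); // e.g. 195 -> "%C3"
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Encyclop\u00e6dia")); // Encyclop%C3%A6dia
    }
}
```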
Ugh. This looks like it will be very hard, perhaps impossible.
The problem is that Lua/C allows malformed strings, whereas Java forbids them. I don't know how to get Java to make a malformed string...
More detail below.
First, let's demonstrate the problem:
=p.wikiencode("æ")
1:195;1:166;
This means that the wikiencode callback is actually being called twice, with c holding a single byte each time (195, then 166).
The problem is that both the first and the second strings are invalid, since each holds only one part of the UTF-8 encoding of a single character: æ. As far as I know, it is impossible to generate a Java string consisting of just the single byte 195 (or 166).[1] Instead, 195 gets converted to something like [-17, -65, -67], which are the UTF-8 bytes of the replacement character U+FFFD.
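The Java side of the problem can be reproduced in a few lines. This sketch (hypothetical class name) round-trips each single byte of æ through java.lang.String. The String(byte[], Charset) constructor replaces malformed input with U+FFFD, which is where [-17, -65, -67] comes from:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// A lone UTF-8 lead byte (195) or continuation byte (166) cannot survive
// byte[] -> String -> byte[]: the decoder substitutes U+FFFD (EF BF BD).
final class LoneByteRoundTrip {
    static byte[] roundTrip(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] first = {(byte) 195};  // 1st byte of "æ" (0xC3)
        byte[] second = {(byte) 166}; // 2nd byte of "æ" (0xA6)
        System.out.println(Arrays.toString(roundTrip(first)));  // [-17, -65, -67]
        System.out.println(Arrays.toString(roundTrip(second))); // [-17, -65, -67]
        // Concatenating the two decoded halves does not restore "æ" either.
        String glued = new String(first, StandardCharsets.UTF_8)
                + new String(second, StandardCharsets.UTF_8);
        System.out.println(glued.equals("\u00e6")); // false
    }
}
```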
Presumably this works for MediaWiki because it still uses the C implementation of Lua, and C doesn't have such strict string rules.
[1] (The furthest I got was this post: https://stackoverflow.com/a/12168695)
This wasn't as bad as I feared, but the fix is still imperfect for the reasons described above. Luckily, Luaj's LuaString uses a byte[] as its backing store, not a String. As such, it only took a one-line code change, though coming up with a unit test took much longer.
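To illustrate why a byte[]-backed string sidesteps this, the sketch below percent-encodes æ one byte per call, once directly on the bytes and once after a detour through java.lang.String. It is a hypothetical illustration of the idea, not Luaj's or XOWA's code, and the String path yields %EF%BF%BD rather than the exact %EF%EF seen in the HTML above:

```java
import java.nio.charset.StandardCharsets;

// Contrast: encoding raw byte chunks vs. first converting each chunk to a String.
final class ByteChunkEncoding {
    // Percent-encode raw bytes directly: every byte is preserved.
    static String encodeBytes(byte[] chunk) {
        StringBuilder out = new StringBuilder();
        for (byte b : chunk) {
            out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    // Same, but via java.lang.String: a malformed chunk becomes U+FFFD
    // before it is re-encoded, so the original byte is lost.
    static String encodeViaString(byte[] chunk) {
        String s = new String(chunk, StandardCharsets.UTF_8);
        return encodeBytes(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[][] chunks = {{(byte) 0xC3}, {(byte) 0xA6}}; // "æ", one byte per callback call
        StringBuilder viaBytes = new StringBuilder();
        StringBuilder viaString = new StringBuilder();
        for (byte[] c : chunks) {
            viaBytes.append(encodeBytes(c));
            viaString.append(encodeViaString(c));
        }
        System.out.println(viaBytes);  // %C3%A6
        System.out.println(viaString); // %EF%BF%BD%EF%BF%BD
    }
}
```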
Fixed with the commit above. Tested again with en.wikipedia.org/wiki/Christmas#Further_reading and hovered over the Wikisource Christmas entry.
Thanks again for keeping me honest!
Marking item closed. Again, feel free to reopen. Thanks!
Looking at en.wikipedia.org/wiki/Christmas#Further_reading, the last entry in Further_reading looks like this: [screenshot not preserved]. The wikitext associated with this is:
Taking the text from the entry:
[https%3A%2F%2Fen.wikisource.org%2Fwiki%2F1911_Encyclop%EF%EFdia_Britannica%2FChristmas "Christmas" ]
and converting %3A and %2F to : and / respectively, it seems like it would produce the correct display.
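The conversion described above can be sketched as a targeted replacement that decodes only %3A and %2F and leaves every other escape, including %EF%EF, untouched. The helper name is hypothetical, and this is not a general URL decoder:

```java
// Hypothetical helper: decode only %3A (':') and %2F ('/') in a link target.
final class SelectiveDecode {
    static String decodeColonAndSlash(String url) {
        // (?i) makes the hex digits case-insensitive, so %3a and %2f match too.
        return url.replaceAll("(?i)%3A", ":").replaceAll("(?i)%2F", "/");
    }

    public static void main(String[] args) {
        String raw = "https%3A%2F%2Fen.wikisource.org%2Fwiki%2F1911_Encyclop%EF%EFdia_Britannica%2FChristmas";
        System.out.println(decodeColonAndSlash(raw));
        // https://en.wikisource.org/wiki/1911_Encyclop%EF%EFdia_Britannica/Christmas
    }
}
```

Note that the %EF%EF escapes survive unchanged, so this only restores the URL's structure; the æ itself still depends on the byte-level fix.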