gnosygnu / xowa

xowa offline wiki application

Check |ws link in chapter= value #504

Closed desb42 closed 5 years ago

desb42 commented 5 years ago

Looking at en.wikipedia.org/wiki/Christmas#Further_reading

The last entry in Further_reading looks like this (screenshot: eb). The wikitext associated with this is:

{{cite EB1911|wstitle=Christmas |volume=6 |pages=293–294|short=1}}

Taking the text [https%3A%2F%2Fen.wikisource.org%2Fwiki%2F1911_Encyclop%EF%EFdia_Britannica%2FChristmas "Christmas" ] from the entry and converting the %3A and %2F to : and / respectively, it seems like it would produce the correct display.
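A minimal Java sketch of the conversion suggested above, for illustration only. The class and method names are hypothetical and are not part of xowa; the sketch un-escapes only `%3A` and `%2F` and deliberately leaves every other escape (including the suspicious `%EF%EF`) untouched.

```java
class LinkUnescapeSketch {
    // Hypothetical helper: replace only the two escapes named in the report.
    static String unescapeColonAndSlash(String encoded) {
        return encoded.replace("%3A", ":").replace("%2F", "/");
    }

    public static void main(String[] args) {
        String encoded =
            "https%3A%2F%2Fen.wikisource.org%2Fwiki%2F1911_Encyclop%EF%EFdia_Britannica%2FChristmas";
        // prints https://en.wikisource.org/wiki/1911_Encyclop%EF%EFdia_Britannica/Christmas
        System.out.println(unescapeColonAndSlash(encoded));
    }
}
```

The result is a usable link shape, but the `%EF%EF` in the middle of the title survives unchanged.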

gnosygnu commented 5 years ago

Thanks for the break-down. This looks like a relatively simple change wherein I have to url-decode the link inside the [url caption]. Let me take a look at it this weekend.

gnosygnu commented 5 years ago

Not as simple as expected, but still a simple change. In short, I updated the mw.uri.lua file, which was changed a while back to handle uri.decode for wiki links.

Tested with the wikitext sample below:


{{cite encyclopedia
|HIDE_PARAMETER1=
|HIDE_PARAMETER2=
|HIDE_PARAMETER3=
|HIDE_PARAMETER4=
|HIDE_PARAMETER5=
|HIDE_PARAMETER6=
|HIDE_PARAMETER7=
|HIDE_PARAMETER8=
|HIDE_PARAMETER9=
|HIDE_PARAMETER10=
|HIDE_PARAMETER11=
|HIDE_PARAMETER12=
|HIDE_PARAMETER13=
|HIDE_PARAMETER14a=
|HIDE_PARAMETER14ab=
|HIDE_PARAMETER14b=
|HIDE_PARAMETER14bb=
|HIDE_PARAMETER14c=
|HIDE_PARAMETER14cb=
|display-authors=
|HIDE_PARAMETER15chapter=
|editor-first=
|editor-last=
|encyclopedia=[[Encyclopædia Britannica Eleventh Edition|Encyclopædia Britannica]]
|title=[[Wikisource:1911 Encyclopædia Britannica/Christmas|Christmas]]
|url=
|accessdate=
|edition=11th
|year=1911
|publisher=
|volume=6
|page=
|pages=293–294
|quote=
|ref=
|postscript=
|separator=
|mode=
|HIDE_PARAMETER20=
}}

desb42 commented 5 years ago

Great piece of work, almost. Taking a look at the source of the link (I have added extra line breaks):

<li>
<img id="xoimg_41" alt="Wikisource" src="/fsys/file/commons.wikimedia.org/thumb/4/c/7/1/Wikisource-logo.svg/12px.png" width="12" height="13" />&#160;
<cite class="citation encyclopaedia">
<span class="cs1-ws-icon" title="Wikisource:1911 Encyclopædia Britannica/Christmas">
<a href="/en.wikisource.org/wiki/1911_Encyclop%EF%EFdia_Britannica/Christmas">"Christmas"&#160;</a>
</span>. <i><a href="/en.wikipedia.org/wiki/Encyclop%C3%A6dia_Britannica_Eleventh_Edition" id="xolnki_987" title="Encyclopædia Britannica Eleventh Edition">Encyclopædia Britannica</a></i>. 
<b>6</b> (11th ed.). 1911. pp.&#160;293–294.</cite>
<span title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.genre=bookitem&amp;rft.atitle=Christmas&amp;rft.btitle=Encyclop%EF%EFdia+Britannica&amp;rft.pages=293-294&amp;rft.edition=11th&amp;rft.date=1911&amp;rfr_id=info%3Asid%2Fen.wikipedia.org%3AChristmas" class="Z3988"></span></li>

Note that there are a number of occurrences of Encyclop%EF%EFdia. I think this should be Encyclop%C3%A6dia.
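As a quick cross-check of the expected value, the standard `java.net.URLEncoder` (the `Charset` overload needs Java 10 or later) can be asked to encode the title fragment as UTF-8. This is only a sanity check, not the code path xowa uses; the class name is hypothetical. `URLEncoder` does form encoding (for example, a space becomes `+`), which does not matter for this input.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class AeEncodingCheck {
    // Hypothetical helper: UTF-8 percent-encoding via the standard library.
    static String encodeUtf8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e6 is the ligature ae; the escape avoids source-file encoding issues.
        System.out.println(encodeUtf8("Encyclop\u00e6dia")); // prints Encyclop%C3%A6dia
    }
}
```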

gnosygnu commented 5 years ago

Ugh. This looks like it will be very hard -- perhaps impossible.

The problem is that Lua/C allows malformed Strings whereas Java forbids it. I don't know how to get Java to make a malformed string...

More detail below.


First, let's demonstrate the problem:

This means that the c in the wikiencode callback is actually being called twice:

The problem is that both the 1st and 2nd strings are invalid, since they hold the two separate bytes of a single UTF-8 encoded character: æ. As far as I know, it is impossible to generate a Java String that holds one byte of 195 (or 166).[1] Instead, 195 gets converted to something like [-17, -65, -67].

Presumably, this works for MediaWiki because it's still using Lua C, and C doesn't have string rules that are as strict.

[1] (The furthest I got was this post: https://stackoverflow.com/a/12168695)
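The conversion described above can be reproduced with a small standalone Java sketch (the class and method names are hypothetical, not xowa code). It round-trips raw bytes through `java.lang.String` as UTF-8: the complete two-byte sequence for æ survives, while a lone byte 195 is replaced by U+FFFD, whose UTF-8 encoding is the three bytes shown in the comment above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LoneByteRoundTrip {
    // Decode raw bytes as UTF-8 into a String, then encode them back.
    // Malformed input is replaced with U+FFFD by the String constructor.
    static byte[] roundTrip(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] whole = {(byte) 195, (byte) 166}; // both bytes of the ligature ae
        byte[] lone = {(byte) 195};              // first byte only: malformed
        System.out.println(Arrays.toString(roundTrip(whole))); // [-61, -90]
        System.out.println(Arrays.toString(roundTrip(lone)));  // [-17, -65, -67]
    }
}
```

Note that -17 is 0xEF as a signed byte, which is at least consistent with the `%EF` values seen in the generated link.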

gnosygnu commented 5 years ago

This wasn't as bad as I feared, but the fix is still imperfect for the reasons described above. Luckily, Luaj's LuaString uses byte[] as its backing store and not String. As such, it only involved a one-line code-change, though it took much longer to come up with a unit-test.
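To illustrate why a `byte[]` backing store helps, here is a plain-Java sketch that percent-encodes directly from bytes without ever building a `String` from a partial sequence. It is an assumption-laden stand-in: it does not use Luaj's `LuaString` API and is not the actual one-line change from the commit; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ByteWisePercentEncode {
    // Hypothetical helper: copy ASCII bytes through, percent-encode the rest.
    // Working on byte[] means a lone 195 stays 195 instead of becoming U+FFFD.
    static String percentEncodeHighBytes(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            int v = b & 0xFF; // treat the signed byte as unsigned
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] title = "Encyclop\u00e6dia".getBytes(StandardCharsets.UTF_8);
        System.out.println(percentEncodeHighBytes(title));                    // Encyclop%C3%A6dia
        System.out.println(percentEncodeHighBytes(new byte[]{(byte) 195})); // %C3
    }
}
```

Even when the callback receives the two bytes of æ one at a time, each call yields the correct escape (`%C3`, then `%A6`), because no intermediate decoding step is involved.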

Fixed with the commit above. Tested again with en.wikipedia.org/wiki/Christmas#Further_reading and hovered over the Wikisource Christmas entry.

Thanks again for keeping me honest!

gnosygnu commented 5 years ago

Marking item closed. Again, feel free to reopen. Thanks!