lasconic closed this issue 3 years ago.
Should we handle it instead? It seems to be just pictures.
What do you mean? Convert the pictures to GIF and embed them like we do for math?
I have not looked at the PHP file that handles the template, but I guess it is "only" a bunch of files referenced by a key (here "R11"). If so, we could handle it and use inline GIFs as we do for math and chem, yes.
It seems more like several GIFs for "Ptah". I do not know if it is worth handling the template. Let me know your thoughts :)
It's a bit more complicated than just one GIF indeed. The extension outputs an HTML table and can stack symbols on top of each other. To know if it's worth the pain, I checked how many times hiero is used in the wikicode we currently render. In French, it is used in 13 words (out of 1,555,588):
'Sekhmet'
'Apophis'
'Aton'
'Néfertiti'
'Pharaon'
'Ptah'
'Ramsès'
'djed'
'gomme'
'khépesh'
'oasis'
'ouchebti'
'uraeus'
In English, it is used in 63 words (out of 677,008):
'barge'
'barque'
'basalt'
'Hathor'
'Hatshepsut'
'Hatti'
'Moab'
'Ab'
'Set'
'Shemu'
'Neith'
'Nephthys'
'Akhenaten'
'Akhet'
'Sobek'
'Anubis'
'Anuket'
'Sphinx'
'Onuphrius'
'Sutekh'
'Aswan'
'Imhotep'
'Thoth'
'Peret'
'Isis'
'Djahy'
'Jerusalem'
'Tutankhamon'
'Tutankhaten'
'Tybi'
'Unas'
'adobe'
'Wadjet'
'ba'
'Wenis'
'Punt'
'alphabet'
'Ra'
'ammonia'
'Re'
'Retjenu'
'ankh'
'ebony'
'Maat'
'emerald'
'ibis'
'life, prosperity, health'
'lightland'
'lily'
'heqat'
'hieroglyph'
'hin'
'natron'
'oasis'
'plewd'
'sphinx'
'senet'
'tjaty'
'serekh'
'stibium'
'uraeus'
'ushabti'
'trona'
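The thread does not show how these counts were produced. A minimal sketch of such a scan, assuming the rendered pages are available as a hypothetical `{word: wikicode}` dict and that the markup uses the extension's `<hiero>...</hiero>` tag (the sample wikicode is made up):

```python
import re

# Captures the content of each <hiero>...</hiero> tag (non-greedy, across lines).
HIERO_RE = re.compile(r"<hiero>(.*?)</hiero>", re.DOTALL | re.IGNORECASE)


def words_with_hiero(pages: dict[str, str]) -> dict[str, list[str]]:
    """Return {word: [hiero codes]} for every page containing at least one tag."""
    found = {}
    for word, wikicode in pages.items():
        codes = [c.strip() for c in HIERO_RE.findall(wikicode)]
        if codes:
            found[word] = codes
    return found


# Made-up sample input standing in for the real rendered wikicode.
pages = {
    "Ptah": "{{cf}} <hiero>Q3:X1-V28-C19</hiero> ...",
    "chat": "no hieroglyphs here",
}
found = words_with_hiero(pages)
print(f"{len(found)} of {len(pages):,} pages use hiero")  # 1 of 2 pages use hiero
```

Keeping the extracted codes (not just a count) also gives the list of codes to look at, like the one posted below.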
Could be worth it, especially if most of them are sequential and "simple"...
For French, here are the codes:
S42-G17*X1-I12
O29 Q3:Q3 I14
i-t:n-N5
pr:aA
Q3:X1-V28-C19
ra:Z1-ms-s-sw
R11
N29-W19-M17-M17*X1-N33:Z2
Aa1:Q3-N37:F23-F51
Aa2-X1:N25
w-S-b-t:y-A53
I12
Some are simple, like R11, but most of them contain * or :, which is less simple and would require a table or some CSS...
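A rough sketch of what handling `*` and `:` could look like, assuming simplified Manuel de Codage rules: `-` (or whitespace) separates blocks, `:` stacks sub-groups vertically and `*` puts glyphs side by side (binding tighter than `:`). Parentheses, cartouches and the other operators that the real HieroTokenizer.php handles are ignored, and the nested-table markup is only an illustration, not the extension's actual output. `glyphs` is a hypothetical `{code: <img> tag}` map, such as the one built by the conversion script in the next comment.

```python
import re


def parse_hiero(code: str) -> list[list[list[str]]]:
    """Split hiero code into blocks -> stacked rows -> side-by-side glyphs.

    Simplified: '-' and whitespace separate blocks, ':' stacks rows
    vertically, '*' juxtaposes glyphs horizontally (binds tighter than ':').
    """
    blocks = []
    for block in re.split(r"[-\s]+", code.strip()):
        if block:
            blocks.append([row.split("*") for row in block.split(":")])
    return blocks


def render_block(rows: list[list[str]], glyphs: dict[str, str]) -> str:
    # One <tr> per stacked row; the glyphs of a row share a single cell.
    # Unknown codes fall back to their text so missing images stay visible.
    trs = "".join(
        "<tr><td>" + "".join(glyphs.get(g, g) for g in row) + "</td></tr>"
        for row in rows
    )
    return f"<table>{trs}</table>"


def render(code: str, glyphs: dict[str, str]) -> str:
    # Blocks are laid out left to right as the cells of a one-row outer table.
    cells = "".join(f"<td>{render_block(b, glyphs)}</td>" for b in parse_hiero(code))
    return f"<table><tr>{cells}</tr></table>"


print(parse_hiero("S42-G17*X1-I12"))
# [[['S42']], [['G17', 'X1']], [['I12']]]
```

Nesting one table per block keeps each stack independent, which is also why the cell widths and vertical alignment end up depending on how the reader's renderer handles nested tables.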
Here is a script that converts the PNGs to GIF and stores them base64-encoded in a map. The resulting file is 655 KB.
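```python
import os
from base64 import b64encode
from io import BytesIO

from PIL import Image

# Build a {glyph code: inline <img> tag} map from the extension's PNG files,
# expected in the current directory and named like "hiero_R11.png".
results = {}
for f in os.listdir("."):
    if f.endswith(".png"):
        # "hiero_R11.png" -> "R11"
        code = f.split("_", 1)[1].split(".")[0]
        im = BytesIO()
        with Image.open(f) as png:
            # Grayscale ("L") + optimize keeps each GIF payload small.
            png.convert("L").save(im, format="gif", optimize=True)
        raw = im.getvalue()
        results[code] = f'<img src="data:image/gif;base64,{b64encode(raw).decode()}"/>'

# Emit the map as Python source, followed by the number of entries.
print("hiero = {")
for t, r in sorted(results.items()):
    print(f' "{t}": \'{r}\',')
print(f"}} # {len(results):,}")
```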
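One way to sanity-check the generated map is to decode an entry back and look at the GIF signature. A small sketch, assuming entries have exactly the `<img src="data:image/gif;base64,..."/>` shape produced above; the 1x1 GIF is a stand-in for a real entry such as `hiero["R11"]`:

```python
import re
from base64 import b64decode

# Captures the base64 payload of an inline GIF <img> tag.
DATA_URI_RE = re.compile(r'src="data:image/gif;base64,([A-Za-z0-9+/=]+)"')


def decode_entry(img_tag: str) -> bytes:
    """Return the raw GIF bytes embedded in one map entry."""
    match = DATA_URI_RE.search(img_tag)
    if match is None:
        raise ValueError("not an inline GIF <img> tag")
    return b64decode(match.group(1))


def is_gif(raw: bytes) -> bool:
    # Every GIF file starts with one of these two 6-byte signatures.
    return raw[:6] in (b"GIF87a", b"GIF89a")


# 1x1 GIF used as a stand-in for a real entry of the generated map.
entry = '<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"/>'
print(is_gif(decode_entry(entry)))  # True
```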
In short, we would probably need to reproduce the whole PHP script to get decent support.
In particular, the tokenizer (https://github.com/wikimedia/mediawiki-extensions-wikihiero/blob/366b1226891e609650b4c7f7d925b718c779517c/includes/HieroTokenizer.php) and the render function (https://github.com/wikimedia/mediawiki-extensions-wikihiero/blob/366b1226891e609650b4c7f7d925b718c779517c/includes/WikiHiero.php#L259).
Also, some hiero code uses phonemes rather than the codes used in the PNG filenames, so we would also need a copy of the phoneme-to-code mapping: https://github.com/wikimedia/mediawiki-extensions-wikihiero/blob/366b1226891e609650b4c7f7d925b718c779517c/includes/WikiHiero.php#L259
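Resolving phonemes could be a pre-pass that rewrites each token to its Gardiner code before the image lookup. A minimal sketch; the `PHONEMES` dict is a tiny hand-picked subset of standard uniliteral signs for illustration only, and the full table would have to be copied from the extension:

```python
import re

# Illustrative subset only: phoneme -> Gardiner code (not the extension's table).
PHONEMES = {
    "i": "M17",
    "w": "G43",
    "b": "D58",
    "n": "N35",
    "s": "S29",
    "S": "N37",
    "t": "X1",
}


def to_gardiner(code: str) -> str:
    # The capturing group keeps the separators (-, :, *, whitespace) in the
    # split result, so the layout operators survive the substitution.
    parts = re.split(r"([-:*\s]+)", code)
    return "".join(PHONEMES.get(p, p) for p in parts)


print(to_gardiner("i-t:n-N5"))  # M17-X1:N35-N5
```

Tokens that are already codes (like `N5`) or unknown to the table (like `y`) pass through unchanged.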
It will also be hard to unit test the output, since it is only img tags with base64 data and a bunch of HTML...
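One possible workaround: strip the base64 payloads before comparing, so tests only check the HTML structure. A sketch, assuming the `<img src="data:image/gif;base64,..."/>` shape produced by the conversion script above:

```python
import re

# An inline GIF <img> tag as produced by the conversion script.
B64_IMG_RE = re.compile(r'<img src="data:image/gif;base64,[A-Za-z0-9+/=]+"\s*/>')


def normalize(html: str) -> str:
    """Replace every inline GIF with a short placeholder for stable comparisons."""
    return B64_IMG_RE.sub("<img/>", html)


html = '<table><tr><td><img src="data:image/gif;base64,R0lGODlhAQABAAAAACw="/></td></tr></table>'
print(normalize(html))  # <table><tr><td><img/></td></tr></table>
```

Another option is to inject a fake glyph map with text placeholders during tests, which avoids the regex entirely.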
A bit too much for a Sunday :)
Clearly too much, yes :)
Thanks for the analysis and pre-work ;)
Nice one!
What do you think about your patch? Is it worth giving it a try on my side?
It's kind of linked to the HTML table issue (https://github.com/BoboTiG/ebook-reader-dict/issues/1024), since table support is needed. So I would tackle HTML tables first, to see how well they work on Kobo, before tackling this one.
Attached is a dictionary containing the French words with hiero from https://github.com/BoboTiG/ebook-reader-dict/issues/703#issuecomment-778651324
Nice and clean!
I think the cell width should match the width of the picture it contains. See https://fr.wiktionary.org/wiki/Rams%C3%A8s for example:
But we can live with it as-is :+1:
https://fr.wiktionary.org/wiki/Sekhmet is not displayed very well either.
Yes, I feel like I'm pushing the limits of the HTML renderer on the Kobo... Here is Sekhmet in Chrome (rendered bigger so that it is the right size on the Kobo). Somehow the styling in the Kobo browser is not the same. (Do we know which renderer it is? Probably WebKit, but which version?) Maybe it's not the browser but a default CSS applied to tables... Any idea whether we can see this CSS somewhere?
And Ramsès:
I traced it up to https://github.com/kobolabs/qt-everywhere-opensource-src-4.6.2/blob/master/src/3rdparty/webkit/VERSION to find the WebKit version, but the hash (69dd29fbeb12d076741dce70ac6bc155101ccd6f) is not helpful: I could not find it. Given the [changelog](), it is an old one, from 2009-11-30. That mirror only has history up to 2012.
I am not completely sure about this information, but I got the 4.6.2 version of Qt Embedded from the latest Kobo firmware (https://kbdownload1-a.akamaihd.net/firmwares/kobo7/Feb2021/kobo-update-4.26.16704.zip), so it should be right.
OK, so if they use WebKit for dictionary rendering, it's the one included in Qt 4.6.2.
I investigated the styling... I believe I found the problem for Ramsès, but not yet for Sekhmet.
New French dictionary: dicthtml-fr.zip
Regarding the default CSS, here is what I found, though I cannot say whether it is used in the dictionary area:
```css
* { padding: 0; margin: 0; }
body { font: %1px %2; }
table, thead, tbody, tr, td, th { font-size: inherit; font-family: inherit; }
```
(still looking for more data)
Interesting page for testing: https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:WikiHiero/Exemples
The new version is way better :muscle: The rendering is great!
https://fr.wiktionary.org/wiki/Aton needs more space in column 2. Maybe it is a vertical alignment issue, like for Sekhmet. The same goes for https://fr.wiktionary.org/wiki/Ptah and https://fr.wiktionary.org/wiki/gomme.
Model link, if any: https://www.mediawiki.org/wiki/Extension:WikiHiero https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:WikiHiero/Syntax https://github.com/wikimedia/mediawiki-extensions-wikihiero/blob/366b1226891e609650b4c7f7d925b718c779517c/includes/WikiHiero.php