BoboTiG closed this issue 1 year ago
Here is the screenshot, I do not know yet why this renders badly.
It would be very cool to have a command to :
Ah yes! Like --gen-dict WORD [WORD...]. +100
See #666
Can you reproduce with this dictionary? dicthtml-fr.zip
--gen-dict is a killer feature, what an idea! :)
Same result with that dict.
The HTML looks good. I believe it's the italic. Either FreeSerif doesn't have italic Arabic glyphs, or the WebKit version on the Kobo doesn't handle italic correctly for these scripts... Not sure italic is really used in Arabic or Persian.
Good catch.
Yes, there is no Arabic in FreeSerif Italic: https://fonts2u.com/free-serif-italic.font
I'm not sure we can do much more. I couldn't find a Unicode font with italic Arabic.
If it were only "tasse", I would have changed the formatting on the Wiktionary. But I doubt it is the only one.
Can we find them somehow? Look for ''+Arabic letter in the wikicode dump? Or, in render.py, look for <i>+Arabic before writing the definitions?
<i>[^<]*([\u0627-\u064a]+)[^<]*</i>
This can probably be improved to capture the Arabic word and move it out of <i></i>.
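One way the "move it out of <i></i>" idea could look is sketched below. This is only an illustration, not code from the project: `unitalicize_arabic` and the constant names are hypothetical, the character range is the one from the regex above, and the sketch assumes italic spans contain no nested tags. It closes the italic before each Arabic run and reopens it afterwards.

```python
import re

# Same Arabic letter range as the regex above (U+0627 to U+064A).
ARABIC_RUN = r"[\u0627-\u064a]+"

# An italic span without nested tags that contains at least one Arabic letter.
ITALIC_WITH_ARABIC = re.compile(rf"<i>([^<]*{ARABIC_RUN}[^<]*)</i>")
ARABIC_SPLIT = re.compile(rf"({ARABIC_RUN})")


def unitalicize_arabic(html: str) -> str:
    """Hypothetical helper: keep Arabic runs outside of <i>...</i>."""

    def fix(match: re.Match) -> str:
        # Splitting on a capturing group puts the Arabic runs at odd indexes.
        parts = ARABIC_SPLIT.split(match.group(1))
        out = []
        for index, part in enumerate(parts):
            if not part:
                continue
            if index % 2 or not part.strip():
                # Arabic run or bare whitespace: emit without italic.
                out.append(part)
            else:
                out.append(f"<i>{part}</i>")
        return "".join(out)

    return ITALIC_WITH_ARABIC.sub(fix, html)
```

Whether splitting the italic like this is acceptable typographically is a separate question; the sketch only shows that the rewrite is mechanical once the spans are found.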
...
I could detect a few: https://gist.github.com/lasconic/56762057597b1eaa8c0465ab89c4dc22
I added the following in parse_word in render.py and ran --render:
    # Needs "import re" at the top of render.py.
    # word, definitions and etymology come from the enclosing parse_word scope.
    regex = r"<i>[^<]*([\u0627-\u064a]+)[^<]*</i>"

    def check_arabic(definition: str):
        # Report any italic span that contains Arabic letters.
        matches = re.findall(regex, definition)
        if matches:
            print("####ERROR arabic in italic in definition :" + word, flush=True)
            print(definition, flush=True)
            print(matches, flush=True)

    # Definitions can be nested up to two levels deep (tuples of tuples).
    for definition in definitions:
        if isinstance(definition, tuple):
            for subdef in definition:
                if isinstance(subdef, tuple):
                    for subsubdef in subdef:
                        check_arabic(subsubdef)
                else:
                    check_arabic(subdef)
        else:
            check_arabic(definition)

    if etymology:
        matches = re.findall(regex, etymology)
        if matches:
            print("####ERROR arabic in italic in etymology :" + word, flush=True)
            print(etymology, flush=True)
            print(matches, flush=True)
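For what it's worth, the nested loops could also be written as a small recursive walk. The following is a standalone sketch under assumptions, not the project's code: `iter_definitions` and `find_italic_arabic` are hypothetical names, and the sketch assumes definitions are strings or arbitrarily nested tuples of strings, as the isinstance checks above suggest. It returns the offending strings instead of printing them.

```python
import re
from typing import Iterable, Iterator, List

# Same detection idea as above; no capture group since only a yes/no is needed.
ARABIC_IN_ITALIC = re.compile(r"<i>[^<]*[\u0627-\u064a][^<]*</i>")


def iter_definitions(definitions: Iterable) -> Iterator[str]:
    """Yield every string from a possibly nested structure of tuples."""
    for item in definitions:
        if isinstance(item, tuple):
            yield from iter_definitions(item)
        else:
            yield item


def find_italic_arabic(definitions: Iterable, etymology: str = "") -> List[str]:
    """Return the definitions (and etymology) with Arabic inside <i>...</i>."""
    hits = [d for d in iter_definitions(definitions) if ARABIC_IN_ITALIC.search(d)]
    if etymology and ARABIC_IN_ITALIC.search(etymology):
        hits.append(etymology)
    return hits
```

Returning a list rather than printing makes the check easier to reuse, for example to collect all affected words during a render run.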
Wikicode:
Here, Arabic characters are printed correctly when they are handled by the étyl template. But when those characters are part of the "normal" text, their rendering is broken. I mean:
I will post a screenshot in the coming days to demonstrate the issue.