edrlab / thorium-reader

A cross platform desktop reading app, based on the Readium Desktop toolkit
https://www.edrlab.org/software/thorium-reader/
BSD 3-Clause "New" or "Revised" License

i, em, and cite tagged text with CJK language value displays as normal text #2591

Closed Hopkins1 closed 1 week ago

Hopkins1 commented 1 month ago

Thorium Version: 3.0.0
Operating System: Windows 11

Text tagged with a Chinese, Japanese, or Korean language value is not displayed in italics, even inside i, em, and cite elements.

For example with Japanese:

<p>The man drank a cup of <i lang="ja-Latn">sake</i></p>

The TTS audio pronounces "sake" as a two-syllable word ("saw kay"), as expected, but the displayed text is not italic.

The same thing happens using a "span" tag:

<p>He likes to drink <span lang="ja-Latn"><em>Dassai</em> sake</span></p>

The "Latn" script subtag was used above, but the same problem occurs with just the bare language value (e.g. "ja", "zh", "ko").

Hopkins1 commented 1 week ago

Ok, I see what is happening here. The Readium CSS (https://github.com/readium/readium-css) file "ReadiumCSS-before.css" defaults "cite", "dfn", "em" and "i" tags to "font-style:normal;" for Chinese, Japanese, and Korean text.

This makes sense for hanzi, kanji, kana, and hangul scripts. However, the Readium CSS rule ignores the script portion of the language tag, so it also applies to Latin and Cyrillic scripts, where it probably should not.

For now, the crude work-around is to use a class selector that restores the italic style:

em.italic {
  font-style: italic;
}

<p>He likes to drink <span lang="ja-Latn"><em class="italic">Dassai</em> sake</span></p>
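A variant of this work-around that avoids adding a class to every element is to key the override on the language tag itself. This is only a sketch: it assumes the author stylesheet is applied after the Readium CSS default and with at least equal specificity, which has not been verified here.

```css
/* Hypothetical author-side override: restore italics for romanized
   Japanese. :lang(ja-Latn) also matches longer tags such as ja-Latn-JP,
   and it takes the language inherited from an ancestor into account. */
cite:lang(ja-Latn),
dfn:lang(ja-Latn),
em:lang(ja-Latn),
i:lang(ja-Latn) {
  font-style: italic;
}
```

Under those assumptions, the markup should be able to stay as <span lang="ja-Latn"><em>Dassai</em> sake</span>, with no class attribute.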

I'll write an issue against Readium CSS. This Thorium issue can be closed.

danielweck commented 1 week ago

Hello @Hopkins1 thank you very much for reporting this problem and for investigating at your end!

@jaypanoz is the CSS codemaster who will be able to comment insightfully :)

See Readium (v1) code which Thorium uses:

https://github.com/readium/readium-css/blob/v.1.1.0/css/src/modules/ReadiumCSS-html5patch.css#L179-L186

JayPanoz commented 1 week ago

I should be able to exclude the Latin script in the snippet @danielweck referenced using :not(), but are there other scripts that should be excluded as well?

Hopkins1 commented 1 week ago

Looking at Wikipedia for CJK languages, it appears that this would be mostly limited to:

  1. Latin script (e.g. zh-Latn, ja-Latn, ko-Latn)
  2. Cyrillic script (e.g. zh-Cyrl, ja-Cyrl, ko-Cyrl)
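
The :not() exclusion discussed in this thread could look roughly like the following. This is a sketch, not the actual Readium CSS patch: it assumes the existing rule resets font-style for cite, dfn, em and i in CJK content, and only the Japanese selectors are shown. The zh and ko selectors would follow the same pattern.

```css
/* Sketch: keep font-style: normal for Japanese in its native scripts,
   but skip elements whose language tag names the Latin or Cyrillic script.
   The pseudo-classes sit on the element itself, because :lang() takes the
   inherited language into account. Testing an ancestor instead would still
   match when <html lang="ja"> wraps a <span lang="ja-Latn">. */
cite:lang(ja):not(:lang(ja-Latn)):not(:lang(ja-Cyrl)),
dfn:lang(ja):not(:lang(ja-Latn)):not(:lang(ja-Cyrl)),
em:lang(ja):not(:lang(ja-Latn)):not(:lang(ja-Cyrl)),
i:lang(ja):not(:lang(ja-Latn)):not(:lang(ja-Cyrl)) {
  font-style: normal;
}
```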

Thanks

JayPanoz commented 1 week ago

Thanks!

Since this is a bug fix I will implement it in version 1 of ReadiumCSS so that everyone can benefit from it, then merge it into version 2 on develop.

@danielweck I can ping you in the PR so that you can see the code snippet if you wish. Just let me know if you want to implement it in Thorium before updating to ReadiumCSS 2. 🙏