edrlab / thorium-reader

A cross platform desktop reading app, based on the Readium Desktop toolkit
https://www.edrlab.org/software/thorium-reader/
BSD 3-Clause "New" or "Revised" License

Incorrect rendering of fixed-layout documents with certain RTL languages #1987

Open initramfs opened 1 year ago

initramfs commented 1 year ago

For fixed-layout documents with an RTL page progression order, if the XHTML pages of a document carry an xml:lang attribute (on the html tag) whose value causes the langStringIsRTL() function to return true, the viewport renders only a sliver of the right side of the page. The sliver decreases in width as you zoom out.

This issue can be encountered naturally ("in the wild") with EPUB3 fixed-layout comics that are in Traditional Chinese (or have otherwise been tagged as such) and that readily pass epubcheck against EPUB 3.3.

Tested under Thorium version 2.3.0 on Windows, Arch Linux, and Ubuntu (fresh installs in a VM).

Demonstration with a minimally reproducing epub (for sake of example, each page's xml:lang property was set to zh-TW):

Cover page (or any page with rendition:page-spread-center):

Thorium-cover-zh-TW

A landscape spread of 2 pages:

Thorium-spread-zh-TW

Comparison with correct rendering (either with no set xml:lang or xml:lang set to something that doesn't count as RTL):

Cover page (or any page with rendition:page-spread-center):

Thorium-cover-en-US

A landscape spread of 2 pages:

Thorium-spread-en-US


The relevant EPUB files are linked here: Sample-EPUBs.zip

danielweck commented 1 year ago

What a strange CSS / DOM bug, easily reproducible with your sample en-US EPUB too, by adding the HTML attribute dir="rtl" on the html element, or by adding direction: rtl; in the stylesheet via the web inspector / debugger (or simply by adding a style attribute to the root HTML element).

initramfs commented 3 months ago

> What a strange CSS / DOM bug, easily reproducible with your sample en-US EPUB too, by adding the HTML attribute dir="rtl" on the html element, or by adding direction: rtl; in the stylesheet via the web inspector / debugger (or simply by adding a style attribute to the root HTML element).

This problem still persists today with Electron 31 and version 3.0.0 of the reader. You mention the source of the issue being a "strange CSS / DOM bug"; is there any additional insight into which particular component is at fault here? I don't really do development in the languages used for this project, but is there any additional debugging I can do to help pin down the source of the problem?

danielweck commented 3 months ago

Hello, Thorium 3.0.0 ships with Electron 30, not 31. I wonder if this "bug" is reproducible outside of Thorium, just with Google Chrome (or any other Chromium-based web browser) and an XHTML page that contains the minimal CSS styles that trigger the unexpected behaviour.

initramfs commented 3 months ago

Sorry for the slow response. Initially I only tested on my working machine running Arch Linux (on which Thorium is linked/launched against Electron 31; I'm not well versed in whether Electron has some sort of backwards compatibility mode). I have since tested in a fresh Windows 11 VM with Thorium 3.0.0, with the same result.

I figured out how to inspect elements by building a dev version; see my comment below.

Old Comment Contents

~~As for attempting to load the page with the relevant CSS stylesheet, I'm not quite sure how one would do that (I'm not familiar with the code base). I did find one CSS file at `src/resources/lib/pdfjs/web/viewer.css` so I did attempt to link that stylesheet into the cover page to test (I simply added `` to the `` element, didn't change anything else).~~

The page appears to load normally in both Google Chrome 126.0.6478.127 in the Win 11 VM:

![fxl-chrome-win-screenshot](https://github.com/user-attachments/assets/bc5e273e-fdd4-4e4d-85c4-946a36f154c5)

And on Arch Linux under Chromium 126.0.6478.126:

![fxl-chromium-linux-screenshot](https://github.com/user-attachments/assets/86da485a-1653-409e-88b5-9740a0536594)

I also validated that the stylesheet was being loaded in (without the stylesheet the image is not rescaled and appears large in the viewport with scrollbars, as below; the screenshot is cropped for size):

![fxl-cover-no-css-screenshot](https://github.com/user-attachments/assets/8adad1df-de7e-4198-b01e-5a907845afe3)

initramfs commented 3 months ago

I just realized that the RTL script directionality also affects text rendering, with interesting results:

Screenshot_20240717_230234

It gets worse with scripts rendered under writing-mode: vertical-rl (e.g. Traditional Chinese), but I assume the effects of having direction: rtl applied to the document were already clear before I noticed.


This is probably a separate issue, but may I ask what the source of the RTL scripts listed here is? I can say that Han Chinese is not RTL. Having lang === "zh-Hant" and lang === "zh-TW" listed under langStringIsRTL() seems incorrect regardless of the rendering problems described in this issue.

For a little more context: those characters may be rendered vertically (top to bottom, with columns running right to left), which then requires an RTL page progression order, or, for most scripts using those characters, horizontally (left to right, with rows running top to bottom). Traditional books tend to follow the vertical layout, whereas webpages and other free-form media prefer horizontal LTR rendering, much like English.

Regardless of the vertical/horizontal orientation chosen, the script itself is still classified as left-to-right (given that under a horizontal context, it is rendered LTR).
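To illustrate the distinction, here is a minimal sketch of a direction check keyed on the language tag. It is not Thorium's actual langStringIsRTL(); the function name `isRtlLanguageTag`, the two lookup sets, and the simplified tag parsing are all assumptions made for this example, and the lists are deliberately incomplete.

```typescript
// Hypothetical helper, NOT Thorium's langStringIsRTL().
// Illustrative, incomplete lists of languages and scripts written right-to-left.
const RTL_PRIMARY_SUBTAGS = new Set(["ar", "fa", "he", "ur", "yi", "ps", "sd", "ug", "dv"]);
const RTL_SCRIPT_SUBTAGS = new Set(["arab", "hebr", "thaa", "syrc", "nkoo"]);

function isRtlLanguageTag(tag: string): boolean {
  const parts = tag.trim().toLowerCase().split(/[-_]/);
  const primary = parts[0];
  // Simplified parsing (assumption): treat the first four-letter subtag
  // after the primary language as the script subtag.
  const script = parts.slice(1).find((p) => /^[a-z]{4}$/.test(p));
  if (script !== undefined) {
    // An explicit script decides the direction, e.g. "az-Arab" vs "ar-Latn".
    return RTL_SCRIPT_SUBTAGS.has(script);
  }
  return RTL_PRIMARY_SUBTAGS.has(primary);
}
```

Under this sketch, "zh-TW" and "zh-Hant" are both classified as LTR, while vertical layout would be left to writing-mode and the page progression order rather than to the text direction.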

initramfs commented 3 months ago

I finally figured out how to debug the program itself with element inspection. I noticed that the viewport scaling seems to be implemented by adding an inline style to the top-level <html> element: transform-origin: 0px 0px; transform: scale(x); where x is the scale selected by the relevant zoom level. The transform origin seems to be set to 0px 0px (i.e. the top left corner) irrespective of the html dir attribute.

Manually adjusting the transform-origin to transform-origin: 100% 0% or transform-origin: right top when the html dir attribute is rtl seems to fix the image cutoff issue. Though I'm not fully sure if this is the correct thing to do in this case. Either way, the source of the cropping seems to be more clear now (incorrect transform origin).
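The manual adjustment described above could be expressed roughly as follows. This is only a sketch of the idea, not Thorium's code: the function name `fxlScaleStyle` and the `Direction` type are hypothetical, and whether switching the origin is the correct fix has not been verified.

```typescript
// Hypothetical sketch: choose the transform origin from the document direction.
type Direction = "ltr" | "rtl";

function fxlScaleStyle(scale: number, dir: Direction): string {
  // Under dir="rtl", anchor the scaled page at the top right corner
  // instead of the top left corner (0px 0px).
  const origin = dir === "rtl" ? "right top" : "0px 0px";
  return `transform-origin: ${origin}; transform: scale(${scale});`;
}

// Example: the inline style strings for a 50% zoom level.
console.log(fxlScaleStyle(0.5, "ltr"));
console.log(fxlScaleStyle(0.5, "rtl"));
```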


Note: while this fixes the RTL image cutoff issue, Chinese text is still rendered incorrectly because it is marked as RTL.