edrlab / thorium-reader

A cross platform desktop reading app, based on the Readium Desktop toolkit
https://www.edrlab.org/software/thorium-reader/
BSD 3-Clause "New" or "Revised" License

Get page number from an ePub #1662

Closed spudthebud closed 2 years ago

spudthebud commented 2 years ago

I noticed some odd behaviour when getting page numbers from an ePub that has pagination.

Context

Thorium 1.8 with JAWS. I am sighted and JAWS-certified.

Fictitious user story to illustrate the situation:

I am a faculty member at a research university and am preparing a grant application. I am quoting an ePub that has a print-book equivalent with page numbers, and I need the page number I am on so that I can integrate a direct quote into the application. I am currently on the last word that I will be quoting. From here, I press Control + Shift + P, which activates Thorium's navigation toggle button and puts the keyboard focus into an edit field. I hear the current page number. I press ESC to leave the navigation area. But the JAWS Virtual PC cursor is no longer in the text where I was when I pressed the keys; the focus is now on the Navigation toggle button instead. It takes a lot of time to get back to the spot where I was reading.

Test procedure

Here's what I suggest as a test:

  1. If a person is using JAWS with Thorium to read an ePub that has pagination built in
  2. And they want to get the page number of a passage of text
  3. So they move the JAWS virtual cursor to the last word of the passage
  4. Then they press a shortcut key to get the current page number
  5. When step 4 is done, Thorium should ensure the JAWS virtual cursor is still at the word the user was at in step 3.

If you need help with making the task of getting the current page number more usable, I can chip in as best I can.

danielweck commented 2 years ago

Hello, thank you for your detailed description.

The key problem with screen reader/web browser interactions (remember, Thorium's graphical user interface is all HTML, CSS, JavaScript, etc.) is that the screen reader's internal state, crucially the reading cursor, is not reflected in any Web API that an application like Thorium could leverage to track reading progress, create bookmarks, insert annotation highlights, etc. We therefore have to rely on basic keyboard tab navigation (i.e. actual focus events) as well as pointing device events (e.g. mouse clicks) in order to infer the user's reading location. These techniques apply to both sighted and AT users, but they are suboptimal because we are missing detailed, granular interaction information. The problem is particularly noticeable with screen readers, because their text selection and character-level cursor are not in sync with the corresponding Web API (i.e. HTML DOM Ranges). Instead, modern screen readers operate in a world of "virtual screen buffers" (I use these terms loosely, as I realise they have a precise meaning in some screen reader implementations), which is highly optimised for the task at hand but unfortunately leaves JavaScript/HTML programmers in the dark (pun intended). The modern DOM accessibility tree exposed by webviews solved many authoring problems, but mostly to the benefit of the end user (screen reader / AT user). As reading system implementers, we still have to do a lot of guesswork to figure out what the screen reader is doing and where it is reading. Our tests show more visual inconsistencies of the reading cursor in paginated content (CSS columns) than in typical scrolling web pages, so generally speaking we recommend the latter presentation style to screen reader users. We also recommend emulating mouse clicks on the HTML text that users wish to mark as their "current reading location".
This is necessary so that Thorium can report the correct DOM element within the page spread or scrolling viewport, which in turn allows us to determine the current page number (or, to be precise, the nearest preceding element authored as an EPUB page break). This also allows us to build the path of headings that leads to the current location, which can then be correlated with the authored EPUB table of contents. To conclude, I note your point about losing the reference in the HTML publication document when moving in and out of some GUI control. In Thorium we solved this problem by injecting a custom hyperlink labeled "underscore" at the very beginning of rendered DOMs. This link is reachable with Tab directly from the main landmark in the application GUI, and it points to the current reading location Thorium is aware of (note: not the screen reader's own cursor, which we know little about, but the document position last accessed by the user via TOC or bookmark navigation, mouse click on the document, text selection, etc.).
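The "nearest preceding page break" rule described above can be sketched as follows. This is a minimal illustration, not Thorium's implementation: it assumes the rendered document has already been flattened into a hypothetical `DocNode` list in document order, where authored page breaks (for example elements with `epub:type="pagebreak"` or `role="doc-pagebreak"`) carry their label, and that the current reading location is known as an index into that list.

```typescript
// Hypothetical flattened view of a rendered XHTML document, in document order.
interface DocNode {
  id: string;
  // Present only when the node is an authored page break.
  pageBreakLabel?: string;
}

// Returns the label of the nearest page break at or before `locationIndex`,
// or undefined if the reading location precedes every page break.
function currentPageLabel(nodes: DocNode[], locationIndex: number): string | undefined {
  const start = Math.min(locationIndex, nodes.length - 1);
  for (let i = start; i >= 0; i--) {
    const label = nodes[i].pageBreakLabel;
    if (label !== undefined) {
      return label;
    }
  }
  return undefined;
}

const nodes: DocNode[] = [
  { id: "p1" },
  { id: "pb15", pageBreakLabel: "15" },
  { id: "p2" },
  { id: "pb16", pageBreakLabel: "16" },
  { id: "p3" },
];

console.log(currentPageLabel(nodes, 2)); // prints 15
console.log(currentPageLabel(nodes, 4)); // prints 16
```

Note that text before the first page break has no page number at all, which is why the function can return `undefined` rather than guessing.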

sinabahram commented 2 years ago

Hi all, I have a few suggestions that may help. I hope this is of assistance.

  1. Would it be possible to update the status bar with the current page number, if you have the ability to do that through some API call? I ask because screen readers have keystrokes for reading the status bar, and that won't cause focus to be lost from the virtual document.
  2. If the above is impossible, could you please add some functionality to announce the current page number when a keystroke is pressed? Understood that this may be off by ±1 page or so, due to the decoupling of the virtual cursor from whatever portion of a page is on screen at any given time, but I've solved that problem in 3 below. The way you would do this is simply to populate a live region with the page number (this is less than a handful of lines of code, and given the technical sophistication of your response, I'm guessing you already know how to do this, but please ask if not).
  3. Some messaging to let SR users know to press Enter on the word they are on before reading the page number would go a long way. Note that pressing Enter in the virtual cursor should issue a click event at that spot, even if the underlying element is not interactive, which means you will then have a source of truth for virtual cursor focus versus implied focus. That means the page number reported by either solution 1 or 2 above is fully accurate. If that doesn't work, there is a more advanced technique SR users can use, called routing a cursor to the location and then simulating a click event, but you don't have to mess with all that if Enter does the job :). I don't have strong opinions on that messaging, e.g. in documentation, etc.
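Sina's second suggestion, announcing the page number through an ARIA live region, could look roughly like the sketch below. It is written against a minimal, hypothetical `LiveRegionLike` interface instead of the real DOM so that it runs outside a browser; in a webview, a `div` element appended to the application shell would satisfy it. The function names are illustrative only and are not Thorium's code.

```typescript
// The subset of an HTMLElement that this sketch needs.
interface LiveRegionLike {
  textContent: string | null;
  setAttribute(name: string, value: string): void;
}

// Configure an element as an assertive live region: screen readers speak
// changes to its text without moving focus out of the document.
function initLiveRegion(region: LiveRegionLike): void {
  region.setAttribute("aria-live", "assertive");
  region.setAttribute("aria-atomic", "true");
}

// Hypothetical handler for a "speak current page" keyboard shortcut.
function announcePageNumber(region: LiveRegionLike, pageLabel: string | undefined): string {
  const message = pageLabel !== undefined ? `Page ${pageLabel}` : "No page number available";
  region.textContent = message;
  return message;
}

// Stand-in element so the sketch can run without a browser.
class FakeRegion implements LiveRegionLike {
  textContent: string | null = null;
  attributes = new Map<string, string>();
  setAttribute(name: string, value: string): void {
    this.attributes.set(name, value);
  }
}

const liveRegion = new FakeRegion();
initLiveRegion(liveRegion);
announcePageNumber(liveRegion, "16");
console.log(liveRegion.attributes.get("aria-live"), liveRegion.textContent); // prints assertive Page 16
```

One caveat: some screen readers do not re-announce identical text, so implementations often clear the region and set the message again after a short delay when the same page is requested twice.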

Thank you for all the work to make this more accessible, especially to screen reader users.

sinabahram commented 2 years ago

Ahh, and for the screen reader users, I also have a suggestion, which is to use your screen reader's capabilities to leave a virtual place marker. That way, no matter where you are in the virtual document, you can always return right there by hitting a simple key. This doesn't mean the solutions I suggested above should not be implemented, of course, just that there does exist a recovery path for loss of focus.

danielweck commented 2 years ago

Hello Sina, thank you for your feedback.

1) Thorium is a cross-platform application that doesn't have a traditional Windows "status bar" GUI landmark. Note that the "current page number" (if any) is already displayed in the "goto page" section as well as in the "where am I" popup, but there is a level of indirection to get there.

2) We can certainly add an (assertive) ARIA live region activated by a keyboard shortcut, though I must say that finding a key combination that works consistently on Windows, Linux, and Mac with JAWS, Narrator, NVDA, and VoiceOver is not quite as easy as it seems. Here is the Wiki page that documents the existing keyboard shortcuts: https://github.com/edrlab/thorium-reader/wiki/Keyboard-shortcuts

3) Are you describing functionality that works with all screen readers? By the way, we tested this workaround in previous discussions with JAWS and NVDA users in this GitHub "forum", as well as in my own sighted experience with VoiceOver. This is in fact exactly what I recommend, as mouse clicks are interpreted in Thorium as the user's intent to read the targeted text. Of course this is not 100% reliable: users might subconsciously click on a paragraph (I am one of those users who like to click on things), but then carry on reading further down the text, and Thorium is obviously not aware of that, either because of the screen reader's independent virtual cursor, or because we do not implement eye-gaze tracking yet.

sinabahram commented 2 years ago

Some quick responses for you.

Not all screen readers, but that's ok, right? We shouldn't let perfect be the enemy of the good here and slowly work towards making sure it works for everyone.

Same thoughts on keystrokes: then don't make them consistent. At a certain point, everyone who encounters this, at least across our dozens of clients, ends up putting in a customization dialog for keystrokes, because it is a losing battle to try to find cross-platform keystrokes, especially when AT is thrown into the mix. Let users assign their own keystrokes and so many possibilities open up, e.g. mapping to one-handed keyboard devices, alternative input devices, easier mapping of voice macros, etc.
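The user-assignable keystrokes idea can be illustrated with a small sketch: built-in defaults merged with per-user overrides, plus a lookup from a normalized key combination to an action. The action names, the default bindings and the `Ctrl+Shift+K`-style notation are assumptions for illustration; they do not describe Thorium's actual shortcut configuration format.

```typescript
type Action = "speakCurrentPage" | "openWhereAmI" | "openNavigation";
type Keymap = Record<Action, string>;

// Hypothetical built-in defaults.
const defaultKeymap: Keymap = {
  speakCurrentPage: "Ctrl+Shift+K",
  openWhereAmI: "Ctrl+I",
  openNavigation: "Ctrl+Shift+P",
};

// Canonical form: lower case, modifiers sorted, main key last,
// so "shift+ctrl+k" and "Ctrl+Shift+K" compare equal.
function normalize(combo: string): string {
  const parts = combo.split("+").map((p) => p.trim().toLowerCase());
  const key = parts.pop() ?? "";
  const modifiers = parts.sort();
  return [...modifiers, key].join("+");
}

// User overrides replace defaults action by action.
function buildKeymap(overrides: Partial<Keymap>): Keymap {
  return { ...defaultKeymap, ...overrides };
}

// Returns the action bound to a key combination, if any.
function resolve(keymap: Keymap, combo: string): Action | undefined {
  const wanted = normalize(combo);
  for (const action of Object.keys(keymap) as Action[]) {
    if (normalize(keymap[action]) === wanted) {
      return action;
    }
  }
  return undefined;
}

// A user who prefers Alt+Shift+P for "speak current page".
const keymap = buildKeymap({ speakCurrentPage: "Alt+Shift+P" });
console.log(resolve(keymap, "shift+alt+p")); // prints speakCurrentPage
console.log(resolve(keymap, "Ctrl+Shift+K")); // prints undefined (default was overridden)
```

A real customization dialog would also need to detect conflicting bindings and reserved screen reader keys, which this sketch leaves out.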

RE your note about where page numbers are available, that's only helpful if no focus change is required of course.

Thanks so much!

spudthebud commented 2 years ago

Thanks for taking the issue so seriously. In the academic world, the user task of getting the current page number for quoting goes from 1st year undergrad papers to doctoral dissertations and into academic careers. If there isn't a truly usable solution, then a disadvantage will exist in universities and colleges. By usable, I follow the ISO definition of usability, where completing a user task is measured in terms of efficiency, effectiveness, and user satisfaction.

Sina's ARIA live region idea sounds interesting. It could be fast and effective. Is there a way to try it out? If I can chip in in any way, I will. An academic publisher gave me an ePub with pagination embedded.

UdoBavaria commented 2 years ago

Hi all,

At present I am writing a JAWS script that gets the page number automatically by opening the page dialog, extracting the number, closing the dialog, and going back to the previous position. I think I'm on a good track, but I need some more time to finish the scripts. I just wanted you to know.

Udo

sinabahram commented 2 years ago

The JAWS script sounds useful, but just to say that the proper way to solve this is within the app, so it can be solved for everyone. Hoping for some actionable steps forward now that some real solutions have been identified in this thread.

spudthebud commented 2 years ago

I'm again offering to help with any sort of preliminary testing of Sina's suggestion if the dev team will give it a try. It would be a very helpful feature for Thorium readers in university settings. And thank you to Udo also. Clearly this is important.

danielweck commented 2 years ago

Hello all, thank you for taking the time to file this feature request, and for your suggestions on how to implement it.

We are going to ship an experimental proposal in Thorium 2.0, which features the new CTRL SHIFT K keyboard shortcut to force the screen reader to speak the equivalent of the information available in the popup modal dialog that opens when invoking the CTRL SHIFT I keyboard shortcut, but without the disadvantage of losing document focus.

The information is re-ordered, though, in order to speak the current "page number" first, i.e. the nearest preceding authored page break in the HTML document relative to the current reading location. The current reading location is typically the parent element of the selected text or of the text cursor, as designated via mouse click or a screen reader equivalent (this mechanism depends on NVDA, VoiceOver, JAWS, etc.). It also corresponds to the heading last navigated to from the table of contents, the last opened bookmark, or the targeted "page number" from the EPUB page list in the navigation panel (goto page feature). So the spoken information starts with the page number, then progression data such as the percentage within the current HTML document or audiobook and the index of the spine item in the reading order. Lastly, a trail of document headings is spoken, just as in the modal popup dialog opened via CTRL I (or CTRL SHIFT I to force keyboard focus into the "where am I" progression data).
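The ordering described above (page number first, then progression data, then the heading trail) can be expressed as a small formatting function. The `LocationInfo` shape and the wording of each spoken fragment are assumptions made for this sketch, not the strings Thorium actually produces.

```typescript
// Hypothetical snapshot of the current reading location.
interface LocationInfo {
  pageLabel?: string;        // nearest preceding authored page break, if any
  percentInDocument: number; // 0 to 100, within the current HTML document
  spineIndex: number;        // zero-based index of the spine item in the reading order
  spineCount: number;        // total number of spine items
  headingTrail: string[];    // outermost heading first
}

function buildAnnouncement(info: LocationInfo): string {
  const parts: string[] = [];
  // 1. Page number first, since that is what users most often need.
  if (info.pageLabel !== undefined) {
    parts.push(`Page ${info.pageLabel}`);
  }
  // 2. Progression data.
  parts.push(`${Math.round(info.percentInDocument)}% of this document`);
  parts.push(`document ${info.spineIndex + 1} of ${info.spineCount}`);
  // 3. Trail of headings leading to the current location.
  if (info.headingTrail.length > 0) {
    parts.push(info.headingTrail.join(", "));
  }
  return parts.join(". ");
}

console.log(
  buildAnnouncement({
    pageLabel: "16",
    percentInDocument: 42.4,
    spineIndex: 2,
    spineCount: 12,
    headingTrail: ["Part I", "Chapter 2"],
  }),
);
// prints Page 16. 42% of this document. document 3 of 12. Part I, Chapter 2
```

Putting the most frequently needed item first means a screen reader user can interrupt speech as soon as they have heard the page number.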

I hope this helps. Please let me know if this proposed user experience is moving in the right direction. Thank you.

spudthebud commented 2 years ago

Thanks for moving it forward, Daniel. Sina, Udo, any thoughts? I can chip in with some testing and provide feedback when 2.0 is released. Should this ticket be closed before the testing?

danielweck commented 2 years ago

Hello, the GitHub issue is closed via a code commit in the develop branch, as per our development process (the master branch will be updated and tagged with the official 2.0 release).

Feel free to continue the discussion here and we can re-open the issue if there are remaining problems. Sometimes it is preferable to open a new issue, for example to describe suggested improvements to an existing feature.

danielweck commented 2 years ago

You do not have to wait until the official 2.0 release to try out the new features. You can download Windows, Linux and MacOS installers (automated builds) from the GitHub release page: https://github.com/edrlab/thorium-reader/releases For example, the Windows app is available at the following link: https://github.com/edrlab/thorium-reader/releases/tag/latest-windows

spudthebud commented 2 years ago

I've been doing some testing of Thorium 2.0 with VoiceOver on Mac. Shift + Control + K causes a page number to be announced, along with some other information. It keeps the VoiceOver cursor in the same spot, which is great. However, the page number that is being announced doesn't quite match the printed book. It could be the ePub itself, I suppose. I'll have to do some more testing, including with JAWS.

danielweck commented 2 years ago

Hello @spudthebud, remember that the screen reader's accessibility cursor is completely proprietary and disconnected from the actual keyboard focus or mouse click interaction inside the HTML document, which is what we rely on to know (or estimate) the current reading location during human interaction. In other words, the screen reader may be speaking text that is out of view (relative to the visible viewport, either the horizontal paginated spread or the vertical scroll extent). Or it may be speaking text that is in view, but in the absence of HTML element focus/selection, Thorium estimates the reading location by looking at the topmost text in the current visible viewport (typically the top-left corner of the paginated spread or scrolling page, in the case of Left-To-Right / Top-To-Bottom languages). So, to work around this inherent limitation of current screen reader technology, NVDA / JAWS / etc. users must use specific methods to trigger a "real" text selection or focus in the HTML document (i.e. not the screen reader's proprietary internal buffer handling).
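The fallback described above (with no focus or selection, take the topmost text in the visible viewport) can be sketched for a single-column, Left-To-Right / Top-To-Bottom layout. Element positions are plain rectangles standing in for what `getBoundingClientRect()` returns in a webview; the `Box` and `Candidate` types and the function names are hypothetical, and this is not Thorium's actual algorithm.

```typescript
// Plain rectangle in viewport coordinates (stand-in for a DOMRect).
interface Box {
  top: number;
  left: number;
  bottom: number;
  right: number;
}

interface Candidate {
  id: string;
  box: Box;
}

// True if the element's box overlaps the viewport at all.
function isVisible(box: Box, viewport: Box): boolean {
  return box.bottom > viewport.top && box.top < viewport.bottom &&
    box.right > viewport.left && box.left < viewport.right;
}

// Among visible elements, pick the one whose top edge is highest,
// breaking ties in favour of the leftmost one.
function estimateReadingLocation(candidates: Candidate[], viewport: Box): Candidate | undefined {
  let best: Candidate | undefined;
  for (const c of candidates) {
    if (!isVisible(c.box, viewport)) {
      continue;
    }
    if (
      best === undefined ||
      c.box.top < best.box.top ||
      (c.box.top === best.box.top && c.box.left < best.box.left)
    ) {
      best = c;
    }
  }
  return best;
}

const visibleArea: Box = { top: 0, left: 0, bottom: 800, right: 600 };
const paragraphs: Candidate[] = [
  { id: "p1", box: { top: -300, left: 0, bottom: -20, right: 600 } }, // scrolled out of view
  { id: "p2", box: { top: -10, left: 0, bottom: 120, right: 600 } },  // partly visible at the top
  { id: "p3", box: { top: 130, left: 0, bottom: 400, right: 600 } },
];

console.log(estimateReadingLocation(paragraphs, visibleArea)?.id); // prints p2
```

The example also shows the limitation Daniel describes: if the screen reader is actually speaking `p3`, nothing in these geometry calculations can know that. In a paginated multi-column spread, comparing columns (left) before top would better match reading order.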

spudthebud commented 2 years ago

I don't think the page number issue is usably resolved in Thorium 2.0, so the issue should be reopened. I think I have found an obvious solution.

First, I've been discussing this Thorium and page number issue with a blind researcher in a university. This is the message being relayed to this discussion:

“For students, researchers and scholars with print disabilities, every accessibility barrier we encounter slows us down and takes time away from the most valuable tasks we should be engaged in. Having a robust feature that provides us with page numbers would not only enable us to keep up with the referencing conventions of our disciplines, but would help to make citation tasks faster and more efficient. In addition, having access to page numbers quickly and easily lets us check the citations used by others, so we can focus our energies on keeping up with scholarly sources in our fields.”

It occurred to me that there may be an obvious solution. Even when page numbers are present in the EPUB file, they are not rendered in the content. Here's an example of the CSS from an academic EPUB:

span.pagebreak-rw { width: 0; font-size: 0; line-height: 0; height: 0; visibility: hidden; float: right; }

Could Thorium have a couple of settings?

  1. Show page numbers with a message, such as "Start page"
  2. Show page numbers set off by line breaks

The image below shows a page from an EPUB that has page breaks. It starts with: [Start Page 15]. I have verified this is correct with the physical item.

This second image shows some text from an EPUB and in the middle of a sentence is: [Start page 16]. I have verified this is correct with the physical item.

This third image shows some text from an EPUB. In the middle of a sentence, "21" appears with a paragraph break before and after it, for sighted scholars.

Button 1 would address the researcher's comment.

Button 2 would make Thorium more user-friendly for sighted researchers in universities who also need page numbers for quotation purposes.

If a reader doesn't want page numbers, then keep both buttons off.
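The two proposed settings, plus the default "both off" state, come down to choosing how each authored page break is rendered. A minimal sketch, assuming a hypothetical `PageNumberDisplay` setting with three values; in a real reading system the result would also have to override hiding styles such as the `span.pagebreak-rw` rule quoted above, which this sketch does not attempt.

```typescript
// Hypothetical reader setting mirroring the proposal.
type PageNumberDisplay = "off" | "labelled" | "ownLine";

interface RenderedPageBreak {
  text: string;   // text to show and expose to screen readers ("" when hidden)
  block: boolean; // true if the marker should sit on its own line
}

function renderPageBreak(label: string, mode: PageNumberDisplay): RenderedPageBreak {
  if (mode === "labelled") {
    // Setting 1: a marker inside the flow of text, e.g. "[Start page 16]".
    return { text: `[Start page ${label}]`, block: false };
  }
  if (mode === "ownLine") {
    // Setting 2: the bare number, separated from the text by line breaks.
    return { text: label, block: true };
  }
  // Both settings off: keep the authored page break hidden.
  return { text: "", block: false };
}

for (const mode of ["off", "labelled", "ownLine"] as PageNumberDisplay[]) {
  console.log(mode, JSON.stringify(renderPageBreak("16", mode)));
}
```

The labelled form suits screen reader users because the words "Start page" are spoken in context, while the bare number on its own line matches what sighted readers see in the print-style layout.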

spudthebud commented 2 years ago

@danielweck did you see my last comment about Thorium making the page numbers visible in the text if the EPUB has them?

danielweck commented 2 years ago

Hello @spudthebud sorry for the late reply. I moved your analysis and suggestions to a GitHub "discussion" so that we can flesh out the details: https://github.com/edrlab/thorium-reader/discussions/1799