ds300 / jetzt

Speed reader extension for Chrome

Work with ebooks #79

Open nomicode opened 10 years ago

nomicode commented 10 years ago

For me, this is one of the potential killer features. I would love to be able to buy an ebook (Kindle, iBooks, ePub, whatever) and speed-read it with jetzt.

I am not even sure where to start with this. But I would be interested in researching it. I just tried jetzt with the Kindle Cloud Reader, and it doesn't work. As with #78, it might simply be that we document a way to convert your file format.

peteruithoven commented 10 years ago

This might be interesting: https://github.com/futurepress/epub.js/

nomicode commented 10 years ago

Tested jetzt on the epub.js example, and it doesn't work as is.

nelsonihc commented 10 years ago

I've been using Google Books (web). It can handle EPUBs, and jetzt works fine there.

nomicode commented 10 years ago

Huh, you're right, it does. There's still a bit of a problem with page turning. I reduced the text size to the smallest setting so that as much text as possible fits on each page. This works okay, but you still have to exit jetzt each time to select or turn to a new page.

I wonder if it's possible to integrate jetzt with epub.js in such a way as to make jetzt the primary interface, not an additional layer on top of the epub.js interface. Does that make sense? That way I could skip forward and backward through pages, chapters, etc., and jetzt's progress could be indicated somehow, perhaps in the book's TOC on the left-hand side.

I am very interested in this. jetzt is obviously useful for skimming long blog posts, emails, and what have you. But the thought of being able to inhale entire books has me very excited. I might be interested in doing the research and work to make it happen if others are too.

nelsonihc commented 10 years ago

There's an open-source Chrome extension project called Readium (http://readium.github.io/). It runs offline on the client side and does not depend on Node.js for EPUB parsing. jetzt won't work with it out of the box because one extension cannot run inside another. One idea would be to fork the project and extend it by embedding jetzt.

nomicode commented 10 years ago

Instead of forking, could we vendor the code in as a library?

nelsonihc commented 10 years ago

I think so. The project's README.md states that the ebook viewer can be served from a static web server (the component is https://github.com/readium/readium-js-viewer), so I tried "python -m SimpleHTTPServer" on the cloned repository and it worked fine. jetzt is able to read the screen, and it is possible to select a whole paragraph to read instead of single pages or selections.

Maybe a JavaScript-savvy developer could check this possibility more quickly.

matt-gardner commented 10 years ago

I just tried using jetzt to read an ebook on Google Play Books. It definitely does not work, at least for books that you have bought there (it seems to work alright for books that have been uploaded). Google inserts a bunch of random words to make automatic processing like this difficult. However, it looks like this is easily bypassed: get the gbt elements and filter out those where window.getComputedStyle(element).display == 'none'. Concatenate what is left, and that should give you the text. I imagine something similar should work for the Kindle Cloud Reader.
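The filtering step described above could look roughly like this. It is a minimal sketch, not tested against the live site: it assumes "gbt" is a tag name (if it turns out to be a class, swap in getElementsByClassName) and that the decoy words are exactly the elements whose own computed display is "none". The function name visibleBookText and the injectable getDisplay parameter are hypothetical, added so the logic can be checked without a browser.

```javascript
// Sketch of the Google Play Books workaround from the comment above.
// Assumptions (hypothetical, not verified against the live reader):
// - the book text lives in elements with the tag name "gbt";
// - decoy words are the elements whose computed display is "none".
function visibleBookText(
  root = document,
  // Use the element's own window so this also works for nodes inside an iframe.
  getDisplay = (el) => el.ownerDocument.defaultView.getComputedStyle(el).display
) {
  const parts = [];
  for (const el of Array.from(root.getElementsByTagName("gbt"))) {
    // Skip injected decoy words that are never rendered.
    if (getDisplay(el) === "none") continue;
    parts.push(el.textContent);
  }
  // Joining with a space is a guess; collapse any resulting runs of whitespace.
  return parts.join(" ").replace(/\s+/g, " ").trim();
}
```

In the browser console on a reader page you would call visibleBookText() with no arguments, or pass an iframe's contentDocument as root if the text turns out to live inside a frame. Note that this only checks each element's own display value, so a word hidden by a display:none ancestor would slip through.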

matt-gardner commented 10 years ago

And the beginnings of doing something similar on Kindle Cloud Reader (the line of code should be cleaned up; I'm just writing this down somewhere so I don't forget it, and for anyone else who might want to pick this up):

document.getElementsByTagName("iframe")[1].contentDocument.getElementsByTagName("iframe")[1].contentDocument

The second [1] really should be something like .style.visibility == 'visible' - when I looked at it, there were two visible iframes and two hidden, presumably showing the two current pages, and the page on either side. Then there are a bunch of divs with class name "was-a-p". Inside those are a bunch of spans that contain the text. It's not obfuscated, like Google's is, but it isn't obvious to me how to automatically figure out which parts of the text are actually displayed. Maybe you could do something with computed style again to figure it out, but I didn't get that far.
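A cleaned-up version of the one-liner above might look like the following. It is only a sketch built from the single observation described in the comment: the outer iframe index 1, the inline style.visibility === "visible" check on the inner page frames, and the "was-a-p" class all come from that observation and may not hold on the current reader. The function names are hypothetical. It also does not solve the open problem mentioned above: it returns all text in the visible page frames, not just what is actually on screen.

```javascript
// Sketch of the Kindle Cloud Reader approach, cleaned up.
// Assumptions (from one look at the page, may change at any time):
// - reader content is in the second top-level iframe (index 1);
// - page iframes inside it have inline style visibility "visible";
// - paragraph text lives in divs with class "was-a-p".
function visiblePageDocs(doc = document) {
  const outer = doc.getElementsByTagName("iframe")[1];
  if (!outer || !outer.contentDocument) return [];
  return Array.from(outer.contentDocument.getElementsByTagName("iframe"))
    .filter((f) => f.style.visibility === "visible" && f.contentDocument)
    .map((f) => f.contentDocument);
}

function kindleVisibleText(doc = document) {
  const paragraphs = [];
  for (const pageDoc of visiblePageDocs(doc)) {
    for (const p of Array.from(pageDoc.getElementsByClassName("was-a-p"))) {
      // textContent concatenates the text of every span inside the paragraph.
      const text = p.textContent.replace(/\s+/g, " ").trim();
      if (text) paragraphs.push(text);
    }
  }
  return paragraphs.join("\n");
}
```

Reading contentDocument only works for same-origin frames, so this would have to run in the reader page's own context (for example from a content script injected into it).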

nomicode commented 10 years ago

I want this so much I would actually put a bounty on it.

Edit: specifically Kindle Cloud Reader.

matt-gardner commented 10 years ago

I think the right way to handle this is to have domain-specific DOM parsers. That way you can write something specific to handle Google Books, something for Kindle Cloud Reader, and have a default to fall back on for any page that doesn't have a parser registered. If we can get a framework like this, my previous two comments show how to do the beginnings of a parser for each of those two domains.
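One way to structure the domain-specific parsers proposed above is a small registry keyed by hostname with a default fallback. This is a hypothetical sketch: ParserRegistry and its methods are not part of jetzt, and the parent-domain lookup (so a parser registered for a domain also covers its subdomains) is a design choice added here for illustration.

```javascript
// Hypothetical sketch: per-domain DOM parsers with a default fallback.
// None of these names exist in jetzt; they only illustrate the idea.
class ParserRegistry {
  constructor(defaultParser) {
    this.defaultParser = defaultParser;
    this.parsers = new Map(); // hostname -> (document) => text
  }

  register(hostname, parse) {
    this.parsers.set(hostname.toLowerCase(), parse);
    return this; // allow chaining
  }

  // Try the exact hostname, then each parent domain (never the bare TLD),
  // and fall back to the default parser if nothing matches.
  parserFor(hostname) {
    const labels = hostname.toLowerCase().split(".");
    for (let i = 0; i < labels.length - 1; i++) {
      const candidate = labels.slice(i).join(".");
      if (this.parsers.has(candidate)) return this.parsers.get(candidate);
    }
    return this.defaultParser;
  }

  extract(hostname, doc) {
    return this.parserFor(hostname)(doc);
  }
}
```

In an extension this would be called as registry.extract(location.hostname, document), with the Google Books and Kindle Cloud Reader snippets from the earlier comments registered under their own hostnames. Passing the document in explicitly, rather than having parsers read globals, keeps each parser testable on its own.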

nomicode commented 10 years ago

@ds300 are you accepting patches while the SDK stuff is outstanding?

ds300 commented 10 years ago

I don't see why not.