Added some domain-specific DOM parsing in the selection step

matt-gardner commented 10 years ago

A start to fixing issue #79. This introduces a DOM-modification step between selection and parsing, so that any changes needed to bypass obfuscation can happen there. This works for me with Google Play Books (the step is only needed for purchased books; uploaded books aren't obfuscated). A similar approach should also work for the Kindle Cloud Reader, but I don't have any books there that I want to read, so I probably won't work on that part.
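
The PR's actual diff isn't reproduced in this thread. As a rough illustration of where such a step sits, here is a minimal sketch of a select → modify → parse pipeline with per-domain fixups. Everything in it is hypothetical: the names `domFixups`, `modifyForDomain`, `parse` and `extractWords`, and especially the assumption that junk nodes can be recognised by a `junk` flag. Plain objects stand in for DOM nodes so the sketch runs outside a browser.

```javascript
// Hypothetical sketch of a DOM-modification step between selection and
// parsing. Plain objects stand in for DOM nodes; not the PR's real code.

// Registry of domain-specific fixups, keyed by hostname (hypothetical).
const domFixups = new Map([
  [
    "play.google.com",
    // ASSUMPTION: obfuscation junk can be identified by a `junk` flag.
    (nodes) => nodes.filter((node) => !node.junk),
  ],
]);

// Modification step: apply this host's fixup, or pass nodes through.
function modifyForDomain(hostname, nodes) {
  const fixup = domFixups.get(hostname);
  return fixup ? fixup(nodes) : nodes;
}

// Parsing step: flatten the cleaned nodes into a list of words.
function parse(nodes) {
  return nodes
    .map((node) => node.text)
    .join(" ")
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// Selection has already produced `selectedNodes`; run the later steps.
function extractWords(hostname, selectedNodes) {
  return parse(modifyForDomain(hostname, selectedNodes));
}

const selected = [
  { text: "Call me" },
  { text: "xq7", junk: true },
  { text: "Ishmael." },
];

console.log(JSON.stringify(extractWords("play.google.com", selected)));
// → ["Call","me","Ishmael."]
console.log(JSON.stringify(extractWords("example.com", selected)));
// → ["Call","me","xq7","Ishmael."]
```

Keeping the fixups in a lookup table means sites without obfuscation pay nothing, and support for another reader (e.g. Kindle Cloud Reader) would be one more entry rather than a change to the parser.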

matt-gardner commented 10 years ago

I tried to send this pull request to dev, not to master... I've never actually done this on GitHub before, so if I need to change something to get it to the right place, let me know.

ds300 commented 10 years ago

At first glance, this seems unnecessary in light of the new parsing stuff, which uses the Selection API to get raw text from the DOM. The only change I had to make to get it working was re-enabling the user-select style property on DOM nodes (I wasn't previously aware that could be an issue, so thanks for bringing it to my attention). It successfully ignores the junk text nodes you get with purchased books.

demo:

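// Re-enable text selection on every segment and its descendants,
// since the reader disables it via the user-select style property.
var all = document.querySelectorAll(".gb-segment, .gb-segment *");

for (var i = 0; i < all.length; i++) {
  var elem = all[i];
  if (elem.style) {
    elem.style.webkitUserSelect = "all"; // WebKit/Blink-prefixed property
    elem.style.userSelect = "all";       // standard property
  }
}

// Select each segment's contents and read back the selected text,
// which leaves out the junk text nodes found in purchased books.
var segments = document.querySelectorAll(".gb-segment");
var sel = window.getSelection();

for (var j = 0; j < segments.length; j++) {
  var range = document.createRange();
  range.selectNodeContents(segments[j]);
  sel.removeAllRanges();
  sel.addRange(range);
  console.log(sel.toString());
}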

Other problems with Google Play include mid-sentence page breaks (which will add unwanted pauses) and having to turn the page manually quite frequently. I think both can be solved nicely.
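
One possible way to handle the mid-sentence page breaks is to merge page texts before tokenizing, so a sentence split across a page boundary is read as one unit. This is a minimal sketch under the assumption that each page's text is already available as a string; `mergePages` and the `SENTENCE_END` heuristic (terminal `.`, `!` or `?`, optionally followed by closing quotes or brackets) are hypothetical, not anything in jetzt.

```javascript
// Hypothetical sketch: merge page texts so a sentence broken across a
// page boundary is read without an extra pause at the break.

// A page "ends cleanly" if its text ends in terminal punctuation,
// optionally followed by closing quotes or brackets.
const SENTENCE_END = /[.!?]["'”’)\]]*\s*$/;

function mergePages(pages) {
  const merged = [];
  let pending = "";
  for (const page of pages) {
    const text = page.trim();
    if (text.length === 0) continue; // skip empty pages
    pending = pending ? `${pending} ${text}` : text;
    if (SENTENCE_END.test(pending)) {
      merged.push(pending);
      pending = "";
    }
  }
  // Leftover text (the last loaded page ends mid-sentence) is emitted
  // as-is; a real reader might instead wait for the next page to load.
  if (pending) merged.push(pending);
  return merged;
}

const pages = ["It was a dark and stormy", "night. The rain fell.", "Then the wind"];
console.log(JSON.stringify(mergePages(pages)));
// → ["It was a dark and stormy night. The rain fell.","Then the wind"]
```

The same buffering point is where automatic page turning could hook in: when `pending` is non-empty at the end of the available pages, that is a natural signal to advance the page and fetch more text.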

matt-gardner commented 10 years ago

Is the coffee branch running with the new parsing code? It looks like manifest.json still uses the old JavaScript. Is there some documentation on what I need to do to get the coffee branch running?

ds300 commented 10 years ago

Not yet. The new parsing stuff has some ramifications for the design of the instruction executor that I haven't fully resolved (it's not even compiling at the moment).

matt-gardner commented 10 years ago

And this evening it looks like Google pushed a change to their Play Books app that puts the content inside an iframe served from another domain, even for books you uploaded. So this no longer works. The timing seems a little too coincidental to me...

ds300/jetzt: Added some domain-specific DOM parsing in the selection step #131