ds300 / jetzt

Speed reader extension for chrome

Quotes #45

Open MB6 opened 10 years ago

MB6 commented 10 years ago

I would just like to start out by saying that Jetzt is the best Spritz clone out there. OpenSpritz is borked at the moment, and even when it worked, it wasn't as clean as this.

That said, there is one area I would like to see fixed: quotes. I read fanfiction, and sometimes there end up being quotes inside quotes, even when I know they aren't there (see picture). Here is a sample page for you to see the problem for yourselves. Thanks a lot.

I would fix it, but I don't know JavaScript. Again, great stuff, thanks for making this.

[image: screen shot 2014-03-12 at 3 10 04 pm]

ecsplendid commented 10 years ago

Can you test my version and see if that's any better? https://github.com/ecsplendid/jetzt

Try pressing alt-a on that article

ds300 commented 10 years ago

This is a problem with the current DOM parsing function; if I select the text manually (and therefore bypass the DOM parser) there are no nested quotes.

Have no fear, @MB6. A brand new wonderful DOM parsing function is being written as we speak! :)

MB6 commented 10 years ago

@ecsplendid, some feedback (I don't mean any of this in a mean way, I like it very much)

  1. alt-a does nothing for me; that may be because I'm on a Mac and have to press control-option-a to get alt-a
  2. I haven't seen any nested quotes in your version. I think you've fixed it! :)
  3. buuut, your single quotes are broken, the text reads: "from everything we've seen ,..." and I get: [image: screen shot 2014-03-12 at 3 59 25 pm]

     The text reads: 'wards' around the and I get: [image: screen shot 2014-03-12 at 4 03 01 pm]

  4. Is something with your pivots different from the original?
  5. I'm not sure I like the font (but it's growing on me!)
  6. The punctuation highlighting is growing on me too. Overall, good work

@ds300 I am awaiting the new fancy DOM parser!

ecsplendid commented 10 years ago

Ahh, "we've". I didn't have a regex rule for that. I've updated for that.

The current pattern is

/['’](?=(([dst]|nt|es|re|ve)[\s$\n])|cause|em|cept|tis|\d{2})/gi
/s['’](?=\s|\n)/gi

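Which basically means that any quote followed by d, s, t, "nt", "es", "re" or "ve" and then a space, or followed by "cause", "em", "cept", "tis" or two digits, is not part of a quotation. The second pattern does the same for a quote that comes right after an "s" and is followed by whitespace.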
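
For anyone who wants to poke at these rules, here is a minimal sketch of how the two patterns could be applied. The regexes are copied as quoted above; the helper names `maskApostrophes` and `countQuoteMarks` and the placeholder character are made up for illustration and are not taken from either fork. One thing worth knowing: inside a character class, `$` matches a literal dollar sign rather than end of input, so a contraction at the very end of the text is not caught by the first pattern as written.

```javascript
// The two patterns from the comment above, copied as-is.
// Note: inside [...], `$` is a literal dollar sign, not an end-of-input anchor.
const CONTRACTION = /['’](?=(([dst]|nt|es|re|ve)[\s$\n])|cause|em|cept|tis|\d{2})/gi;
const PLURAL_POSSESSIVE = /s['’](?=\s|\n)/gi;

// Hypothetical placeholder; any character that cannot occur in the text would do.
const APOSTROPHE_MARK = "\u0001";

// Hypothetical helper: hide apostrophes so that only quotation marks remain.
function maskApostrophes(text) {
  return text
    .replace(CONTRACTION, APOSTROPHE_MARK)
    .replace(PLURAL_POSSESSIVE, "s" + APOSTROPHE_MARK);
}

// Hypothetical helper: count the single quotes still treated as quotation marks.
function countQuoteMarks(text) {
  const matches = maskApostrophes(text).match(/['’]/g);
  return matches ? matches.length : 0;
}

console.log(countQuoteMarks("from everything we've seen"));  // 0
console.log(countQuoteMarks("he said 'hello there' to me")); // 2
console.log(countQuoteMarks("'wards' around the"));          // 1
```

With the patterns exactly as quoted, the closing quote in `'wards' around the` is read as a plural possessive because it follows an "s" and precedes a space, so only one quotation mark is left. That is the kind of ambiguity that makes single quote parsing so hard to get right with regexes alone.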

I think I might give up on the single quote parsing though, it's a nightmare and will not work in all situations without some "moon-shot" investment of time!

I really appreciate you taking a look though, I hope I have explored some concepts. I feel drained now and will just wait to see what @ds300 comes up with next and reconcile with that.

"Is something with your pivots different from the original?"

Yes, when I call the get pivot function I strip out any punctuation, or basically anything that isn't a letter, so ".", "-", ";" etc. will come out. This may slightly shift the balance to the left!
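
A rough sketch of why stripping shifts things left. The pivot rule here (about one third of the way into the word) is an assumption made up for the example, not jetzt's actual pivot calculation, and both function names are hypothetical.

```javascript
// Hypothetical pivot rule, for illustration only: about one third of the way in.
function pivotIndex(word) {
  return Math.floor(word.length / 3);
}

// Strip everything that is not a letter before asking for the pivot.
function pivotIndexIgnoringPunctuation(word) {
  const lettersOnly = word.replace(/[^\p{L}]/gu, "");
  return pivotIndex(lettersOnly);
}

console.log(pivotIndex("self-aware..."));                    // 4
console.log(pivotIndexIgnoringPunctuation("self-aware...")); // 3
```

The stripped word is shorter, so any length-based rule returns the same or a smaller index, and when that index is applied to the original word the highlighted letter sits further to the left.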