Better word separation - Githubissues

cameron / squirt

Speed read the web.

http://www.squirt.io

Apache License 2.0

1.22k stars 205 forks source link

Better word separation #19

Closed clarkf closed 10 years ago

clarkf commented 10 years ago

Hello there!

Very neat stuff! I noticed that squirt wasn't separating words out very intelligently, and wanted to lend a hand.

This pull does two things:

It separates words based on all whitespace characters (tab, newline). Previously, they were split by only spaces.
It splits up words with non-whitespace characters, such as punctuation. Because of the way that textContent works, adjacent words that are wrapped in tags (e.g. <b>This,</b> and this) will get returned as single words (['This,and', 'this']). This pull adds an extra space after all punctuation, which often helps with words that run-together.

Thanks!

/cc #10, #17

cameron commented 10 years ago

If i had a test suite I'd have no trouble accepting this PR as is (looks obvious enough), but as it stands I want to settle down and poke at a bit before merging. Thank you for taking the time to read and notice! :)

bcherry commented 10 years ago

@cameron this isn't really dangerous. the first regex is pretty straightforward, and the replacement is even more so :)

clarkf commented 10 years ago

@cameron Totally understood. Let me know if you'd like any assistance laying the groundwork for a test-suite (I like mocha+chai personally).

walfly commented 10 years ago

This fixes issue #30. Looks good to me, will be excited when its merged.

cameron commented 10 years ago

Thanks, @clarkf. I would adore some help with the tests. LMK if you start working on them so I don't do the same :)

cameron commented 10 years ago

FYI, Squirt got mentioned on Business Insider :)

aixnr commented 10 years ago

@cameron I'd love to try it. what do I do now? re-drag the bookmarklet from Squirt.IO ?

cameron commented 10 years ago

Clear your cache :) The change was to a script that gets injected, not the bookmarklet itself.

aixnr commented 10 years ago

@cameron tried with this article http://techcrunch.com/2014/03/11/wikipedia-co-founder-jimmy-wales-accidentally-starts-a-bitcoin-donation-campaign-for-wikipedia/ , hereby I confirmed the fix works. Nice job there @clarkf and @cameron !!

clarkf commented 10 years ago

@cameron I'll definitely give it a shot! I'll open a pull shortly to drop commits.