Strange dash behavior - Githubissues

ds300 / jetzt

Speed reader extension for chrome

Strange dash behavior #149

Closed: flowchartsman closed this issue 8 years ago

flowchartsman commented 8 years ago

When reading this article I noticed there were dashes splitting words where they didn't appear in the document. "Under-graduate", for example. Wondering where this comes from, and whether or not something can be done to avoid it.

flowchartsman commented 8 years ago

Is this simply related to the width of the word?

ds300 commented 8 years ago

Yup. Words longer than 12 characters are broken up, as they are deemed to be too long to read accurately at speed.
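As a rough illustration of width-based splitting, here is a minimal sketch in Python. It is not jetzt's actual code: the function name `split_by_width` and the "roughly equal chunks" strategy are assumptions, and only the 12-character threshold comes from the comment above. The real extension may pick different break points.

```python
import math

MAX_WORD_LENGTH = 12  # threshold mentioned in the thread


def split_by_width(word, max_len=MAX_WORD_LENGTH):
    """Split a long word into roughly equal chunks, ignoring syllables.

    Hypothetical helper: it only looks at the character count, which is
    why the resulting break points can look arbitrary.
    """
    if len(word) <= max_len:
        return [word]
    n_parts = math.ceil(len(word) / max_len)
    size = math.ceil(len(word) / n_parts)
    chunks = [word[i:i + size] for i in range(0, len(word), size)]
    # Every chunk except the last gets a trailing dash.
    return [chunk + "-" for chunk in chunks[:-1]] + [chunks[-1]]


print(split_by_width("reading"))        # ['reading']
print(split_by_width("Undergraduate"))  # ['Undergr-', 'aduate']
```

Because the split depends only on length, the dashes land wherever the arithmetic puts them, not at natural syllable boundaries.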

flowchartsman commented 8 years ago

Have you considered using a hyphenation algorithm? Would you accept a patch for it?

ds300 commented 8 years ago

Sure, that'd be awesome. There was talk about using TeX's algorithm in #117 but nothing ever came of it. It would have to fall back gracefully if the language of the text doesn't fit the model, and if it ends up being a ton of code I probably won't put it in the bookmarklet, but otherwise go ahead. :+1:
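For context, TeX's hyphenation uses Liang's pattern-based method: each pattern carries digits between letters, the highest digit wins at each position, and odd values mark permitted breaks. Below is a minimal Python sketch of that idea. The three patterns in `TOY_PATTERNS` are made up for illustration and are not taken from TeX's real English pattern set; the function names and the `left_min`/`right_min` defaults are likewise assumptions.

```python
import re


def parse_pattern(pattern):
    """Turn a pattern such as 'r1g' into ('rg', [0, 1, 0])."""
    letters = re.sub(r"\d", "", pattern)
    scores = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            scores[pos] = int(ch)  # digit applies to the gap before letter `pos`
        else:
            pos += 1
    return letters, scores


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of `word`, broken where the winning score is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            pat_scores = table.get(padded[start:end])
            if pat_scores:
                for offset, value in enumerate(pat_scores):
                    scores[start + offset] = max(scores[start + offset], value)
    pieces, last = [], 0
    # scores[k + 1] is the gap just before word[k] (shifted by the leading dot).
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


# Toy patterns, invented for this example only.
TOY_PATTERNS = ["r1g", "1du", "a2du"]
print("-".join(hyphenate("Undergraduate", TOY_PATTERNS)))  # Under-graduate
```

Here `1du` would allow a break before "du", but `a2du` outranks it with an even value and suppresses that break, which is how real pattern sets encode exceptions. A word that matches no pattern simply comes back unsplit, so a caller can fall back to another strategy.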

flowchartsman commented 8 years ago

Was gonna shoehorn this guy in. Seems fairly compact, and would only work on English initially: https://github.com/cuzzo/Hyphenator

ds300 commented 8 years ago

Sounds good. Just couple it with some sensible English detection, or make it an off-by-default option, and I'll happily merge it in.
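One way the two conditions could fit together, sketched in Python under stated assumptions: the stop-word list, the 0.15 threshold, and the names `looks_english` and `pick_splitter` are all hypothetical, and a stop-word ratio is only a crude stand-in for "sensible English detection". This sketch applies both safeguards at once, although the comment above asks for either one.

```python
import re

# Small, hand-picked list of very common English words (an assumption).
COMMON_ENGLISH = frozenset({
    "the", "and", "of", "to", "in", "is", "that",
    "it", "was", "for", "with", "as", "on",
})


def looks_english(text, threshold=0.15):
    """Crude heuristic: share of words that are common English stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return False
    hits = sum(1 for word in words if word in COMMON_ENGLISH)
    return hits / len(words) >= threshold


def pick_splitter(sample_text, hyphenate, fallback, enabled=False):
    """Use the hyphenator only when the option is on and the text looks English.

    `hyphenate` and `fallback` are any callables that take a word and
    return a list of pieces; the option is off by default.
    """
    if enabled and looks_english(sample_text):
        return hyphenate
    return fallback
```

With this shape, non-English text and users who never touch the option both keep the existing width-based behavior.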

flowchartsman commented 8 years ago

Is there a repo with the unminified code somewhere?

ds300 commented 8 years ago

Yeah, the source files are in /modules.