bramstein / typeset

TeX line breaking algorithm in JavaScript
BSD 2-Clause "Simplified" License

Incorrect flagged demerits value #27

Open baskerville opened 5 years ago

baskerville commented 5 years ago

In the Knuth-Plass article, the parameters used for typesetting Seminumerical Algorithms are given:

"the consecutive-hyphens and adjacent-incompatibility demerits were α = γ = 3000"

The default value used in this library, 100, isn't mentioned in the article.

PhilterPaper commented 3 years ago

Was that value of 3000 specifically tuned to the needs of Seminumerical Algorithms, and is 100 a better general-purpose value? Perhaps 100 wasn't mentioned because it is something that would differ between systems.

Being able to tune this parameter might help with the bad line breaking (or Breaking Bad lines!) that I'm seeing with the Perl implementation, PhilterPaper/Text-KnuthPlass.

PhilterPaper commented 1 year ago

After a dive into the typeset code, I see that the hyphenation penalty/demerit defaults to 100 and the adjacent-incompatibility demerit (for an excessive line-to-line change in fitness) defaults to 3000. I didn't see anything about the two normally being the same value (3000), but I should probably read Knuth's book of annotated TeX code and see if I can find anything. I would think that you'd normally want to penalize line-ending hyphens a lot less than large changes in line tightness, so maybe that's a typo in the article, or something that was later changed (to ease the hyphenation penalty)?

shreevatsa commented 1 year ago

@PhilterPaper Plain TeX sets \hyphenpenalty=50 and \adjdemerits=10000 (note that demerits are calculated using the square of the penalty). So "visually incompatible" consecutive lines are penalized four times as much as hyphens. (You'll find this in The TeXbook, as these parameters are set by the macro package / plain.tex, and not in the source code of TeX itself.)
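The "four times" figure can be checked with a few lines of arithmetic. This is only a sketch in Python, assuming a non-negative penalty adds its square to a line's demerits; the constant and function names are made up for illustration and do not come from TeX or typeset.

```python
# Plain TeX defaults quoted in the comment above.
HYPHEN_PENALTY = 50    # \hyphenpenalty
ADJ_DEMERITS = 10000   # \adjdemerits


def penalty_contribution(penalty: int) -> int:
    """Demerits that a non-negative penalty adds to a line (assumed: its square)."""
    return penalty ** 2


# A hyphenated break costs 50^2 demerits on top of the line's own demerits.
hyphen_demerits = penalty_contribution(HYPHEN_PENALTY)

# How much heavier the adjacent-incompatibility charge is than one hyphen.
ratio = ADJ_DEMERITS / hyphen_demerits

print(hyphen_demerits, ratio)
```

So the two values are only comparable after squaring the penalty: 50 looks tiny next to 10000, but as demerits it is 2500.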

PhilterPaper commented 1 year ago

In typeset (and in the Text::KnuthPlass Perl port), three demerit values are used: demerits.fitness (default 3000), demerits.flagged (default 100), and demerits.line (default 10). Which of these correspond to α, β, and γ in the original question? Is "flagged" α? demerits.line is used in calculating fitness when ratio is between -1 and tolerance. demerits.flagged seems to be associated with hyphenated words at line end. demerits.fitness is added in if the tightness changes by more than one step (class). Do these uses seem to be in accordance with TeX?

In Text::KnuthPlass, tolerance defaults to 30, which seems way too high. Maybe I'll reset it to 2, the default in typeset. Otherwise, I think the code is using these settings in the same manner as typeset, although I'm not absolutely sure whether the hyphenation penalty is 50 or 100 (I need to dig deeper). The Big Question is then: was typeset written correctly (algorithm matching TeX) in the first place? And do the default settings for demerits, etc. match up with recommended TeX usage?
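To make the question concrete, here is a rough Python sketch of how three such values would enter a single line's demerits under a TeX-style formula. It is an assumption-laden illustration, not typeset's or Text::KnuthPlass's actual code: the function names, the `INFINITY` sentinel, and the fitness-class thresholds (-0.5, 0.5, 1) are mine. As far as I can tell from the paper's notation, α is the charge for two consecutive flagged breaks, γ is the adjacent-incompatibility charge, and β denotes a line's badness rather than a tunable parameter.

```python
INFINITY = 10000  # assumed sentinel: a penalty of -INFINITY means a forced break


def fitness_class(ratio: float) -> int:
    """Classify a line by adjustment ratio (assumed thresholds)."""
    if ratio < -0.5:
        return 0  # tight
    if ratio <= 0.5:
        return 1  # normal
    if ratio <= 1.0:
        return 2  # loose
    return 3      # very loose


def line_demerits(ratio, penalty, flagged, prev_flagged, prev_class,
                  line=10, flagged_demerit=100, fitness_demerit=3000):
    """Demerits for one candidate line, using the defaults quoted above."""
    badness = 100 * abs(ratio) ** 3
    base = (line + badness) ** 2
    if penalty >= 0:
        demerits = base + penalty ** 2      # ordinary penalty
    elif penalty > -INFINITY:
        demerits = base - penalty ** 2      # negative, but not a forced break
    else:
        demerits = base                     # forced break
    if flagged and prev_flagged:
        demerits += flagged_demerit         # alpha: consecutive flagged breaks
    if abs(fitness_class(ratio) - prev_class) > 1:
        demerits += fitness_demerit         # gamma: adjacent incompatibility
    return demerits


# A perfectly set line ending at an unflagged break after a normal line.
print(line_demerits(0.0, 0, False, False, 1))
# A hyphenated line (penalty 50) that follows another hyphenated line.
print(line_demerits(0.0, 50, True, True, 1))
```

In this sketch the line value only shifts the base term before squaring, while the flagged and fitness values are flat surcharges added afterwards, which is why their defaults sit on such different scales.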