DistributedProofreaders / ppwb

Post Processor's Workbench
GNU General Public License v3.0

"Suppress pipe characters" option for ppcomp #35

Open ghost opened 1 year ago

It's common DP practice to represent table borders with | in the text version of a book. In HTML, borders can be created with CSS, so the pipe characters are usually removed. When the two versions are compared with ppcomp, this can produce hundreds of lines of diffs to look through, and finding the genuine diffs that need fixing becomes a needle-in-a-haystack job.

The example that broke me today: https://www.pgdp.net/d/ppwb/r63aebbc4c881a/result.html

There are at least two genuine diffs in that report. Good luck. ;)

So: I'd really like to be able to run ppcomp with an option to ignore the pipe character, to make the output on table-heavy books like this one much shorter and much easier to review.

(Bonus points for also excluding sequences that look like --+----+-- or ==+====+==, but those occur much less frequently and aren't so much trouble to scroll past.)
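
To make the request concrete, here is a rough sketch of the kind of preprocessing I have in mind, applied to the text version before the comparison runs. This is not ppcomp's actual code: the names `is_rule_line` and `suppress_table_borders` and the `drop_rules` parameter are made up for illustration, and collapsing the leftover whitespace is my own assumption about what would keep the diff quiet.

```python
def is_rule_line(line: str) -> bool:
    """Return True for table rule lines such as --+----+-- or ==+====+==.

    Hypothetical helper, not part of ppcomp. A line qualifies only if it
    consists solely of '-', '=', '+', '|' and spaces, and contains at least
    one '+' and at least one '-' or '='.
    """
    stripped = line.strip()
    return (
        bool(stripped)
        and set(stripped) <= set("-=+| ")
        and "+" in stripped
        and ("-" in stripped or "=" in stripped)
    )


def suppress_table_borders(text: str, drop_rules: bool = False) -> str:
    """Remove pipe characters (and optionally rule lines) from text.

    Hypothetical helper, not part of ppcomp. Each pipe is replaced by a
    space so that neighbouring cells do not run together, then runs of
    whitespace are collapsed (an assumption, see the note above).
    """
    cleaned = []
    for line in text.splitlines():
        if drop_rules and is_rule_line(line):
            continue  # skip the rule line entirely
        line = line.replace("|", " ")
        cleaned.append(" ".join(line.split()))
    return "\n".join(cleaned)


sample = "| Name | Age |\n--+----+--\n| Ann | 30 |"
print(suppress_table_borders(sample))
print(suppress_table_borders(sample, drop_rules=True))
```

With `drop_rules` left off, only the pipes go away and the rule line is kept as-is; switching it on also drops lines like `--+----+--`. How this would be exposed as a ppcomp option is left open.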