gethryn / hyphens

Fixing hyphens and dashes in text

Happy with the many non emdashable words you have identified #11

Closed GregTheGrate closed 3 years ago

GregTheGrate commented 3 years ago

However, there are some that are misidentified.

accident-someone
day-very
destination-Rockaway
Europe-probably
Fraternity-Cadmus => don't know how we could differentiate this one
Hooper-Morton
idea-cater
missing-Morgan => perhaps if previous word starts with small letter and hyphenated word starts with Caps then emdash
nearby-name
ours-Number => see above
Rockaways-one
side-empty
Todd-Quill
Todd-Temple
way-probably
You-You => refer also to stutter issue

gethryn commented 3 years ago

Can't think of a way to find these programmatically. Perhaps a list of the remaining hyphenated words after processing is done?

accident-someone day-very idea-cater nearby-name way-probably side-empty
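A rough sketch of what such a "remaining hyphenated words" report might look like, in Python. The helper name `remaining_hyphenated` and the sample text are made up for illustration and are not part of this repo; it only assumes the processed text is available as a string.

```python
import re
from collections import Counter

# Hypothetical helper, not part of the hyphens repo.
# Matches runs of word characters joined by plain hyphens, e.g. "day-very".
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")

def remaining_hyphenated(text: str) -> Counter:
    """Count every hyphenated word still present in the processed text."""
    return Counter(HYPHENATED.findall(text))

# Illustrative sample only; real input would be the text after processing.
sample = "An accident-someone fell. Not a day-very far. Another accident-someone."
for word, count in remaining_hyphenated(sample).most_common():
    print(f"{count}\t{word}")
```

Sorting by count puts the most frequent leftovers first, which could make it easier to decide by hand which ones are worth adding to a word list.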

These could be implemented with (\b[A-Z]\w+\b-\b[a-z]\w+\b|\b[a-z]\w+\b-\b[A-Z]\w+\b) or similar.

missing-Morgan => perhaps if previous word starts with small letter and hyphenated word starts with Caps then emdash
destination-Rockaway
Europe-probably
ours-Number => see above
Rockaways-one
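As a quick check of the pattern against the pairs from this issue, here is a minimal Python sketch. The function names `mixed_case_pairs` and `to_emdash` are hypothetical, and replacing the hyphen with U+2014 is only an assumption about what the fix would do. Because each side uses `\w+` after the first letter, one-letter words such as "I" would not match.

```python
import re

# The pattern from the comment above: Caps-lower or lower-Caps word pairs.
MIXED_CASE = re.compile(
    r"(\b[A-Z]\w+\b-\b[a-z]\w+\b|\b[a-z]\w+\b-\b[A-Z]\w+\b)"
)

def mixed_case_pairs(text: str) -> list[str]:
    """Return hyphenated pairs whose two halves differ in initial case."""
    return [m.group(0) for m in MIXED_CASE.finditer(text)]

def to_emdash(text: str) -> str:
    """Hypothetical fix: swap the hyphen in each matched pair for an em dash."""
    return MIXED_CASE.sub(lambda m: m.group(0).replace("-", "\u2014"), text)

words = ("missing-Morgan destination-Rockaway Europe-probably ours-Number "
         "Rockaways-one accident-someone Hooper-Morton")
print(mixed_case_pairs(words))
# ['missing-Morgan', 'destination-Rockaway', 'Europe-probably', 'ours-Number', 'Rockaways-one']
```

The lower-lower pair (accident-someone) and the Caps-Caps pair (Hooper-Morton) are left alone, which is consistent with how those groups are treated separately in this thread.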

These would often be legit hyphens... don't think it's worth changing them.

Fraternity-Cadmus => don't know how we could differentiate this one
Hooper-Morton
Todd-Quill
Todd-Temple

Already implemented in #10.

You-You => refer also to stutter issue

GregTheGrate commented 3 years ago

I wouldn't do anything with them. "probably" and "ours" I fixed by adding to the list. If these are all I have to cope with, it's fine.

I only add words that I think will be fairly common, so I wouldn't add "Hooper" etc.