emareg / paper-checker

Find simple grammar mistakes in scientific documents.

hyphenated wrapped words not resolving #16

Open alxhoff opened 4 years ago

alxhoff commented 4 years ago

Hyphenated words that happen to fall at the end of a line are reconstructed without the hyphen.

In my paper I have this example: "...but with very contrasting power- performance thread....."

This becomes "but with very contrasting powerperformance thread" after pdf2text. No idea if it's solvable, but I thought I'd let you know.

emareg commented 4 years ago

Thanks for the hint. I am aware of that, and I think it is solvable by checking the joined word against a spell checker. Otherwise it is not possible to tell whether a line-end hyphen belongs to the word itself or was only inserted to break the word across lines, e.g. "high- end" (should become "high-end") vs. "high- lighting" (should become "highlighting"). If it is your own paper, the best solution is probably to run the script on the .tex file.
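A minimal sketch of the spell-checker idea, not code from the repository. It assumes the extracted text still contains the "word- word" pattern, and it uses a small hypothetical word set (`KNOWN_WORDS`) as a stand-in for a real spell-checker dictionary; the function names are made up for illustration.

```python
import re

# Hypothetical stand-in for a real spell-checker dictionary.
KNOWN_WORDS = {"highlighting", "high", "end", "power", "performance"}

# Matches a word ending in a hyphen, a single space, and the next word.
WRAPPED = re.compile(r"(\w+)- (\w+)")


def resolve_hyphen(left, right, known_words=KNOWN_WORDS):
    """Join the halves if the result is a known word, else keep the hyphen."""
    joined = left + right
    if joined.lower() in known_words:
        # The hyphen was only a line-break artifact.
        return joined
    # The joined form is unknown, so assume a real hyphenated compound.
    return f"{left}-{right}"


def fix_extracted_text(text, known_words=KNOWN_WORDS):
    """Resolve every 'word- word' occurrence in the extracted text."""
    return WRAPPED.sub(
        lambda m: resolve_hyphen(m.group(1), m.group(2), known_words), text
    )


print(fix_extracted_text("a high- end system with high- lighting"))
```

With this word set, "high- lighting" is joined because "highlighting" is known, while "high- end" keeps its hyphen because "highend" is not. A suspended hyphen such as "pre- and post-processing" would be mishandled by this sketch, so a real implementation would need an extra check for that case.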

emareg commented 4 years ago

I added a first mechanism to resolve the hyphenation issue in d83999311aa838525993de6444e5a9805b9c3dc2. So far, the script looks for words at the end of a line that contain the suffixes "based", "case", or "level", which indicate a potential error from the pdf2text tool. It is not perfect yet.
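The suffix heuristic described above could look roughly like the following. This is an illustrative sketch under my own assumptions, not the code from the commit: the function name `suspicious_line_end` and the exact matching rules are hypothetical.

```python
# Suffixes named in the comment above; a joined word ending in one of them
# at the end of a line may have lost its hyphen during text extraction.
SUFFIXES = ("based", "case", "level")


def suspicious_line_end(line):
    """Return a hyphenated suggestion for the last word of the line, or None."""
    words = line.split()
    if not words:
        return None
    # Drop surrounding punctuation from the last word of the line.
    last = words[-1].strip(".,;:!?\"'()")
    lower = last.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix):
            stem = last[: -len(suffix)]
            # Require a purely alphabetic stem: this skips the bare suffix
            # itself ("based") and words that already contain a hyphen.
            if stem.isalpha():
                return f"{stem}-{last[-len(suffix):]}"
    return None


for line in ["we propose a modelbased", "this is the worstcase.", "a model-based"]:
    print(repr(line), "->", suspicious_line_end(line))
```

A purely suffix-based check like this produces false positives for ordinary words such as "showcase" or "lowercase", which is one reason such a heuristic cannot be perfect without a dictionary lookup.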