Open alxhoff opened 4 years ago
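Thanks for the hint. I am aware of the problem, and I think it is solvable by checking the candidates against a spell checker. Without one, it is not possible to tell whether a hyphen belongs to the word or was only introduced by a line break, e.g. "high- end" (should stay "high-end") vs. "high- lighting" (should become "highlighting").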
If it is your own paper, the best solution is probably to run the script on the .tex file.
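To make the spell-checker idea concrete, here is a minimal sketch of how such a check could work. It is not the code in the script: `KNOWN_WORDS` is a tiny hypothetical stand-in for a real spell checker's word list, and `rejoin_hyphens` is a name made up for this example. The rule is simply: if gluing the two fragments together gives a known word, drop the hyphen; otherwise keep the hyphen and only remove the stray space.

```python
import re

# Hypothetical stand-in for a real spell checker's dictionary (assumption).
KNOWN_WORDS = {"highlighting", "high", "end", "power", "performance"}

# Word fragment, hyphen, whitespace left over from the line break, next fragment.
BROKEN_HYPHEN = re.compile(r"([A-Za-z]+)-\s+([A-Za-z]+)")


def rejoin_hyphens(text, known_words=KNOWN_WORDS):
    """Repair 'left- right' fragments using a word list."""

    def fix(match):
        left, right = match.group(1), match.group(2)
        joined = left + right
        if joined.lower() in known_words:
            # The joined form is a real word, so the hyphen was a line-break artifact.
            return joined
        # Otherwise treat it as a genuine compound and keep the hyphen.
        return f"{left}-{right}"

    return BROKEN_HYPHEN.sub(fix, text)


print(rejoin_hyphens("high- end vs. high- lighting"))
```

With a real dictionary the result is only as good as the word list: a compound whose joined form also happens to be a valid word would still be merged incorrectly.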
I added a first mechanism to resolve the hyphenation issue in d83999311aa838525993de6444e5a9805b9c3dc2. So far, the script looks for words at the end of a line that contain the suffixes "based", "case", or "level", which indicate a potential error from the pdf2text tool, but it is not perfect yet.
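For readers who do not want to dig through the commit, the suffix check can be pictured roughly as below. This is only one reading of the description above, not the code from the commit: the function name `restore_suffix_hyphen`, the regex, and the decision to re-insert a hyphen before the suffix are all assumptions made for illustration.

```python
import re

# Suffixes named in the comment above.
SUFFIXES = ("based", "case", "level")

# Last word on a line, optionally followed by one punctuation mark.
LAST_WORD = re.compile(r"([A-Za-z]+)([.,;:]?)$")


def restore_suffix_hyphen(line):
    """Re-insert a hyphen before a known suffix in the last word of a line."""
    match = LAST_WORD.search(line)
    if not match:
        return line
    word, trailing = match.groups()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Require a non-empty stem so a bare "level" or "case" is left alone.
        if stem and word.lower().endswith(suffix):
            fixed = f"{stem}-{word[-len(suffix):]}"
            return line[: match.start()] + fixed + trailing
    return line


print(restore_suffix_hyphen("we propose a modelbased"))
```

A heuristic like this also fires on ordinary words such as "showcase" or "bookcase", which is one way it can be "not perfect yet".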
Hyphenated words that just happen to fall at the end of a line are reconstructed without the hyphen.
In my paper I have this example: "...but with very contrasting power- performance thread....."
This becomes "but with very contrasting powerperformance thread" after pdf2text. No idea if it's solvable, but I thought I'd let you know.