clulab / pdf2txt

Convert PDF files to TXT
Apache License 2.0

Don't always take word over raw #45

Closed (kwalcock closed 2 years ago)

kwalcock commented 2 years ago

Taking word over raw is useful for case correction, but too many other things get "corrected" as well, such as quotation marks and contractions. The latter mess up whitespace, since doesn't is converted to doesnot.
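
One possible way to narrow the rule, sketched here in Python purely for illustration, is to accept word only when it differs from raw in letter case alone and to keep raw in every other situation. The function name `choose_text` and its parameters `raw` and `word` are hypothetical and are not taken from the pdf2txt code base.

```python
def choose_text(raw: str, word: str) -> str:
    """Return `word` only when it is a pure case correction of `raw`.

    Hypothetical helper: `raw` is the text as extracted, `word` is the
    processed form. Any difference beyond letter case (quotation marks,
    contractions, whitespace) keeps the raw text.
    """
    # casefold() gives a case-insensitive comparison key.
    if raw != word and raw.casefold() == word.casefold():
        return word
    return raw


if __name__ == "__main__":
    print(choose_text("ThE", "the"))          # case-only difference, word is taken
    print(choose_text("doesn't", "doesnot"))  # contraction changed, raw is kept
```

With a check like this, curly quotation marks and contractions such as doesn't would pass through unchanged, while case fixes would still apply.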