valearna closed this issue 3 years ago
The pdf2txt conversion library inserts the default UTF-8 unknown character wherever it fails to convert a character. I replaced every occurrence of that character in the resulting text with a space to make the extracted text more readable. I also replaced newlines with spaces in the matched sentences.
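A minimal sketch of that cleanup in Python, for illustration only. It assumes the unknown character is the Unicode replacement character (U+FFFD), and the function names are hypothetical rather than taken from the actual change:

```python
# Assumption: the "unknown character" is the Unicode replacement character.
REPLACEMENT_CHAR = "\ufffd"


def clean_extracted_text(text: str) -> str:
    """Replace every unconvertible-character marker with a single space."""
    return text.replace(REPLACEMENT_CHAR, " ")


def flatten_sentence(sentence: str) -> str:
    """Replace line breaks inside a matched sentence with spaces."""
    # Handle Windows-style line endings first so "\r\n" becomes one space.
    return sentence.replace("\r\n", " ").replace("\n", " ")


if __name__ == "__main__":
    raw = "The gene\ufffdwas expressed\nin the pharynx."
    print(flatten_sentence(clean_extracted_text(raw)))
```

Replacing with a space instead of deleting the character keeps neighbouring words from being fused together when the marker stood in for a separator.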
Converted text displayed in the results email contains some weird characters. Maybe this is related to the email encoding?