WormBase / variant-first-pass

Variant First Pass Pipeline

Weird characters in the results email #6

Closed valearna closed 3 years ago

valearna commented 3 years ago

The converted text displayed in the results email contains some weird characters. Maybe this is related to the email encoding?

valearna commented 3 years ago

The pdf2txt conversion library inserts the default UTF-8 unknown character wherever it fails to convert a character. I replaced all of these unknown characters in the resulting text with spaces to make the extracted text more readable. I also replaced newlines with spaces in the matched sentences.
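
For anyone hitting the same problem, here is a minimal sketch of the cleanup described above. It is not the code from the pipeline: the function names `clean_extracted_text` and `flatten_sentence` are hypothetical, and it assumes the "unknown character" is the Unicode replacement character U+FFFD.

```python
# Assumption: the "unknown character" is U+FFFD, the Unicode replacement character.
REPLACEMENT_CHAR = "\ufffd"


def clean_extracted_text(text: str) -> str:
    """Replace every unknown character in the extracted text with a space."""
    return text.replace(REPLACEMENT_CHAR, " ")


def flatten_sentence(sentence: str) -> str:
    """Replace newlines in a matched sentence with spaces."""
    return sentence.replace("\n", " ")


if __name__ == "__main__":
    raw = "The allele\ufffdwas isolated\nin a screen."
    print(flatten_sentence(clean_extracted_text(raw)))
```

Each unknown character is swapped for a space rather than dropped, so the words on either side of it stay separated.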