Open abhishek-jain-infrrd opened 6 years ago
I had a similar issue with some PDFs while parsing: (cid:160) and (cid:173) appeared in place of the spaces between words. I fixed it by adding the entries ('space', None, 202, 160, None) and ('space', None, 202, 173, None) to the latin_enc.py file.
Hope it helps.
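If you would rather not edit latin_enc.py inside the installed package, a rough alternative is to clean up the extracted text afterwards. This is only a sketch and not part of pdfminer: the function name and the mapping are made up for illustration, and the mapping assumes, as in my case, that 160 and 173 both stand for a space (they happen to be the Latin-1 code points for the no-break space and the soft hyphen).

```python
import re

# Hypothetical mapping based on the two codes reported above.
# Extend it only with codes you have verified for your own PDF.
CID_TO_TEXT = {160: " ", 173: " "}

# Matches the "(cid:N)" placeholders that show up in the extracted text.
_CID_TOKEN = re.compile(r"\(cid:(\d+)\)")


def replace_cid_tokens(text, mapping=CID_TO_TEXT):
    """Replace known (cid:N) tokens and leave unknown ones untouched."""
    def _substitute(match):
        code = int(match.group(1))
        # Fall back to the original token when the code is not in the mapping.
        return mapping.get(code, match.group(0))

    return _CID_TOKEN.sub(_substitute, text)


if __name__ == "__main__":
    sample = "Hello(cid:160)world(cid:173)again (cid:5)"
    print(replace_cid_tokens(sample))  # prints: Hello world again (cid:5)
```

Unknown codes are left in place on purpose, so you can still see which ones need a mapping.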
I am facing an issue where, when I use pdfminer to extract the text from a PDF, every character comes out as a CID code. However, if I open the PDF in a viewer and select the text, I can copy it and use it normally.
Attaching a sample PDF: sample.pdf
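To help tell a fully CID-encoded document (like the one described here) apart from one with only a few stray codes, a small check on the extracted string might be useful. This is a sketch with a hypothetical helper name, not a pdfminer feature. It assumes the text has already been extracted, for example with pdfminer.high_level.extract_text in pdfminer.six, and that unmapped glyphs appear as "(cid:N)".

```python
import re

# Matches the "(cid:N)" placeholders emitted for unmapped glyphs.
_CID_TOKEN = re.compile(r"\(cid:\d+\)")


def cid_fraction(text):
    """Return the share of glyphs in `text` that are (cid:N) placeholders.

    Each placeholder counts as one glyph; every other non-whitespace
    character counts as one readable glyph.
    """
    placeholders = len(_CID_TOKEN.findall(text))
    remainder = _CID_TOKEN.sub("", text)
    readable = sum(1 for ch in remainder if not ch.isspace())
    total = placeholders + readable
    return placeholders / total if total else 0.0


if __name__ == "__main__":
    print(cid_fraction("(cid:36)(cid:37)(cid:38)"))  # every glyph is a placeholder
    print(cid_fraction("Hello(cid:160)world"))       # only the space is affected
```

A value close to 1.0 suggests the font has no usable character mapping for the extractor, whereas a small value points to a handful of codes that could be mapped by hand.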