atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Camelot doubles some characters after extraction #272

Closed: asmiy closed this issue 5 years ago

asmiy commented 5 years ago

I tried to extract the tables in this PDF, 409.pdf, by specifying table_areas and using Stream as the flavor. In the result, Camelot doubles some characters. Can you help?

anakin87 commented 5 years ago

Please show the output...

asmiy commented 5 years ago

@anakin87 Here's the output: 293.1.xlsx. I noticed that it doubles some bold characters.
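If a cell comes out with every character doubled (for example "TToottaall" instead of "Total"), one possible post-processing workaround is to collapse the doubled text after extraction. The sketch below is a hypothetical heuristic and is not part of Camelot. It only rewrites a string when the whole string consists of consecutive identical pairs, so ordinary text is left alone. One caveat: a genuine value such as "aa" would also be collapsed.

```python
def collapse_doubled(text: str) -> str:
    """Hypothetical helper: turn 'TToottaall' into 'Total'.

    Applied only when the entire string is made of consecutive identical
    character pairs; anything else is returned unchanged.
    """
    if len(text) < 2 or len(text) % 2:
        return text  # odd length cannot be fully doubled
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    if all(p[0] == p[1] for p in pairs):
        return "".join(p[0] for p in pairs)
    return text


print(collapse_doubled("TToottaall"))  # → Total
print(collapse_doubled("Hello"))       # → Hello
print(collapse_doubled("book"))        # → book
```

Each Camelot table exposes its contents as a pandas DataFrame (`table.df`), so a helper like this could be applied cell by cell, for example with `DataFrame.map` (or `applymap` on pandas versions older than 2.1).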

vinayak-mehta commented 5 years ago

It's possible that the same character is drawn twice at a small offset to simulate bold text. In that case, pdfminer extracts both copies, which leads to the doubling. This looks similar to #103.
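The mechanism described above ("fake bold" produced by overprinting the same glyph at a slight offset) can be illustrated with a position-based filter. This is a minimal sketch under stated assumptions, not Camelot's or pdfminer's code: `Glyph` is a hypothetical stand-in for a positioned character, and the 1.0-unit tolerance is an assumed value. A glyph is dropped when a glyph with the same text has already been kept at nearly the same coordinates. Because it checks against all kept glyphs, it also handles PDFs that draw a whole word once and then draw it again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """Hypothetical positioned character (text plus lower-left coordinates)."""
    text: str
    x0: float
    y0: float


def drop_fake_bold(glyphs, tol=1.0):
    """Keep the first copy of each glyph; skip near-identical overprinted copies."""
    kept = []
    for g in glyphs:
        # Same character already kept within `tol` in both x and y: treat as overprint.
        duplicate = any(
            k.text == g.text and abs(k.x0 - g.x0) <= tol and abs(k.y0 - g.y0) <= tol
            for k in kept
        )
        if not duplicate:
            kept.append(g)
    return kept  # O(n^2) comparison, fine for a sketch


# "Total" with each letter drawn twice, the second copy 0.3 units to the right.
glyphs = []
for i, ch in enumerate("Total"):
    x = 10.0 + 6.0 * i
    glyphs.append(Glyph(ch, x, 100.0))
    glyphs.append(Glyph(ch, x + 0.3, 100.0))

print("".join(g.text for g in drop_fake_bold(glyphs)))  # → Total
```

Real letters that legitimately repeat, such as the "ll" in "all", sit a full character width apart, so they fall outside the tolerance and are kept.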

vinayak-mehta commented 5 years ago

Closing as duplicate of #103.