atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

camelot 'stream' flavor isn't able to extract integers of more than 16 digits. #441

Closed rustydigg918 closed 3 years ago

rustydigg918 commented 3 years ago

Code used: pdf_data = cm.read_pdf(i, flavor='stream', pages='1', edge_tol=500)

PDF data: 41000699230001399
Camelot extracts: 41000699230001300

The error affects only the last two digits, and it isn't specific to the digit 9: whatever the last two digits are, the engine returns 0 in their place.

What should I do?

rustydigg918 commented 3 years ago

After three hours on this, I've managed to resolve the issue. For anyone who runs into it in the future: don't choose CSV as the output format for this kind of data, as very long integers (more than about 16 digits) might not come through correctly. In my case the last two digits were replaced with 0. fwf or Excel output might be a better choice.
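For anyone wondering why the last two digits turn into 0, here is a minimal sketch that uses only the Python standard library, not Camelot itself. It assumes the corruption happens when the value is parsed as a number after extraction, which the thread does not confirm. CSV is plain text, so a long ID written and read back as a string survives intact. The reported output, 41000699230001300, matches what you get when a number is cut to 15 significant digits, which is the precision limit spreadsheet programs such as Excel apply. A 64-bit float parse gives a slightly different wrong value. The column name `account_id` and the helper `truncate_to_15_significant_digits` are hypothetical names introduced for illustration.

```python
import csv
import io

# The 17-digit value from the report, kept as text rather than as a number.
value = "41000699230001399"

# CSV is plain text: writing and reading the value as a string is lossless.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["account_id"])  # hypothetical column name
writer.writerow([value])
buffer.seek(0)
rows = list(csv.reader(buffer))
round_tripped = rows[1][0]

# Parsing the same text as a 64-bit float cannot represent it exactly:
# near 4.1e16, adjacent floats are 8 apart.
as_float = int(float(value))


def truncate_to_15_significant_digits(digits: str) -> str:
    """Hypothetical helper: mimic a tool that keeps only 15 significant
    digits of an integer and fills the remaining positions with zeros."""
    if len(digits) <= 15:
        return digits
    return digits[:15] + "0" * (len(digits) - 15)


print(round_tripped)                             # 41000699230001399
print(as_float)                                  # 41000699230001400
print(truncate_to_15_significant_digits(value))  # 41000699230001300
```

If that assumption holds, the practical fix is to keep such columns as text from end to end, for example by reading the CSV back with string types and importing the column as text in a spreadsheet rather than double-clicking the file open.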