atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Performance improvement for fixed table Pdfs #301

Closed: rannj005 closed this issue 5 years ago

rannj005 commented 5 years ago

I am trying to extract data from PDFs using the lattice flavor. The table layout is fixed: it has a fixed boundary and a fixed width for each column. I have also tried specifying the table_areas parameter. Is there a way to reduce the extraction time, given that the table boundaries and columns are fixed? I have attached a sample PDF. Please suggest an approach. sample.pdf
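For reference, a minimal sketch of how the two Camelot flavors could be timed against each other on a fixed-layout table. It assumes the table can also be parsed from text positions alone, in which case the stream flavor with explicit `table_areas` and `columns` may avoid the page-to-image step that lattice uses for line detection. The coordinates and the helper names `build_kwargs` and `timed_extract` are hypothetical and not taken from the issue; whether stream is actually faster or equally accurate for this PDF would need to be measured.

```python
import time

# Hypothetical coordinates in PDF points (origin at the bottom-left corner).
# Replace them with values measured from your own document.
TABLE_AREA = "50,750,550,100"          # "x1,y1,x2,y2": top-left, bottom-right
COLUMN_SEPARATORS = "120,200,310,420"  # x-coordinates of the column boundaries


def build_kwargs(flavor, pages="1"):
    """Return keyword arguments for camelot.read_pdf for the given flavor."""
    kwargs = {"pages": pages, "flavor": flavor, "table_areas": [TABLE_AREA]}
    if flavor == "stream":
        # `columns` is only accepted by the stream flavor.
        kwargs["columns"] = [COLUMN_SEPARATORS]
    return kwargs


def timed_extract(path, flavor, pages="1"):
    """Run camelot.read_pdf and return (tables, elapsed_seconds)."""
    import camelot  # imported lazily so the helper above works without it

    start = time.perf_counter()
    tables = camelot.read_pdf(path, **build_kwargs(flavor, pages))
    return tables, time.perf_counter() - start
```

Calling `timed_extract("sample.pdf", "lattice")` and `timed_extract("sample.pdf", "stream")` returns the extracted tables together with the elapsed seconds, so the two results can be compared for both speed and quality before switching flavors.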

anakin87 commented 5 years ago

What do you mean by performance? Time, quality of results...

rannj005 commented 5 years ago

The extraction takes too long. The quality is perfect.

rannj005 commented 5 years ago

Please suggest ways to improve the speed of PDF table extraction.

vinayak-mehta commented 5 years ago

@rannj005 Let me look into this.

vinayak-mehta commented 5 years ago

https://github.com/camelot-dev/camelot/issues/20 could improve performance.