camelot-dev / excalibur

A web interface to extract tabular data from PDFs
https://excalibur-py.readthedocs.io
MIT License

Unable to extract full table from PDF #150

Open keerthip1121 opened 2 years ago

keerthip1121 commented 2 years ago

I am trying to extract a table from a PDF file and convert it to Excel. With flavor 'stream', the full table is not extracted. The table in the PDF is split into two table DataFrames (which I concatenated afterwards, so that is not a problem), but part of the table data is missing. With flavor 'lattice' the full table data is extracted, but I prefer the format produced by 'stream'. Can you please help me extract the full table data with 'stream' itself? In the attached Excel file, Sheet1 contains the data extracted with flavor 'stream' and Sheet2 the data extracted with flavor 'lattice'.
Attachments: pdf, pdf-excel226-11.xlsx
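
For reference, the workflow described above (read with flavor 'stream', concatenate the resulting DataFrames, write to Excel) could look roughly like the sketch below. This is not the original poster's code. The helper names `stream_options` and `extract_to_excel`, the file paths, and the tolerance values are hypothetical. The keyword arguments `table_areas`, `edge_tol`, and `row_tol` are stream options of Camelot's `read_pdf`, and adjusting them is one thing to try when 'stream' misses part of a table, though whether it helps for this particular PDF is untested.

```python
def stream_options(table_area=None, edge_tol=None, row_tol=None):
    """Build keyword arguments for camelot.read_pdf with flavor='stream'.

    Hypothetical helper; all values passed in are illustrative.
    """
    opts = {"flavor": "stream"}
    if table_area is not None:
        # Camelot expects "x1,y1,x2,y2" strings: top-left and bottom-right
        # corners of the table in PDF coordinate space.
        opts["table_areas"] = [",".join(str(v) for v in table_area)]
    if edge_tol is not None:
        # A larger edge_tol can widen the table area that stream guesses.
        opts["edge_tol"] = edge_tol
    if row_tol is not None:
        # row_tol controls how close text must be vertically to share a row.
        opts["row_tol"] = row_tol
    return opts


def extract_to_excel(pdf_path, xlsx_path, pages="1", **kwargs):
    """Read tables with 'stream', concatenate them, and save to Excel."""
    # Imported lazily so the option builder above works without Camelot.
    import camelot
    import pandas as pd

    tables = camelot.read_pdf(pdf_path, pages=pages, **stream_options(**kwargs))
    if len(tables) == 0:
        raise ValueError("no tables detected; try a different table_area or edge_tol")
    # Each detected table exposes its data as a DataFrame via .df.
    combined = pd.concat([t.df for t in tables], ignore_index=True)
    combined.to_excel(xlsx_path, index=False)  # requires an Excel writer such as openpyxl
    return combined
```

A call such as `extract_to_excel("input.pdf", "output.xlsx", edge_tol=500)` would then write the combined table to a single sheet; the path names and the value 500 are placeholders, not values taken from this issue.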