camelot-dev / camelot

A Python library to extract tabular data from PDFs
https://camelot-py.readthedocs.io
MIT License
2.95k stars, 465 forks

Tables ignored in lattice mode #494

Closed: cave-iar closed this issue 6 months ago

cave-iar commented 6 months ago

Describe the bug

Given a PDF page with 12 lattice tables, camelot.read_pdf(pdf_path, flavor='lattice') detects only one (the returned TableList contains 1 table).

Steps to reproduce the bug

import camelot

pdf_path = 'page.pdf'
tables = camelot.read_pdf(pdf_path, flavor='lattice')
print(tables)  # only one table is reported, although the page has 12

PDF

selected_page.pdf

Environment

cave-iar commented 6 months ago

Actually, I was able to solve it by passing a larger line_scale of 40 (a larger factor helps to detect smaller lines). Hope it helps somebody.
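
For anyone hitting the same problem, here is a minimal sketch of the workaround described above. It reads the same page with two line_scale values and compares how many tables are detected. The helper name count_tables_by_line_scale and its reader parameter are hypothetical and exist only for this illustration. The value 15 is assumed to be Camelot's default line_scale, and 40 is the value that worked here. Other PDFs may need a different value.

```python
def count_tables_by_line_scale(pdf_path, scales=(15, 40), reader=None):
    """Return {line_scale: number of tables detected} for a lattice-mode read.

    `reader` is a hypothetical hook for this sketch; it defaults to
    camelot.read_pdf and lets the helper be exercised without a real PDF.
    """
    if reader is None:
        import camelot  # imported lazily so the helper can be tested without Camelot
        reader = camelot.read_pdf

    counts = {}
    for scale in scales:
        # line_scale is forwarded to the lattice parser; larger values
        # let it pick up smaller line segments.
        tables = reader(pdf_path, flavor="lattice", line_scale=scale)
        counts[scale] = len(tables)
    return counts


# Example call (assumes Camelot is installed and 'page.pdf' exists):
# print(count_tables_by_line_scale("page.pdf"))
```

Note that setting line_scale very high can cause text to be picked up as lines, so it is probably worth increasing it gradually and checking the result rather than jumping to a large value.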