camelot-dev / camelot

A Python library to extract tabular data from PDFs
https://camelot-py.readthedocs.io
MIT License

IndexError in lattice #493

Open BoBoBrccc opened 3 months ago

BoBoBrccc commented 3 months ago

Describe the bug

An IndexError is raised in the _reduce_index method in lattice.py. It happens when a piece of text starts inside the table but ends outside it.
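
To make the suspected failure mode concrete, here is a minimal sketch. It is not Camelot's actual code: `naive_cell_index`, `clamped_cell_index`, and `col_edges` are hypothetical names, and the assumption is simply that a column lookup keeps walking past the last table edge when the text's end coordinate lies outside the table.

```python
from bisect import bisect_right


def naive_cell_index(x, col_edges):
    """Hypothetical lookup: walk the column edges until one lies beyond x.

    Raises IndexError when x is at or past the last edge, i.e. when the
    text ends outside the table.
    """
    i = 0
    while col_edges[i + 1] <= x:
        i += 1
    return i


def clamped_cell_index(x, col_edges):
    """Hypothetical fix: clamp the result to the first/last real column."""
    last_col = len(col_edges) - 2  # n edges delimit n - 1 columns
    i = bisect_right(col_edges, x) - 1
    return min(max(i, 0), last_col)


if __name__ == "__main__":
    edges = [0, 50, 100]  # two columns: [0, 50) and [50, 100)
    print(clamped_cell_index(60, edges))   # text end inside the table
    print(clamped_cell_index(120, edges))  # text end outside, clamped to last column
    try:
        naive_cell_index(120, edges)
    except IndexError as exc:
        print("naive lookup failed:", exc)
```

Clamping is only one possible way to handle this; whether overflowing text should be assigned to the last cell or dropped is a design decision for the maintainers.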

Steps to reproduce the bug

Call read_pdf on the attached file.

Expected behavior

No error.

Code

import camelot

# Path to the attached PDF; adjust it to wherever camelot.pdf is saved.
myfile = "camelot.pdf"
camelot.read_pdf(myfile, flavor='lattice', split_text=True)

PDF

camelot.pdf

Environment

Additional context

N/A

bosd commented 3 months ago

Hey!

As noted in https://github.com/camelot-dev/camelot/issues/343, this repo is no longer maintained, so we are trying to build a maintained fork at pypdf_table_extraction.

Could you check out the code there to see whether the issue still persists? If it does, please open an issue there.