camelot-dev / camelot

A Python library to extract tabular data from PDFs
https://camelot-py.readthedocs.io
MIT License

IndexError: list index out of range: if t.x0 > cols[-1][1] or t.x1 < cols[0][0] #390

Open zakdances opened 1 year ago

zakdances commented 1 year ago

Describe the bug

Traceback (most recent call last):
  File "/Users/me/myproj/extract.py", line 119, in <module>
    main()
  File "/Users/me/myproj/extract.py", line 85, in main
    extract_that_file(input_file_filepath, input_file_output_camelot_dir)
  File "/Users/me/myproj/extract.py", line 35, in extract_that_file
    tables = camelot.read_pdf(str(filepath.resolve()), pages='1-end', flavor='stream')
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/camelot/io.py", line 113, in read_pdf
    tables = p.parse(
             ^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/camelot/handlers.py", line 173, in parse
    t = parser.extract_tables(
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/camelot/parsers/stream.py", line 463, in extract_tables
    cols, rows = self._generate_columns_and_rows(table_idx, tk)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/camelot/parsers/stream.py", line 382, in _generate_columns_and_rows
    outer_text = [
                 ^
  File "/opt/homebrew/lib/python3.11/site-packages/camelot/parsers/stream.py", line 386, in <listcomp>
    if t.x0 > cols[-1][1] or t.x1 < cols[0][0]
              ~~~~^^^^
IndexError: list index out of range
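
A note on reading this traceback: the `~~~~^^^^` markers sit under `cols[-1]`, so the failing subscript is on `cols` itself, which means `cols` was an empty list when the comprehension ran. The sketch below reproduces only that condition in isolation. It is not camelot's code: `TextBox`, `outer_text_unguarded`, and `outer_text_guarded` are hypothetical names, and returning an empty list when no columns exist is just one possible guard. Why no columns were detected for these PDFs is not established in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for the text objects; only x0 and x1 are used here.
TextBox = namedtuple("TextBox", ["x0", "x1"])


def outer_text_unguarded(textboxes, cols):
    # Same condition as the line shown in the traceback.
    return [t for t in textboxes if t.x0 > cols[-1][1] or t.x1 < cols[0][0]]


def outer_text_guarded(textboxes, cols):
    # Assumption: with no columns there is nothing to compare against,
    # so return an empty list instead of indexing into cols.
    if not cols:
        return []
    return outer_text_unguarded(textboxes, cols)


boxes = [TextBox(x0=10.0, x1=50.0), TextBox(x0=300.0, x1=340.0)]

try:
    outer_text_unguarded(boxes, [])
    raised = False
except IndexError:
    raised = True

print(raised)                                   # True: empty cols reproduces the error
print(outer_text_guarded(boxes, []))            # []
print(outer_text_guarded(boxes, [(0.0, 100.0), (100.0, 200.0)]))
```

With non-empty `cols`, only the box lying entirely to the right of the last column is reported as outer text.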

Steps to reproduce the bug

conda install -c conda-forge camelot-py
pip install "camelot-py[base]"

(I also installed from a clone of the repository to get the current version, 0.11.0.)

import camelot

tables = camelot.read_pdf(my_filepath, pages='1-end', flavor='stream')
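
Because `pages='1-end'` processes the whole document in one call, a single problematic page aborts the entire run. One way to narrow it down, sketched here under assumptions, is to read pages one at a time and record which ones raise `IndexError`. `find_failing_pages`, `read_page`, and `fake_read_page` are hypothetical helpers, not part of camelot; the fake reader only stands in for a real call so the sketch runs on its own.

```python
def find_failing_pages(read_page, page_numbers):
    """Call read_page(n) per page; return (results, pages that raised IndexError)."""
    results, failed = {}, []
    for n in page_numbers:
        try:
            results[n] = read_page(n)
        except IndexError:
            # Record the page and keep going instead of aborting the whole run.
            failed.append(n)
    return results, failed


def fake_read_page(n):
    # Hypothetical reader: pretends page 2 triggers the error.
    if n == 2:
        raise IndexError("list index out of range")
    return f"tables from page {n}"


results, failed = find_failing_pages(fake_read_page, range(1, 4))
print(failed)            # [2]
print(sorted(results))   # [1, 3]
```

With camelot, `read_page` could be something like `lambda n: camelot.read_pdf(my_filepath, pages=str(n), flavor='stream')`, with the page range supplied by the caller. This isolates the failing pages but does not fix the underlying error.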

PDF

https://www.hcd.ca.gov/housing-elements/docs/Lafayette-6th-Adopted-013123.pdf
https://www.hcd.ca.gov/housing-elements/docs/Concord-6th-Adopted-032123.pdf
https://www.hcd.ca.gov/housing-elements/docs/Orinda-6th-Draft-111622.pdf

Environment

mike-gigs commented 5 months ago

Sorry to reopen this item, but were you ever able to resolve this problem? I am facing the same error with a set of PDFs and have not seen any other discussions regarding this issue.