atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

ZeroDivisionError: float division by zero #326

Closed dimidd closed 5 years ago

dimidd commented 5 years ago

Hello,

After trying to extract tables using the Excalibur web interface, I got an error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/excalibur/tasks.py", line 110, in extract
    t = parser.extract_tables(filepaths[p])
  File "/usr/local/lib/python3.7/site-packages/camelot/parsers/stream.py", line 426, in extract_tables
    table = self._generate_table(table_idx, cols, rows)
  File "/usr/local/lib/python3.7/site-packages/camelot/parsers/stream.py", line 376, in _generate_table
    flag_size=self.flag_size, strip_text=self.strip_text)
  File "/usr/local/lib/python3.7/site-packages/camelot/utils.py", line 560, in get_table_index
    lt_col_overlap.append(abs(left - right) / abs(c[0] - c[1]))
ZeroDivisionError: float division by zero

This stems from the following line: https://github.com/socialcopsdev/camelot/blob/a1b85d2c91418d71262eea42114be2229687efc3/camelot/utils.py#L568

I'll try to reduce the PDF to a minimal reproducible example and update this issue.
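
For context, the expression in the traceback divides by the column width, abs(c[0] - c[1]), so it fails whenever a detected column has identical left and right edges. The snippet below is only a sketch of how such a division could be guarded. It is not camelot's code or its eventual fix: the function name column_overlap_ratio, the way left and right are clamped, and the -1.0 "no overlap" sentinel are all assumptions made for illustration.

```python
def column_overlap_ratio(col, x0, x1):
    """Return the fraction of a column covered by a text span.

    Hypothetical helper, not part of camelot. `col` is a (left, right)
    pair of x-coordinates and [x0, x1] is the horizontal extent of a
    text object. Returns -1.0 (an assumed sentinel) when the column has
    zero width or the span does not overlap it.
    """
    c0, c1 = col
    width = abs(c0 - c1)
    if width == 0:
        # A zero-width column is what triggers ZeroDivisionError in the
        # unguarded expression abs(left - right) / abs(c[0] - c[1]).
        return -1.0

    # Clamp the text span to the column boundaries (assumed semantics).
    left = max(c0, x0)
    right = min(c1, x1)
    if right < left:
        return -1.0  # no horizontal overlap at all

    return abs(left - right) / width


if __name__ == "__main__":
    print(column_overlap_ratio((0.0, 10.0), 2.0, 7.0))    # normal column
    print(column_overlap_ratio((10.0, 10.0), 5.0, 15.0))  # degenerate column
```

Whether a degenerate column should be skipped like this or never produced in the first place depends on how the stream parser builds its column list, which this sketch does not address.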

vinayak-mehta commented 5 years ago

@dimidd That's a good idea! I'll take a look when you add that example with the PDF.

vinayak-mehta commented 5 years ago

@dimidd Please reopen when you add the example.

mroyce1 commented 4 years ago

The same happens for me.

OS: Ubuntu 19.10 / Windows 10 64-bit (happens on both systems)
camelot-py: 0.7.3
Python: 3.7.7

Call: read_pdf(filepath=path_to_pdf_file, pages="1", flavor="stream", table_regions=["234.35,48.59,435.24,560.16"])

Test Document: 2Q18-Earnings-Release.pdf

It doesn't happen with an older version (e.g. 0.3.2).

Thank you!