atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Found bug in codebase? #462

Open amine-aboufirass opened 3 years ago

amine-aboufirass commented 3 years ago

I think I might have found a bug in the codebase. I am using version 0.9.0 of camelot-py and the following line uses a strange order for the bounding box definition:

https://github.com/atlanhq/camelot/blob/cd8ac7979fe3631866fe439f07e9d6aaa5b1e5c6/camelot/parsers/stream.py#L305

The docstring of text_in_bbox specifies that a tuple (x1, y1, x2, y2) should be passed, where (x1, y1) is the lower-left corner of the box and (x2, y2) is the upper-right corner.

Could somebody please check whether this is in fact a bug? I believe it might be causing the ZeroDivisionError problem when using table_regions, since text_in_bbox would then fail to return any text.
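To make the concern concrete, here is a minimal sketch of how a bounding-box filter that expects (lower-left, upper-right) behaves when the two y values arrive in the opposite order. This is not camelot's actual code: `TextBox` and `text_in_bbox_sketch` are hypothetical stand-ins, and the midpoint test is only an assumption about how such a filter might work.

```python
from typing import List, NamedTuple, Tuple


class TextBox(NamedTuple):
    """Hypothetical stand-in for a PDF text object (not a camelot or pdfminer type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def text_in_bbox_sketch(
    bbox: Tuple[float, float, float, float], boxes: List[TextBox]
) -> List[TextBox]:
    """Keep boxes whose midpoint lies inside bbox = (x1, y1, x2, y2).

    Assumes (x1, y1) is the lower-left corner and (x2, y2) the upper-right
    corner, as described in the docstring quoted above.
    """
    left, bottom, right, top = bbox
    kept = []
    for b in boxes:
        mid_x = (b.x0 + b.x1) / 2
        mid_y = (b.y0 + b.y1) / 2
        if left <= mid_x <= right and bottom <= mid_y <= top:
            kept.append(b)
    return kept


boxes = [TextBox("cell", 100, 500, 150, 512)]

# Corners in the documented order: the text box is found.
in_order = text_in_bbox_sketch((50, 400, 300, 600), boxes)

# Same region with the two y values swapped: bottom > top, so nothing can match.
swapped = text_in_bbox_sketch((50, 600, 300, 400), boxes)

print(len(in_order), len(swapped))
```

With the y values swapped, the condition `bottom <= mid_y <= top` can never hold, so the filter returns an empty list. Any later step that divides by a quantity derived from the matched text could then plausibly raise a ZeroDivisionError, though whether that is what happens in camelot still needs to be confirmed.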

amine-aboufirass commented 3 years ago

Edit

I just found that someone has already identified this in the pull request linked above. Please take a look at my comment there.