According to its definition, the text_in_bbox function expects the bounding box as a tuple in (x1, y1, x2, y2) order:
def text_in_bbox(bbox, text):
    """Returns all text objects present inside a bounding box.

    Parameters
    ----------
    bbox : tuple
        Tuple (x1, y1, x2, y2) representing a bounding box where
        (x1, y1) -> lb and (x2, y2) -> rt in the PDF coordinate
        space.
    text : List of PDFMiner text objects.

    Returns
    -------
    t_bbox : list
        List of PDFMiner text objects that lie inside table.

    """
    lb = (bbox[0], bbox[1])
    rt = (bbox[2], bbox[3])
    t_bbox = [
        t
        for t in text
        if lb[0] - 2 <= (t.x0 + t.x1) / 2.0 <= rt[0] + 2
        and lb[1] - 2 <= (t.y0 + t.y1) / 2.0 <= rt[1] + 2
    ]
    return t_bbox
However, in the call to this function on line 305 in the stream.py module, this order is mixed up:
region_text = text_in_bbox((x1, y2, x2, y1), self.horizontal_text)
This commit puts them in the expected order. Line 317 may be affected as well.
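To illustrate why the order matters, here is a minimal sketch that copies the function body from above and runs it on made-up data. The Text namedtuple is a hypothetical stand-in for PDFMiner text objects, and the coordinates assume that y1 is the lower edge and y2 the upper edge. Whether that holds at the call site depends on how y1 and y2 are defined in stream.py, which is not shown here.

```python
from collections import namedtuple

# Hypothetical stand-in for a PDFMiner text object; only the four
# coordinate attributes read by text_in_bbox are modelled.
Text = namedtuple("Text", ["x0", "y0", "x1", "y1"])


def text_in_bbox(bbox, text):
    # Same logic as the function quoted above: keep objects whose
    # centre lies inside the box, with a 2-unit tolerance per side.
    lb = (bbox[0], bbox[1])
    rt = (bbox[2], bbox[3])
    return [
        t
        for t in text
        if lb[0] - 2 <= (t.x0 + t.x1) / 2.0 <= rt[0] + 2
        and lb[1] - 2 <= (t.y0 + t.y1) / 2.0 <= rt[1] + 2
    ]


# Made-up data: two objects inside the box, one far outside it.
words = [
    Text(10, 10, 20, 20),
    Text(50, 50, 60, 60),
    Text(200, 200, 210, 210),
]

# Assumption: (x1, y1) is the left-bottom corner, (x2, y2) the right-top.
x1, y1, x2, y2 = 0, 0, 100, 100

in_order = text_in_bbox((x1, y1, x2, y2), words)
swapped = text_in_bbox((x1, y2, x2, y1), words)

print(len(in_order))  # 2
print(len(swapped))   # 0
```

With the y values swapped, the lower bound of the vertical test is larger than the upper bound, so no object can satisfy it and the function returns an empty list instead of raising an error.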
Apologies for any faux pas; this is my first ever contribution to a project!