In the function find_contours, located in image_processing.py, there are the following two lines:
# sort in reverse based on contour area and use first 10 contours
contours = sorted(contours, key=cv2.contourArea, reverse=True)[:10]
This drops any tables beyond the 10 largest contours on a page.
It may seem reasonable to assume that a page would have fewer than 10 tables.
A simple example of a PDF that may contain more than 10 tables on a page is a work schedule where there is a box around each small set of scheduled people, say for a given department. There may be more than 10 departments listed on the page.
This should not be a hardcoded numeric value, but a settable parameter.
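A minimal sketch of what I mean, not a patch against the actual code. The helper name keep_largest_contours and the parameters max_contours and area are my own hypothetical names; the default of 10 keeps today's behaviour, and None keeps every contour.

```python
def keep_largest_contours(contours, max_contours=10, area=None):
    """Return contours sorted by area (largest first), optionally truncated.

    Hypothetical helper: max_contours=10 mirrors the current hardcoded
    limit, and max_contours=None disables the limit entirely.
    """
    if area is None:
        # Import lazily so the helper can be exercised without OpenCV
        # when a custom area function is supplied.
        import cv2
        area = cv2.contourArea

    if max_contours is not None and max_contours < 0:
        raise ValueError("max_contours must be None or a non-negative integer")

    # Same ordering as the existing line: largest area first.
    ordered = sorted(contours, key=area, reverse=True)

    # Slicing with None as the stop value returns the full list.
    return ordered[:max_contours]
```

Inside find_contours the two lines could then become a single call such as contours = keep_largest_contours(contours, max_contours=max_contours), with max_contours passed down from whatever configuration the caller already has.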
Regards,
..Otto