Open olivierbouman opened 2 years ago
Can you check the properties of the PDF? It could be secured for extraction.
Did you solve the problem? If so, please share the solution; I have the same problem.
In case someone is having this error, I fixed this by changing the line in utils.py from:
if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
to:
if bbox_intersection_area(ba, bb) > bbox_area(ba)*0.8:
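For anyone wondering why this rewrite avoids the crash, here is a minimal, self-contained sketch. The `Box` class and the two helper functions are simplified stand-ins written for illustration, not camelot's actual code; only the two comparison expressions are taken from the change above. Multiplying instead of dividing means a zero-area box simply compares as "not mostly covered" rather than raising.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical stand-in for a text line object with bbox attributes.
    x0: float
    y0: float
    x1: float
    y1: float


def bbox_area(bb):
    # Simplified stand-in: width times height.
    return (bb.x1 - bb.x0) * (bb.y1 - bb.y0)


def bbox_intersection_area(ba, bb):
    # Simplified stand-in: area of the overlapping rectangle, 0.0 if none.
    x_left, x_right = max(ba.x0, bb.x0), min(ba.x1, bb.x1)
    y_bottom, y_top = max(ba.y0, bb.y0), min(ba.y1, bb.y1)
    if x_right < x_left or y_top < y_bottom:
        return 0.0
    return (x_right - x_left) * (y_top - y_bottom)


def mostly_inside_div(ba, bb, threshold=0.8):
    # Original form: raises ZeroDivisionError when bbox_area(ba) == 0.
    return (bbox_intersection_area(ba, bb) / bbox_area(ba)) > threshold


def mostly_inside_mul(ba, bb, threshold=0.8):
    # Rewritten form: same result for non-zero areas, no division involved.
    return bbox_intersection_area(ba, bb) > bbox_area(ba) * threshold


if __name__ == "__main__":
    degenerate = Box(65.883, 392.092, 65.883, 404.102)  # zero width
    other = Box(60.0, 390.0, 70.0, 410.0)
    print(mostly_inside_mul(degenerate, other))
```

Note that with this form a zero-area box is never treated as mostly covered by another box, which may or may not be what you want for your tables.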
@Fadheler Yep, this does solve the issue, but some parts of the pdf do not get recognised (in my case, the top row in the tables was empty)
This probably has something to do with tweaking the tolerance parameters row_tol and column_tol.
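If you want to experiment with those tolerances, a rough sketch could look like the following. It assumes the Stream flavor, since `row_tol` and `column_tol` are Stream parameters; the thread does not say which flavor was used. The helper names, the page selection, and the tolerance values are placeholders to tune for your own PDF, not recommendations.

```python
def build_stream_kwargs(row_tol=10, column_tol=0):
    # Hypothetical helper: collects the Stream-flavor tolerance settings.
    # The default values here are arbitrary starting points.
    return {"flavor": "stream", "row_tol": row_tol, "column_tol": column_tol}


def read_with_tolerances(pdf_path, pages="1", **tolerances):
    # Imported lazily so this sketch loads even where camelot is not installed.
    import camelot

    return camelot.read_pdf(pdf_path, pages=pages, **build_stream_kwargs(**tolerances))
```

Calling `read_with_tolerances("your.pdf", row_tol=15)` and comparing the resulting tables against the default run should show whether the missing top row comes back.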
@olivierbouman
In case someone is having this error, I fixed this by changing the line in utils.py from:
if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
to:
if bbox_intersection_area(ba, bb) > bbox_area(ba)*0.8:
Since this is a change to the package itself, we should not edit the installed library code directly. To work around it, we applied the change and built a .whl file that can be installed as a package until this issue is fixed in the package itself.
Please find the attachment for reference: https://drive.google.com/file/d/1COKC7s9uez8neZgrgaOjbUcLy56Ib7QO/view?usp=sharing
Hello all,
I am trying to extract some tables from a pdf using camelot-py
Version: 0.10.1
with the following setup:
Basically all of the tables are parsed very well, but one table throws the following error:
Looking at camelot/utils.py it seems that camelot encountered/created a one-dimensional table TextLine element: <LTTextLineHorizontal 65.883,392.092,65.883,404.102 'ً\n'>
This value represents ba in the code from utils.py, lines 379-381:
The area of ba is zero, thus the division by zero error occurs.
Is this by any chance a problem anyone else encountered before? And if so, any possible solutions?
It also seems that this could be caught by checking for a zero-size area, or was this left out of the code on purpose? :)
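To make the zero-size check concrete, here is a small sketch of what such a guard might look like. The `Box` tuple and the function names are made up for illustration and are not part of camelot; the coordinates are the ones from the TextLine element reported above, whose x0 and x1 are identical.

```python
from collections import namedtuple

# Hypothetical stand-in for a text line's bounding box.
Box = namedtuple("Box", "x0 y0 x1 y1")


def area(b):
    return (b.x1 - b.x0) * (b.y1 - b.y0)


def intersection_area(a, b):
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    if width < 0 or height < 0:
        return 0.0
    return width * height


def overlap_ratio(ba, bb):
    """Fraction of ba covered by bb; 0.0 when ba has no area."""
    ba_area = area(ba)
    if ba_area == 0:
        # Guard: a degenerate box cannot be meaningfully covered,
        # so skip the division instead of raising.
        return 0.0
    return intersection_area(ba, bb) / ba_area


if __name__ == "__main__":
    reported = Box(65.883, 392.092, 65.883, 404.102)  # zero width
    neighbour = Box(60.0, 390.0, 70.0, 410.0)  # made-up neighbouring box
    print(overlap_ratio(reported, neighbour))
```

Whether returning 0.0 is the right behaviour for degenerate boxes is a design choice; dropping such text lines before the comparison would be another option.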
Many thanks in advance!