cqluohong closed this issue 5 years ago
Please attach the PDF...
The file is 430027-北科光大-2017年年度报告.pdf; the table is on page 55.
I have another question: how should I handle pdfminer.psparser.PSSyntaxError? I looked at #161, which says the PDF needs to be repaired with mutool. But Camelot uses pdfminer just as pdfplumber does, and pdfplumber worked.
To extract the tables from the file you provided, you have to set the parameter line_scale=80.
See https://camelot-py.readthedocs.io/en/master/user/advanced.html#detect-short-lines
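For anyone landing here later, a minimal sketch of what that call might look like. It assumes the table is bordered (so the lattice flavor applies) and sits on page 55, as reported above; the helper names lattice_kwargs and extract_tables are made up for illustration and are not part of Camelot.

```python
def lattice_kwargs(page, line_scale=80):
    """Build keyword arguments for camelot.read_pdf in lattice mode.

    Hypothetical helper: it only packages the arguments discussed above.
    """
    return {
        "pages": str(page),        # camelot expects page numbers as a string
        "flavor": "lattice",       # line_scale only applies to the lattice flavor
        "line_scale": line_scale,  # larger value lets shorter lines be detected
    }


def extract_tables(pdf_path, page, line_scale=80):
    """Read the tables on one page with a raised line_scale."""
    import camelot  # imported lazily; requires camelot-py to be installed

    return camelot.read_pdf(pdf_path, **lattice_kwargs(page, line_scale))


# Example usage (needs the PDF on disk):
# tables = extract_tables("430027-北科光大-2017年年度报告.pdf", page=55)
# print(tables.n)      # number of tables found
# print(tables[0].df)  # first table as a pandas DataFrame
```

The value 80 is what worked for this particular file; other PDFs may need a different value, and the Camelot docs warn that a very large line_scale can cause text to be detected as lines.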
Thank you, you helped me a lot.
Looks like that solved the issue, closing this. Thanks @anakin87.
Hi @anakin87, I faced a similar issue too, and your solution helped. Thanks. However, it would be great if you could help me understand the behavior of the line_scale parameter. As far as I can tell, the border lines of the different tables in the source PDF all have the same thickness. So why is Camelot able to identify some tables but not others (especially those with fewer than 4 rows)? Thanks in advance. :)
I have some PDFs with two tables on one page, but I cannot extract the smaller one. Can the extraction sensitivity be adjusted so that small tables are not discarded?
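On why line thickness is not the deciding factor: as far as I understand Camelot's lattice code, line_scale controls the length, not the thickness, of the morphological kernel used to find lines. The kernel length is roughly the page image's height (for vertical lines) or width (for horizontal lines) divided by line_scale, and line segments shorter than that are eroded away. A table with only a few rows has short vertical borders, so with the default line_scale=15 they can fall below the cutoff. The sketch below only does that arithmetic; the function name and the 2200-pixel image height are hypothetical, chosen for illustration.

```python
def min_detectable_line_px(image_dim_px, line_scale=15):
    """Approximate the shortest line (in pixels) that survives line detection.

    Assumption: the kernel length is the image dimension integer-divided
    by line_scale, which is my reading of Camelot's lattice implementation.
    """
    if line_scale <= 0:
        raise ValueError("line_scale must be a positive integer")
    return image_dim_px // line_scale


page_height_px = 2200  # hypothetical height of the rendered page image

default_cutoff = min_detectable_line_px(page_height_px)    # 2200 // 15 = 146
tuned_cutoff = min_detectable_line_px(page_height_px, 80)  # 2200 // 80 = 27

print(f"line_scale=15: vertical lines shorter than ~{default_cutoff}px are dropped")
print(f"line_scale=80: vertical lines shorter than ~{tuned_cutoff}px are dropped")
```

Under these assumptions, raising line_scale from 15 to 80 lowers the cutoff from about 146 to about 27 pixels, which would explain why short tables start to be detected.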