Open ayushnarsaria opened 4 years ago
Hi, have you tried using stream flavor?
page_4 = camelot.read_pdf('jcc.26188.pdf', flavor='stream', pages='4')
page_4 gives two tables; the table you want to extract is the first one.
array([['T A B L E 1', '', '', '', '', '', ''],
       ['', 'Statistics and error analysis of TDDFT functionals compared to experimental λmax values for the lowest dipole-allowed vertical', '', '', '', '', ''],
       ['excitation energy (Evert-abso(DCM),', 'in eV)', 'in dichloromethane calculated using the COSMO solvation modela', '', '', '', ''],
       ['', 'GGA', '', 'GH', '', 'RSH', ''],
       ['Statistical parameters', 'OLYP', 'BLYP', 'B3LYP', 'PBE0', 'LCY-BLYP', 'CAMY-B3LYP'],
       ['R2', '0.86', '0.86', '0.90', '0.92', '0.98', '0.96'],
       ['MD', '−0.39', '−0.42', '−0.09', '0.00', '0.52', '0.16'],
       ['MAD', '0.39', '0.42', '0.09', '0.07', '0.52', '0.16'],
       ['MAX(+)b (eV)', '-', '-', '0.03', '0.13', '0.61', '0.24'],
       ['MAX(−)b (eV)', '−0.72', '−0.74', '−0.36', '−0.25', '-', '−0.01']],
      dtype=object)
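The caption and the GGA/GH/RSH grouping row end up inside the extracted rows, so some post-processing is usually needed. Below is a minimal sketch in plain Python, assuming the rows look like the array above. The helper names `to_number` and `rows_to_records` are made up for illustration and are not part of camelot, and only a few of the rows are repeated here to keep it short.

```python
def to_number(cell):
    """Convert a cell to float; a lone '-' is treated as 'no value'."""
    # The PDF uses the Unicode minus sign (U+2212), which float() rejects.
    text = cell.replace('\u2212', '-').strip()
    if text in ('', '-'):
        return None
    try:
        return float(text)
    except ValueError:
        return text  # leave non-numeric cells untouched


def rows_to_records(rows, header_label):
    """Skip everything above the header row and key each data row by its first cell."""
    header_idx = next(i for i, row in enumerate(rows) if row[0] == header_label)
    header = rows[header_idx]
    records = {}
    for row in rows[header_idx + 1:]:
        records[row[0]] = {
            name: to_number(value) for name, value in zip(header[1:], row[1:])
        }
    return records


# A subset of the rows from the array above.
rows = [
    ['T A B L E 1', '', '', '', '', '', ''],
    ['', 'GGA', '', 'GH', '', 'RSH', ''],
    ['Statistical parameters', 'OLYP', 'BLYP', 'B3LYP', 'PBE0', 'LCY-BLYP', 'CAMY-B3LYP'],
    ['R2', '0.86', '0.86', '0.90', '0.92', '0.98', '0.96'],
    ['MD', '−0.39', '−0.42', '−0.09', '0.00', '0.52', '0.16'],
    ['MAX(+)b (eV)', '-', '-', '0.03', '0.13', '0.61', '0.24'],
]

records = rows_to_records(rows, 'Statistical parameters')
print(records['MD'])
```

With the real output, the same function could be fed the rows of the first table's DataFrame converted to a list of lists.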
I understand this can be improved by using the configuration parameters. I hope you get time to try them; I'll update this when I get time to look into it further.
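As a starting point for that tuning, here is a rough sketch of passing stream options to `camelot.read_pdf`. `row_tol`, `edge_tol` and `strip_text` are documented stream keyword arguments, but the values below are guesses and have not been tested against this PDF. `build_options` and `read_page` are hypothetical helper names used only for this example.

```python
# Illustrative starting values only; tune them against the actual PDF.
STREAM_OPTIONS = {
    "flavor": "stream",
    "row_tol": 10,       # merge text lines that are vertically close into one row
    "edge_tol": 500,     # widen the search when detecting text edges
    "strip_text": "\n",  # drop newlines inside cells
}


def build_options(**overrides):
    """Return the stream options with any per-call overrides applied."""
    return {**STREAM_OPTIONS, **overrides}


def read_page(path, page, **overrides):
    """Read one page with the stream flavor and the options above."""
    # Imported lazily so the options can be inspected without camelot installed.
    import camelot

    return camelot.read_pdf(path, pages=str(page), **build_options(**overrides))
```

Usage would look like `tables = read_page('jcc.26188.pdf', 4, row_tol=15)`, followed by `tables[0].df` to inspect the first table.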
Here is the parsing report: {'accuracy': 96.53, 'whitespace': 28.57, 'order': 1, 'page': 4}
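When trying different settings, the report can be checked automatically instead of by eye. The following is a small sketch that assumes a report dict shaped like the one above; the thresholds are arbitrary assumptions, not values recommended by camelot, and `looks_reliable` is a made-up helper name.

```python
def looks_reliable(report, min_accuracy=90.0, max_whitespace=50.0):
    """Return True if the report clears both (arbitrary) thresholds."""
    return (
        report["accuracy"] >= min_accuracy
        and report["whitespace"] <= max_whitespace
    )


# The report quoted above for the first table on page 4.
report = {'accuracy': 96.53, 'whitespace': 28.57, 'order': 1, 'page': 4}
print(looks_reliable(report))
```

A high whitespace value often points to caption or grouping rows being pulled into the table, which matches the empty cells in the array above.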
Hello,
I am using the Python package camelot. It is unable to find tables in any PDFs, even though the PDFs have tables embedded in them.
PFA the example pdf. I hope this problem can be resolved soon.
Thank you
Best, Ayush
jcc.26188.pdf