atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io
Other
3.65k stars 356 forks

Not reading all tables and also reading plots as tables #394

Closed gudlur46 closed 4 years ago

gudlur46 commented 4 years ago

Hello Mr. Mehta, I am new to Python and migrating from MATLAB. I want to convert a batch of PDF files, which contain plots and tables, to .csv. The problem is that when Camelot reads a file, it reads the plots as tables. After switching to flavor='stream' I got it to read tables, but it only reads one. My file has two tables and two plots, so my code is not even recognizing both tables: the TableList says n=1. With the code below, camelot.plot(tables[0], kind='text') shows the file in the console, but still only one table is read. I couldn't find a place to attach my file for more info; let me know if I can do that. Any help is appreciated. Thanks

import camelot
import pandas as pd

tables = camelot.read_pdf("2019_11_20_9_31_40-S550-012-00000000XC253927_Torsional RPM Ramp RMS.pdf", flavor='stream')
tables
tables[0].df
camelot.plot(tables[0], kind='text')
camelot.plot(tables[0], kind='grid')
camelot.plot(tables[0], kind='contour')

(image)

anakin87 commented 4 years ago

I confirm that often Camelot recognizes plots as tables.
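One possible way to weed out plots that were picked up as tables is to inspect each table's `parsing_report`, which Camelot exposes as a dict with `accuracy` and `whitespace` entries (both percentages). The sketch below assumes that a plot misread as a table yields mostly empty cells. The helper name `drop_probable_plots` and both thresholds are hypothetical and would need tuning for each kind of document.

```python
def drop_probable_plots(tables, max_whitespace=60.0, min_accuracy=50.0):
    """Keep only tables whose parsing report looks like a real table.

    `tables` is any iterable of objects with a `parsing_report` dict, such as
    the TableList returned by camelot.read_pdf(). Both thresholds are
    illustrative guesses, not Camelot defaults.
    """
    kept = []
    for table in tables:
        report = table.parsing_report
        if report["whitespace"] > max_whitespace:
            continue  # mostly empty cells, likely a plot or a frame
        if report["accuracy"] < min_accuracy:
            continue  # text was poorly assigned to cells
        kept.append(table)
    return kept
```

With Camelot this could be called as `drop_probable_plots(camelot.read_pdf(path, flavor="lattice"))`. The result is a plain list rather than a `TableList`, so it supports indexing and `.df` on each element but not the `TableList` export helpers.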

anakin87 commented 4 years ago

If you can, attach your PDF file.

gudlur46 commented 4 years ago

Hello, sorry for the delayed reply. I now have Camelot reading my tables, and it recognizes all of them. I have a new issue, which may not be about Camelot but is part of the same program. If someone could help, that would be great. As mentioned earlier, my PDF has two tables and two plots. I want the tables written to the same Excel file but on different sheets. I tried a couple of things, but Sheet1 gets replaced by Sheet2, and the final file has only one of the two tables. The code is below.

##########################################################################
import camelot
import pandas as pd

tables = camelot.read_pdf("2019_11_20_9_31_40-S550-012-00000000XC253927_Torsional RPM Ramp RMS.pdf", flavor="lattice")
camelot.plot(tables[1], kind='line')
tables
tables[0].df
tables[2].df
pd.ExcelWriter('P_S.xlsx', engine='xlsxwriter')
tables[0].df.to_excel("P_S.xlsx", sheet_name="Sheet1")
tables[2].df.to_excel("P_S.xlsx", sheet_name="Sheet2")

Thanks

gudlur46 commented 4 years ago

Fixed my issue; it was my code that was problematic. Camelot still doesn't read the headers, so I hard-coded them in my code before writing to the Excel file. Thanks everyone for the help. I am also pasting the final version of the code, which extracts all the info I needed. Thanks Camelot. Cheers!

###########################################################################
import camelot
import pandas as pd

tables = camelot.read_pdf("2019_11_20_9_31_40-S550-012-00000000XC253927_Torsional RPM Ramp RMS.pdf", flavor="lattice")
camelot.plot(tables[1], kind='line')
tables
tables[0].df
tables[2].df
writer = pd.ExcelWriter('P_S_1.xlsx', engine='xlsxwriter')
tables[0].df.to_excel(writer, sheet_name="Sheet1")
tables[2].df.to_excel(writer, sheet_name="Sheet2")
writer.save()
#########################################################################
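For readers on a current pandas: `writer.save()` was removed in pandas 2.0 in favor of `writer.close()`, and `pd.ExcelWriter` can also be used as a context manager that writes the file on exit. The following is a minimal sketch of the same two-sheet export with hard-coded headers, not the code used in this thread. The helper names `with_headers` and `write_sheets` are made up for illustration, and it assumes an Excel engine such as xlsxwriter or openpyxl is installed.

```python
import pandas as pd


def with_headers(df, headers):
    """Return a copy of `df` with hard-coded column names."""
    if len(headers) != df.shape[1]:
        raise ValueError(
            "expected %d headers, got %d" % (df.shape[1], len(headers))
        )
    out = df.copy()
    out.columns = list(headers)
    return out


def write_sheets(frames, path):
    """Write each DataFrame in `frames` (sheet name -> frame) to its own sheet."""
    # One writer is shared by all sheets. Passing the file path to
    # df.to_excel() for every table recreates the file each time, which is
    # why only the last sheet survives in that case.
    with pd.ExcelWriter(path) as writer:
        for sheet_name, df in frames.items():
            # index=False drops the row index column; remove it to match the
            # default output of df.to_excel().
            df.to_excel(writer, sheet_name=sheet_name, index=False)
```

A call might look like `write_sheets({"Sheet1": with_headers(tables[0].df, names_1), "Sheet2": with_headers(tables[2].df, names_2)}, "P_S_1.xlsx")`, where `names_1` and `names_2` are placeholder lists of column names chosen by the user.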