atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

feature request: change output file naming rules #295

Closed: aborruso closed this issue 5 years ago

aborruso commented 5 years ago

Hi, when I have a PDF with more than 10 pages, I get this kind of output:

output_1.csv
output_2.csv
output_3.csv
output_4.csv
output_5.csv
output_6.csv
output_7.csv
output_8.csv
output_9.csv
output_10.csv
output_11.csv
output_12.csv
output_13.csv

Then, if I want to merge them in order, most programs sort the names alphabetically rather than numerically, which gives the order below, and the merged output is wrong:

output_10.csv
output_11.csv
output_12.csv
output_13.csv
output_1.csv
output_2.csv
output_3.csv
output_4.csv
output_5.csv
output_6.csv
output_7.csv
output_8.csv
output_9.csv

I'm not able to write the code myself, but I think it would be great to have an internal Camelot rule for numbering: if there are fewer than 10 pages, numbering starts from 1; if there are between 10 and 99 pages, it starts from 01; between 100 and 999, from 001; and so on.
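A minimal Python sketch of the proposed rule, which Camelot does not implement in this thread: the pad width is simply the number of digits in the total file count. The helper name padded_output_names and the output_N.csv pattern are assumptions taken from the example above, not Camelot's API.

```python
def padded_output_names(prefix: str, count: int, ext: str = "csv") -> list[str]:
    """Return names like output_01.csv ... output_13.csv for count=13.

    The pad width is the number of digits in `count`: fewer than 10 files
    get no padding, 10-99 get two digits, 100-999 get three, and so on.
    """
    width = len(str(count))
    return [f"{prefix}_{i:0{width}d}.{ext}" for i in range(1, count + 1)]


names = padded_output_names("output", 13)
print(names[0], names[9], names[12])  # output_01.csv output_10.csv output_13.csv
# With padding, plain alphabetical sorting already matches numeric order.
print(sorted(names) == names)  # True
```

Deriving the width from the count keeps short runs unpadded (output_1.csv ... output_9.csv) while guaranteeing that every name in one run has the same length, which is what makes alphabetical and numeric order agree.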

Thank you

vinayak-mehta commented 5 years ago

Closing for now.

aborruso commented 5 years ago

@vinayak-mehta but what do you think about it?

vinayak-mehta commented 5 years ago

@aborruso I believe you're talking about how the files are listed on your filesystem, right? Camelot saves them with sequential numbers, but when you run ls, the names are sorted alphabetically rather than numerically, which gives the order you posted. If you need to process them in numeric order, sort them with a custom key function passed to the stdlib sorted function. For example: sorted(glob.glob('*.csv'), key=lambda x: int(x.split('_')[1].replace('.csv', ''))).
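A slightly more defensive variant of the same sorting approach, as a sketch: it assumes the file names end in _N.csv as in this thread, and reads the trailing number with a regular expression, so extra underscores or a directory prefix in the path don't break the split. numeric_key and sorted_csvs are hypothetical helper names.

```python
import glob
import re

# Trailing number just before ".csv", e.g. "output_12.csv" -> "12".
_TRAILING_NUMBER = re.compile(r"(\d+)\.csv$")


def numeric_key(path: str) -> int:
    """Sort key: the integer right before '.csv' (-1, i.e. first, if there is none)."""
    match = _TRAILING_NUMBER.search(path)
    return int(match.group(1)) if match else -1


def sorted_csvs(pattern: str = "*.csv") -> list[str]:
    """Return the CSV files matching `pattern` in numeric, not alphabetical, order."""
    return sorted(glob.glob(pattern), key=numeric_key)


names = ["output_10.csv", "output_2.csv", "output_1.csv"]
print(sorted(names))                   # ['output_1.csv', 'output_10.csv', 'output_2.csv']
print(sorted(names, key=numeric_key))  # ['output_1.csv', 'output_2.csv', 'output_10.csv']
```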

aborruso commented 5 years ago

Hi @vaibhavmule, no: your reply seems Python-related to me.

You have built this great tool (thank you, thank you, thank you!), and there is also a command-line version. My feature request is about the default output of the CLI, which is not immediately ready to use: the first thing I do is concatenate the files in order, and with more than 9 output files that order is wrong.
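Until the CLI output is padded, the concatenation step can be done in Python. This is a sketch assuming the output_N.csv naming from this thread, UTF-8 files, and that every file has the same columns; concatenate_in_order is a hypothetical helper, not part of Camelot's CLI.

```python
import csv
import glob
import os
import re

# Trailing number just before ".csv", e.g. "output_12.csv" -> 12.
_TRAILING_NUMBER = re.compile(r"(\d+)\.csv$")


def numeric_key(path: str) -> int:
    """Sort key: the integer right before '.csv' (-1 if there is none)."""
    match = _TRAILING_NUMBER.search(path)
    return int(match.group(1)) if match else -1


def concatenate_in_order(pattern: str, merged_path: str) -> int:
    """Copy every row of the CSVs matching `pattern`, in numeric order, into `merged_path`.

    Assumes all inputs share one column layout. Rows (including any header
    rows) are copied as-is. Returns the number of rows written.
    """
    merged_abs = os.path.abspath(merged_path)
    rows_written = 0
    with open(merged_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(pattern), key=numeric_key):
            if os.path.abspath(path) == merged_abs:
                continue  # never read back the file we are writing to
            with open(path, newline="", encoding="utf-8") as src:
                for row in csv.reader(src):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, concatenate_in_order("output_*.csv", "merged.csv") would merge the files from the first comment in the order 1, 2, ..., 13 instead of 10, 11, 12, 13, 1, ..., 9.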

vaibhavmule commented 5 years ago

@aborruso You meant to say @vinayak-mehta!