documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.com/docsplit/

Use leading zeros for appended page numbers in extract_images #83

Closed willmcclellan closed 10 years ago

willmcclellan commented 11 years ago

When extracting pages as images from a document (extract_images), it would be useful if the appended page numbers had leading zeros, so that the files sort in page order automatically in the output directory.

If you have more than 9 pages, the ordering gets lost:

filename_1 filename_10 filename_11 filename_2 etc…

but with leading zeros:

filename_01 filename_02 …

Just a small thing that would make the files easier to work with after extraction.
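
A minimal Python sketch of the ordering problem described above. The names are the illustrative `filename_N` ones from this comment, not actual docsplit output, and the 11-page count is an assumption.

```python
# Illustrative names only: 11 hypothetical pages, with and without zero padding.
unpadded = [f"filename_{n}" for n in range(1, 12)]
padded = [f"filename_{n:02d}" for n in range(1, 12)]

# Plain string sorting compares character by character, so "10" sorts before "2".
lexical_unpadded = sorted(unpadded)
lexical_padded = sorted(padded)

print(lexical_unpadded[:4])      # ['filename_1', 'filename_10', 'filename_11', 'filename_2']
print(lexical_padded == padded)  # True: padded names already sort in page order
```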

lukaszwnek commented 10 years ago

Agreed, this one would be extremely helpful.

knowtheory commented 10 years ago

Nope. Sorry, guys.

Padding with zeros makes programmatically accessing pages problematic. I shouldn't have to know how many total pages are in a document to know where the first page lives.
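
To make that concrete, here is a rough Python sketch, not docsplit code. Both helper names and the `png` extension are hypothetical and chosen only for illustration. The point is that the padded variant cannot build a path without the total page count.

```python
def unpadded_page_path(basename: str, page: int, ext: str = "png") -> str:
    """Path of a page when the number is appended as-is (hypothetical helper)."""
    return f"{basename}_{page}.{ext}"


def padded_page_path(basename: str, page: int, total_pages: int, ext: str = "png") -> str:
    """Path of a page when numbers are zero-padded to the width of the last page
    number (hypothetical helper). The caller must already know total_pages."""
    width = len(str(total_pages))
    return f"{basename}_{page:0{width}d}.{ext}"


# The first page of an unpadded document is always at the same place.
print(unpadded_page_path("report", 1))     # report_1.png

# With padding, the first page moves depending on the document length.
print(padded_page_path("report", 1, 9))    # report_1.png
print(padded_page_path("report", 1, 21))   # report_01.png
print(padded_page_path("report", 1, 120))  # report_001.png
```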

If you're concerned about the human aesthetics of using ls or whatnot, GNU ls (the default on most Linux distributions) has a -v option that sorts numbers within file names naturally:

~/test$ ls -l
total 0
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_10.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_11.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_12.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_13.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_14.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_15.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_16.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_17.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_18.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:49 test_19.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_1.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:49 test_20.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:49 test_21.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_2.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_3.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_4.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_5.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_6.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_7.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_8.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_9.tmp
~/test$ ls -lv
total 0
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_1.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_2.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_3.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_4.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_5.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_6.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_7.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_8.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_9.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_10.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_11.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_12.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_13.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_14.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_15.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_16.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_17.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:48 test_18.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:49 test_19.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:49 test_20.tmp
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr  5 16:49 test_21.tmp
~/test$
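
For scripts rather than the shell, a similar ordering can be had with a natural sort key. This is a minimal Python sketch in the spirit of `ls -v`, not a reimplementation of GNU's version-sort algorithm; `natural_key` is a hypothetical helper name.

```python
import re


def natural_key(name: str) -> list:
    """Sort key that compares runs of ASCII digits as integers (hypothetical helper)."""
    # Splitting on a captured group alternates text and digit runs, so
    # odd-indexed parts are always the digit runs.
    parts = re.split(r"([0-9]+)", name)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


names = ["test_10.tmp", "test_1.tmp", "test_21.tmp", "test_2.tmp", "test_9.tmp"]
print(sorted(names, key=natural_key))
# ['test_1.tmp', 'test_2.tmp', 'test_9.tmp', 'test_10.tmp', 'test_21.tmp']
```

Because text and digit runs always alternate in the same positions, the key never compares an integer against a string for names that share a prefix pattern like the ones above.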