documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.io/docsplit/

Command line docsplit displays pdftotext usage when inputting a PDF filename that has spaces #14

Closed matthewmueller closed 13 years ago

matthewmueller commented 13 years ago

So this is a pretty minor issue, but when I run something like this on the command line:

docsplit text ZUJI\ Hong\ Kong\:\ Your\ Online\ Travel\ Guru.pdf

It's going to spit out:

pdftotext version 0.16.7
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -r <fp>           : resolution, in DPI (default is 72)
  -x <int>          : x-coordinate of the crop area top left corner
  -y <int>          : y-coordinate of the crop area top left corner
  -W <int>          : width of crop area in pixels (default is 0)
  -H <int>          : height of crop area in pixels (default is 0)
  -layout           : maintain original physical layout
  -raw              : keep strings in content stream order
  -htmlmeta         : generate a simple HTML file, including the meta information
  -enc <string>     : output text encoding name
  -listenc          : list available encodings
  -eol <string>     : output end-of-line convention (unix, dos, or mac)
  -nopgbrk          : don't insert page breaks between pages
  -bbox             : output bounding box for each word and page size to html.  Sets -htmlmeta
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -q                : don't print any messages or errors
  -v                : print copyright and version info
  -h                : print usage information
  -help             : print usage information
  --help            : print usage information
  -?                : print usage information

As I said, a minor issue, but it appears that any PDF whose filename contains spaces (common with ebooks) will need to be renamed before running this command in the terminal.
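
One plausible reading of the output above: the shell hands docsplit the filename as a single argument, but if that name is later interpolated unquoted into a pdftotext command line, it gets split on spaces again, and pdftotext prints its usage because it receives more positional arguments than it accepts. The Ruby sketch below illustrates that failure mode and one way around it with the standard library's Shellwords. `pdftotext_command` is a hypothetical helper for illustration only; it is not docsplit's code and not necessarily the fix that landed in master.

```ruby
require 'shellwords'

# Hypothetical helper (an assumption, not part of docsplit's API): build a
# pdftotext command line with both paths escaped for the shell.
def pdftotext_command(pdf_path, text_path)
  "pdftotext #{Shellwords.escape(pdf_path)} #{Shellwords.escape(text_path)}"
end

name = "ZUJI Hong Kong: Your Online Travel Guru.pdf"

# Naive interpolation: a shell would split the filename at every space.
naive = "pdftotext #{name}"
naive_args = Shellwords.split(naive)

# Escaped interpolation: the filename survives as a single argument.
safe = pdftotext_command(name, "out.txt")
safe_args = Shellwords.split(safe)

puts "naive argument count: #{naive_args.length}"
puts "safe argument count:  #{safe_args.length}"
puts "filename preserved:   #{safe_args[1] == name}"
```

`Shellwords.split` is used here only to mimic how a POSIX shell would tokenize each string, so the sketch runs without pdftotext installed.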

vrybas commented 13 years ago

That's the issue for 0.5.2, but the fix is already in 'master'. Use

gem 'docsplit', :ref => '2d03ce9', :git => 'git://github.com/documentcloud/docsplit.git'

in your Gemfile for now.

jashkenas commented 13 years ago

Yep -- hoping to push out a new release some time this afternoon.