documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.com/docsplit/

Percent sign in filenames isn't escaped properly #129

Open jeremybmerrill opened 9 years ago

jeremybmerrill commented 9 years ago
> require 'docsplit'
> Docsplit.extract_pages('/path/to/whatever/a pdf is bad, 100%.pdf')
Docsplit::ExtractionFailed: Exception in thread "main" java.util.UnknownFormatConversionException: Conversion = '_'
  at java.util.Formatter.checkText(Formatter.java:2547)
  at java.util.Formatter.parse(Formatter.java:2523)
  at java.util.Formatter.format(Formatter.java:2469)
  at java.util.Formatter.format(Formatter.java:2423)
  at java.lang.String.format(String.java:2797)
  at org.documentcloud.pdftailor.PdfTailor.outputPath(PdfTailor.java:156)
  at org.documentcloud.pdftailor.PdfTailor.unstitch(PdfTailor.java:128)
  at org.documentcloud.pdftailor.PdfTailor.main(PdfTailor.java:43)
  from /Users/myusername/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/docsplit-0.7.6/lib/docsplit/page_extractor.rb:22:in `block in extract'
  from /Users/myusername/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/docsplit-0.7.6/lib/docsplit/page_extractor.rb:10:in `each'
  from /Users/myusername/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/docsplit-0.7.6/lib/docsplit/page_extractor.rb:10:in `extract'
  from /Users/myusername/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/docsplit-0.7.6/lib/docsplit.rb:46:in `extract_pages'
  from (irb):3
  from /Users/myusername/.rbenv/versions/2.1.4/bin/irb:11:in `<main>'

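Moving the % to a different position in the filename changes the error message slightly. Presumably Java is interpreting the % as the start of a format specifier, so the exact error depends on the characters that follow it. The trace is consistent with that: PdfTailor.outputPath calls String.format, and the rejected conversion '_' suggests the filename ends up inside the format string itself.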
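
Until the escaping is fixed in PdfTailor, one possible workaround on the Ruby side is to hand Docsplit a copy of the file whose name contains no %. The sketch below is only an illustration: `percent_safe_copy` is a hypothetical helper, not part of Docsplit, and it assumes that only the file's basename (not its directory) ends up in the format string, and that renaming the file is acceptable even though any output named after the input will carry the sanitized name.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not part of Docsplit): copy `path` into `dir` under a
# name in which every '%' is replaced, and return the path of the copy.
# Assumption: a fresh temporary directory does not itself contain a '%'.
def percent_safe_copy(path, dir: Dir.mktmpdir, replacement: '_')
  safe_name = File.basename(path).gsub('%', replacement)
  safe_path = File.join(dir, safe_name)
  FileUtils.cp(path, safe_path)
  safe_path
end

# Intended use (requires the docsplit gem, so it is left commented out here):
#   safe = percent_safe_copy('/path/to/whatever/a pdf is bad, 100%.pdf')
#   Docsplit.extract_pages(safe)
```

This sidesteps the problem rather than fixing it. If the diagnosis above is right, the real fix belongs in PdfTailor.outputPath, which would need to escape the % (as %%) or keep the filename out of the format string.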