documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.com/docsplit/

Accept non-ascii characters in pdf headers #65

Closed: amalagaura closed this pull request 11 years ago

amalagaura commented 11 years ago

Previous pull requests were not working for me. Googling led me to a working solution on Stack Overflow: http://stackoverflow.com/a/8873922/234125

Update lib/docsplit/info_extractor.rb
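The PR text does not include the diff itself. As a rough illustration of the underlying problem, here is a minimal Ruby sketch. It assumes Ruby 2.1+ (for `String#scrub`) and assumes the extractor reads `Key: value` lines from `pdfinfo` output with regular expressions. `sanitize_pdfinfo` and `extract_field` are hypothetical helpers, not docsplit's API, and this is not necessarily the technique from the linked Stack Overflow answer.

```ruby
# Hypothetical helper (not docsplit's API): make raw `pdfinfo` output safe
# to match with regular expressions. Assumes Ruby 2.1+ for String#scrub.
def sanitize_pdfinfo(raw)
  # Shell output arrives as raw bytes; relabel them as UTF-8 without converting.
  text = raw.dup.force_encoding(Encoding::UTF_8)
  # Replace each invalid byte sequence with U+FFFD so regex matching cannot raise.
  text.valid_encoding? ? text : text.scrub("\uFFFD")
end

# Hypothetical extractor for a single "Key:   value" line of pdfinfo output.
def extract_field(output, key)
  match = sanitize_pdfinfo(output).match(/^#{Regexp.escape(key)}:\s+(.*)$/)
  match && match[1].strip
end

# A Latin-1 encoded "Résumé" in the Title header, which is not valid UTF-8.
raw = "Title:          R\xE9sum\xE9\nPages:          3\n".b
puts extract_field(raw, "Title")
puts extract_field(raw, "Pages") # prints "3"
```

Without the scrub step, forcing those bytes to UTF-8 and calling `match` raises `ArgumentError` (invalid byte sequence in UTF-8), which is presumably the kind of failure that non-ASCII PDF headers triggered here.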

mateusmaso commented 11 years ago

+1

sandstrom commented 11 years ago

Nice solution, this would be helpful.

knowtheory commented 11 years ago

Cool, I've committed a failing test case, which this pull request fixes. Next up: patching the text cleaner to conditionally use Iconv.
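"Conditionally use Iconv" suggests feature detection across Ruby versions. Below is a minimal sketch of that idea. It assumes Iconv is only needed where `String#encode` is missing (Ruby 1.8). `clean_utf8` is a hypothetical name, and this is not the code that was actually committed to docsplit's text cleaner.

```ruby
# Hypothetical cleaner (not docsplit's TextCleaner): drop bytes that are not
# valid UTF-8, picking an implementation based on what the running Ruby supports.
# Hash-rocket syntax is kept so this method still parses on Ruby 1.8.
def clean_utf8(text)
  if "".respond_to?(:scrub)
    # Ruby 2.1+: remove invalid sequences (empty replacement string).
    text.dup.force_encoding("UTF-8").scrub("")
  elsif "".respond_to?(:encode)
    # Ruby 1.9/2.0: UTF-8 -> UTF-8 is a no-op, so round-trip through UTF-16
    # to make :invalid => :replace actually take effect.
    text.encode("UTF-16", "UTF-8", :invalid => :replace, :undef => :replace, :replace => "").encode("UTF-8")
  else
    # Ruby 1.8: no String#encode, so fall back to Iconv and ignore bad bytes.
    require "iconv"
    Iconv.conv("UTF-8//IGNORE", "UTF-8", text)
  end
end

puts clean_utf8("caf\xE9 ok") # prints "caf ok"
```

Checking for the method rather than comparing `RUBY_VERSION` strings keeps the branch choice tied to what the interpreter can actually do.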

sandstrom commented 11 years ago

Awesome :cocktail: