stuartf closed 11 years ago
This didn't make any difference for me on Ruby 1.9.2 - it still doesn't like ®. Stripping them with Iconv works, but obviously loses some data.
Hmm, I was testing on 1.9.3; I didn't think there would be that much difference from 1.9.2...
How about adding
result.encode('UTF-8', :invalid => :replace, :replace => '').encode('UTF-8')
just below
result = `#{cmd}`.chomp
on line 22 in info_extractor.rb
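A possible reason the line above has no effect: `String#encode` returns a new string, so its result has to be assigned back to `result`, and as far as I know Ruby 1.9 treats a UTF-8 to UTF-8 `encode` as a no-op, so invalid bytes pass through untouched. Below is a minimal sketch of a round-trip through UTF-16LE that forces the transcoder to run. The helper name `strip_invalid_utf8` and the sample header string are hypothetical, not part of docsplit, and like the Iconv approach this drops the offending bytes rather than recovering the character.

```ruby
# Hypothetical helper (not part of docsplit): returns a copy of the input
# with any bytes that are not valid UTF-8 removed.
def strip_invalid_utf8(raw)
  # Work on a copy tagged as UTF-8 so the caller's string is left alone.
  text = raw.dup.force_encoding('UTF-8')
  return text if text.valid_encoding?

  # Transcode to a different encoding and back so the :invalid option is
  # actually applied; invalid byte sequences are replaced with ''.
  text.encode('UTF-16LE', :invalid => :replace, :replace => '').encode('UTF-8')
end

# Made-up sample: \xAE is the Latin-1 byte for the registered sign, which is
# not a valid UTF-8 sequence on its own.
header = "Title: Acme\xAE Reader"
clean = strip_invalid_utf8(header)
puts clean
```

In `info_extractor.rb` this would amount to something like `result = strip_invalid_utf8(result)` after the `chomp` line, assuming the helper is defined somewhere the extractor can see it. On Ruby 2.1 and later, `String#scrub` does the same job more directly.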
Yeah, this didn't work for me either. Any ideas?
Neither this pull request nor the suggestions above worked for me. I have a pull request with code that does work for me: https://github.com/documentcloud/docsplit/pull/65
Closing this as we've merged #65
An alternative way to handle non-ASCII characters in PDF headers; probably not backwards compatible with Ruby 1.8.