Closed by seanknox 10 years ago
Programmatically. Here's my quick hack:
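require 'charlock_holmes'

detector = CharlockHolmes::EncodingDetector.new
ARGV.each do |f|
  content = File.read(f)
  # Guess the file's current encoding from its bytes
  detection = detector.detect(content)
  # Skip files where no text encoding could be detected (e.g. binary data)
  next unless detection && detection[:encoding]
  puts "#{f} encoding: #{detection[:encoding]}"
  # Transcode from the detected encoding to UTF-8 and overwrite the file in place
  utf8_encoded_content = CharlockHolmes::Converter.convert(content, detection[:encoding], 'UTF-8')
  File.write(f, utf8_encoded_content)
end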
That's possible. I'm not sure there's a way to make the transcoder smarter about characters like that, but I'll look. I think the best way forward is to have the vendor encode as UTF-8 directly.
Thanks for doing that. Did you do this manually or programmatically? The reason I ask is that our vendor will be pushing updates to FTP, which we then automatically add and commit to the repo. I'm checking whether they can save the files as UTF-8 on their end, but in case they can't, I may need to script this so it doesn't have to be done manually.
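If it does end up needing a script in the automated add-and-commit step, here is a rough sketch of one way it could look using only Ruby's standard library. It leaves files that are already valid UTF-8 untouched and re-encodes everything else. Note the assumptions: `FALLBACK_ENCODING`, `to_utf8`, and `convert_file!` are hypothetical names made up for this example, and Windows-1252 is only a guess at what the vendor's files might be saved as. Unlike charlock_holmes, this does no real detection, so treat it as a starting point rather than a drop-in fix.

```ruby
# Assumption: the vendor's non-UTF-8 files use this encoding. Change as needed.
FALLBACK_ENCODING = 'Windows-1252'

# Return the given bytes as a UTF-8 string.
# Bytes that are already valid UTF-8 are returned as-is; anything else is
# treated as FALLBACK_ENCODING and transcoded, with unmappable bytes replaced.
def to_utf8(bytes, fallback = FALLBACK_ENCODING)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  bytes.dup.force_encoding(fallback)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Rewrite the file in place only if its bytes actually change.
# Returns true when the file was converted, false when it was left alone.
def convert_file!(path)
  original = File.binread(path)
  converted = to_utf8(original)
  return false if converted.bytes == original.bytes

  File.binwrite(path, converted)
  true
end

if __FILE__ == $PROGRAM_NAME
  ARGV.each do |path|
    status = convert_file!(path) ? 'converted to UTF-8' : 'already UTF-8'
    puts "#{path}: #{status}"
  end
end
```

Running it over the freshly downloaded files before the commit step would mean files the vendor already saved as UTF-8 produce no diff, so it should be safe to leave in place even if they fix things on their end.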