brianmario / charlock_holmes

Character encoding detection, brought to you by ICU
MIT License

Importing converted files into PostgreSQL fails with "invalid byte sequence" errors #77

Open Altonymous opened 10 years ago

Altonymous commented 10 years ago

Here are a couple of example errors...

invalid byte sequence for encoding "UTF8": 0xba
invalid byte sequence for encoding "UTF8": 0xd0 0x34
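
Both sequences are ill-formed UTF-8: 0xba is a continuation byte with no lead byte before it, and 0xd0 opens a two-byte sequence but is followed by 0x34 (ASCII "4") rather than a continuation byte. A minimal sketch for locating such bytes in converted output before it reaches PostgreSQL, using only Ruby's standard String#scrub. The helper name invalid_utf8_sequences is hypothetical and not part of charlock_holmes:

```ruby
# Hypothetical helper (not part of charlock_holmes): list the ill-formed
# byte runs found when `bytes` is interpreted as UTF-8.
def invalid_utf8_sequences(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  found = []
  # String#scrub yields each ill-formed byte run to the block.
  text.scrub do |bad|
    found << bad.unpack('C*').map { |b| format('0x%02x', b) }.join(' ')
    '' # the replacement is irrelevant here; only the side effect matters
  end
  found
end

sample = "ok \xBA then \xD04".b
p invalid_utf8_sequences(sample) # => ["0xba", "0xd0"]
```

Ruby reports only the offending lead byte (0xd0), while the PostgreSQL message above also shows the byte that followed it.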

Here's how I do the re-encoding...

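require 'charlock_holmes'

# Detect the file's encoding with ICU, then transcode its contents to UTF-8.
def convert_file_to_utf8(file_path)
  contents = File.read(file_path)
  detection = CharlockHolmes::EncodingDetector.detect(contents)
  # The last expression is the return value, so no explicit `return` is needed.
  CharlockHolmes::Converter.convert(contents, detection[:encoding], 'UTF-8')
end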

Am I doing something wrong, is the gem not accounting for some of the characters in the file correctly, or is it something else entirely?
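
One way to narrow this down is to guarantee that whatever is sent to PostgreSQL is well-formed UTF-8, whatever the detector reported. The sketch below swaps in Ruby's own String#encode instead of CharlockHolmes::Converter, so it is not the gem's method. The helper name to_valid_utf8 is hypothetical, and its source_encoding argument stands in for detection[:encoding]. It assumes that losing unconvertible bytes (replaced with "?") is acceptable, which may not hold for real data:

```ruby
# Hypothetical helper: map an encoding name to a Ruby Encoding, falling back
# to BINARY when the name is missing or unknown to Ruby.
def ruby_encoding_for(name)
  name ? Encoding.find(name) : Encoding::BINARY
rescue ArgumentError
  Encoding::BINARY
end

# Hypothetical helper: transcode raw bytes to UTF-8 with String#encode,
# replacing anything that cannot be converted with "?".
def to_valid_utf8(bytes, source_encoding)
  bytes.dup
       .force_encoding(ruby_encoding_for(source_encoding))
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
       .scrub('?') # covers the case where source and target are both UTF-8
end

latin1 = "caf\xE9".b
puts to_valid_utf8(latin1, 'ISO-8859-1').valid_encoding?
```

If the import succeeds with this output but "?" appears where real characters should be, the detected encoding was probably wrong for that file, and the detection result is worth inspecting before converting.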

Altonymous commented 10 years ago

Any ideas?