LibreCat / Catmandu-DBI

A Catmandu::Store plugin for DBI based interfaces

Encoding issues getting data from a MySQL database #30

Closed netsensei closed 5 years ago

netsensei commented 5 years ago

foobar_2018-10-31.txt

Problem

When I export data from a UTF-8 encoded MySQL database, the output is garbled. The value "Venetië, contépotlood" is output as "VenetiÃ«, contÃ©potlood".

How to reproduce.

  1. Import the SQL file I've attached into a MySQL database called "foobar"
  2. Run this command:
    catmandu convert DBI --dsn dbi:mysql:foobar --user root --password root --query "SELECT * FROM foobar"
  3. The output in bash will look like this:
    [{"title":"VenetiÃ«, contÃ©potlood","id":1}]

More background.

I've tried setting the --encoding flag but that doesn't do anything.

Programmatically

I'm actually trying to import data from MySQL into SQLite using DBI programmatically in Perl. I've tried setting binmode manually, among other things, to no avail. I've also tried setting the encoding option to something like :encoding(utf8), but that doesn't work either.

Which leaves me doing this wonky code:

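use Encode qw(decode);

$importer->each(sub {
  my $item = shift;
  # Decode the raw UTF-8 bytes returned by the driver into a Perl character string.
  $item->{title} = decode('UTF-8', $item->{title});
  $exporter->add($item);
});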

I don't think explicitly converting fields is good practice. As I understand it, Perl distinguishes between its internal string representation and the external encoding/decoding done at the I/O layer. My guess is that somewhere the data is being encoded as UTF-8 twice, which causes these issues.
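The double-encoding guess can be reproduced without a database. The following is a minimal sketch using only the core Encode module. It assumes the driver hands back raw UTF-8 bytes and the exporter then encodes them as UTF-8 a second time; the variable names are illustrative and not part of Catmandu or DBI. On a UTF-8 terminal the "double" line shows up as "VenetiÃ«, contÃ©potlood".

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The title as a decoded Perl character string (21 characters).
my $title = "Veneti\x{EB}, cont\x{E9}potlood";

# Assumption: the driver returns the UTF-8 bytes without decoding them.
my $raw_bytes = encode('UTF-8', $title);

# Encoding those bytes again treats each byte as a Latin-1 character,
# which is the suspected double encoding.
my $double = encode('UTF-8', $raw_bytes);

# Decoding the raw bytes once restores the original character string.
my $fixed = decode('UTF-8', $raw_bytes);

binmode STDOUT, ':raw';
print "double: $double\n";
print "fixed:  ", encode('UTF-8', $fixed), "\n";
```

If this matches what happens in the pipeline, decoding once at the point where data leaves the database should make per-field calls to decode unnecessary.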

Any ideas?

nics commented 5 years ago

This is fixed in 4a46217de85c681cdf4c6c2119d045bcf2bd8099, I'll try to make a release later today.