EndPointCorp / end-point-blog

End Point Dev blog
https://www.endpointdev.com/blog/
17 stars 65 forks source link

Comments for DBD::Pg UTF-8 for PostgreSQL server_encoding #471

Open phinjensen opened 6 years ago

phinjensen commented 6 years ago

Comments for https://www.endpointdev.com/blog/2011/06/dbdpg-utf-8-for-postgresql/ By Greg Sabino Mullane

To enter a comment:

  1. Log in to GitHub
  2. Leave a comment on this issue.
gurucubano commented 4 years ago

Hello, I have a database wherein all char columns are UTF-8, for exmaple this (used later in the Perl codes as well):

printf  "select d02name::bytea from d02ben where d02bnr = '00001048313' ;\n" | psql -Usisis -dpos71 -t
\x50c3a46461676f67697363686520486f6368736368756c65205765696e67617274656e2020202020
    ^^^^ German Umlaut ä in UTF-8

The used Perl code is:

#!/usr/local/bin/perl

my $PGDB      = 'dbi:Pg:dbname=pos71;host=127.0.0.1';
my $PGDB_USER = 'sisis';
my $PGDB_PASS = 'sisis123';

use DBD::Pg;

sub onevaluesql($) {
  my ($command) = @_;
  my $sth=$dbh->prepare($command);
  $sth->execute();
  my ($value) = $sth->fetchrow_array();
  $sth->finish();
  return $value;
}

$dbh = DBI->connect($PGDB, $PGDB_USER, $PGDB_PASS,
    { pg_utf8_strings => 1,
      AutoCommit => 0,
      RaiseError => 0,
      PrintError => 0,
    }
    )
    || die "Couldn't connect to $PGDB as user $PGDB_USER: $DBI::errstr\n";

$dbh->do("SET client_encoding TO UTF8");

print "DBI is version $DBI::VERSION, DBD::Pg is version $DBD::Pg::VERSION\n";
print "client_encoding=" . onevaluesql("show client_encoding;") . ", server_encoding=" . onevaluesql("show server_encoding;") . "\n";

$sth=$dbh->prepare(
  "select d02name from d02ben where d02bnr = '00001048313'")
                or die "parse error\n".${DBI::errstr}."\n";

$sth->execute
                or die "exec error\n".${DBI::errstr}."\n";

while ( (my @row = $sth->fetchrow_array) )  {
   my $HexStr = unpack("H*", $row[0]);
   print "HexStr: " . $HexStr . "\n";
   print "$row[0]\n";
}

$sth->finish;

And when I run this, the above column comes out coded in ISO-8859-1, note the hex string 50e4... :

./utf8-01.pl
DBI is version 1.642, DBD::Pg is version 3.8.0
client_encoding=UTF8, server_encoding=UTF8
HexStr: 50e46461676f67697363686520486f6368736368756c65205765696e67617274656e2020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020
Pagogische Hochschule Weingarten

I watched with the Linux strace the communicatio between the client (DBD::Pg) and the PostgreSQL server 11.4, and the server delivers the string in question as UTF-8 to the client:

...
recvfrom(3, "T\0\0\0 \0\1d02name\0\0\1\313\237\0\3\0\0\4\22\377\377\0\0\0|\0\0D\0\0\0\203\0\1\0\0\0yP\303\244dagogische Hochschule Weingarten                                                                                      C\0\0\0\rSELECT 1\0Z\0\0\0\5T", 16384, 0, NULL, NULL) = 185
write(1, "HexStr: 50e46461676f67697363686520486f6368736368756c65205765696e67617274656e2020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020\n", 249) = 249
write(1, "P\344dagogische Hochschule Weingarten                                                                                      \n", 121) = 121

What I do wrong? Thanks.