EnterpriseDB / mysql_fdw

PostgreSQL foreign data wrapper for MySQL
Other
530 stars 162 forks source link

cannot convert emoji from PG to mysql #133

Open corama opened 7 years ago

corama commented 7 years ago

While using insert into foreign tables select from local ones to transfer data from PG to mysql, all emoji turned to be messy code. Cannot sure it is a issue or something wrong with mysql steps. Need professors' help.

gabbasb commented 7 years ago

Could you please be kind enough to provide us the failing test case?

furplag commented 3 years ago

Hi, I also have the same issue . Oracle Linux 8.2 PostgreSQL 13 ( UTF8 ) MariaDB 10 ( utf8mb4 ) mysql_fdw REL-2_6_0

-- MySQL
CREATE TABLE `test_emoji` (
  `id` int(11) NOT NULL PRIMARY KEY,
  `emoji` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

insert into test_emoji values (1, '⚡');
insert into test_emoji values (2, '👺');
select * from test_emoji;
the result is id emoji
1
2 👺

and then

-- PostgreSQL
create extension mysql_fdw;
create server mysql_server foreign data wrapper mysql_fdw options (host 'localhost', port '3306');
create user mapping for test server mysql_server options (user 'emoji', password '********');
import foreign schema emoji from server mysql_server into public;

select * from test_emoji;
the result is id emoji
1
2 👺

but,

insert into test_emoji values (3, '⚡');
insert into test_emoji values (4, '👺');
the result is id emoji
1
2 👺
3
4 ????

would you like to tell me any idea ?

surajkharage19 commented 3 years ago

Hi,

Can you please share the collation details from your end? I am getting below error while inserting into test_emoji table with utf8mb4 as character set. Do I need to perform any additional details regarding the same?

MariaDB [suraj1]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_unicode_ci   |
+--------------------------+----------------------+
1 row in set (0.000 sec)

MariaDB [suraj1]> CREATE TABLE `test_emoji` (
    ->   `id` int(11) NOT NULL PRIMARY KEY,
    ->   `emoji` text NOT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.017 sec)

MariaDB [suraj1]> 
MariaDB [suraj1]> insert into test_emoji values (1, '⚡');
Query OK, 1 row affected (0.001 sec)

MariaDB [suraj1]> insert into test_emoji values (2, '👺');
ERROR 1366 (22007): Incorrect string value: '\xF0\x9F\x91\xBA' for column `suraj1`.`test_emoji`.`emoji` at row 1
furplag commented 3 years ago

thank you for reply, the collation of my database is,

MariaDB [origin]>  show variables like 'coll%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | utf8mb4_bin        |
| collation_server     | utf8mb4_bin        |
+----------------------+--------------------+
3 rows in set (0.001 sec)

MariaDB [origin]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_bin          |
+--------------------------+----------------------+
1 row in set (0.000 sec)

MariaDB [origin]> -- MySQL
MariaDB [origin]> CREATE TABLE `test_emoji` (
    ->   `id` int(11) NOT NULL PRIMARY KEY,
    ->   `emoji` text NOT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.029 sec)

MariaDB [origin]> insert into test_emoji values (1, '⚡');
Query OK, 1 row affected (0.005 sec)

MariaDB [origin]> insert into test_emoji values (2, '👺');
Query OK, 1 row affected (0.011 sec)

MariaDB [origin]> select * from test_emoji;
+----+-------+
| id | emoji |
+----+-------+
|  1 | ⚡     |
|  2 | 👺     |
+----+-------+
2 rows in set (0.010 sec)

character set and collation are specified by /etc/my.cnf.d/server.cnf .

[server]

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log

pid-file=/run/mariadb/mariadb.pid

slow_query_log=on
slow_query_log_file=/var/log/mariadb/mariadb.slow.log
long_query_time=5

lower_case_table_names=1
character-set-server=utf8mb4
collation-server=utf8mb4_bin

[client]
default-character-set=utf8mb4

[mariadb]
performance_schema=on

[mariadb-client]

[mariadb-10.3]
surajkharage19 commented 3 years ago

Thanks for the details @furplag . After changing collation_database and collation_server to utf8mb4_bin, I am able to reproduce the issue.

edb=# create foreign table test_emoji (id int, emoji text) server mysql_svr options (dbname 'suraj1', table_name 'test_emoji');
CREATE FOREIGN TABLE
edb=# 
edb=# select * from test_emoji;
 id | emoji 
----+-------
  1 | ⚡
  2 | ?
(2 rows)

edb=# 
edb=# insert into test_emoji values (3, '⚡');
INSERT 0 1
edb=# 
edb=# insert into test_emoji values (4, '👺');
INSERT 0 1
edb=# 
edb=# select * from test_emoji;);
 id | emoji 
----+-------
  1 | ⚡
  2 | ?
  3 | ⚡
  4 | ????
(4 rows)

After an initial analysis, it seems like it is failing because of character-set issue. While making the connection to MySQL, we set character-set same as current Postgres database like below which is "UTF8" in our case:

mysql_options(conn, MYSQL_SET_CHARSET_NAME, GetDatabaseEncodingName());

but it seems like MySQL expects "utf8mb4" character set here so that the above emojis should be inserted and fetched correctly. After changing that value to hard-coded "utf8mb4", I am able to fetch and insert that emoji correctly.

e.g: mysql_options(conn, MYSQL_SET_CHARSET_NAME, "utf8mb4");

These changes are done in the mysql_connect() function which is in connection.c file.

We might need to give an option to the user to set the character-set but that needs further study on this area. We will look into this further and try to come up with a solution after doing further study.

If you could compile and install the mysql_fdw using source code then you can make the above-mentioned changes to get this working and share your feedback.

Thanks for reporting this issue.

furplag commented 3 years ago

Thank you for analysis and suggestion @surajkharage19, I try to change GetDatabaseEncodingName() using source code .

Regards, Furplag

furplag commented 3 years ago

I tried this solution these steps, it works fine ( at least in this case ) .

  1. define new option "character_set" in options .
  2. and set character-set using mysql_options(), if the option is specified .
    /* oracle_fdw.h:115 */
    char     *character_set;    /* MySQL charcter set */
    } mysql_opt;
    /* option.c:59 */
    {"character_set", ForeignServerRelationId},
    /* Sentinel */
    {NULL, InvalidOid}
        if (strcmp(def->defname, "ssl_cipher") == 0)
            opt->ssl_cipher = defGetString(def);
    }
    ...
    /* option.c:263 */
        if (strcmp(def->defname, "character_set") == 0)
            opt->character_set = defGetString(def);
    /* connection.c: 236 */
    mysql_options(conn, MYSQL_SET_CHARSET_NAME, opt->character_set?:GetDatabaseEncodingName());

Thank you for your advice .