edgelesssys / edgelessdb

EdgelessDB is a MySQL-compatible database for confidential computing. It runs entirely inside a secure enclave and comes with advanced features for collaboration, recovery, and access control.
https://edgeless.systems/products/edgelessdb
GNU General Public License v2.0

EdgelessDB Chinese garbled problem #86

Closed water5-cmd closed 2 years ago

water5-cmd commented 2 years ago

Hi @thomasten, I can now work with EdgelessDB from inside the enclave: inserting, deleting, and querying all work. However, I have run into an encoding problem: Chinese text queried from the EdgelessDB database is garbled when printed. For example, insert one row into the usertest table of the mysql database:

DROP TABLE IF EXISTS `usertest`;
CREATE TABLE `usertest`  (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '管理员ID',
  `login_name` varchar(40) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '登录名',
  `login_pwd` varchar(40) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '登录密码',
  `name` varchar(1024) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '姓名',
  PRIMARY KEY (`id`) USING BTREE
) AUTO_INCREMENT = 29 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci COMMENT = '用户表' ROW_FORMAT = Dynamic;

SQL statement:

INSERT INTO `usertest` VALUES (1, 'jack62', '1bbd886460827015e5d605ed44252251', '你好小明');

When I access usertest with a MySQL client in the REE, the data is read and displayed correctly.

mysql> select * from usertest;
+----+------------+----------------------------------+--------------+
| id | login_name | login_pwd                        | name         |
+----+------------+----------------------------------+--------------+
|  1 | jack62     | 1bbd886460827015e5d605ed44252251 | 你好小明     |
+----+------------+----------------------------------+--------------+
1 row in set (0.00 sec)

When I query usertest from inside the enclave, the Chinese value "你好小明" comes back garbled. Printing name with fmt.Println gives something like:

"name":"甘肃å«è®¡å§”"

As I understand it, MariaDB's default character set is latin1 and its default collation is latin1_swedish_ci. These are the current settings in EdgelessDB:

mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)

mysql> SHOW VARIABLES like "%collation%";
+---------------------------------------+-------------------+
| Variable_name                         | Value             |
+---------------------------------------+-------------------+
| collation_connection                  | latin1_swedish_ci |
| collation_database                    | latin1_swedish_ci |
| collation_server                      | latin1_swedish_ci |
| rocksdb_error_on_suboptimal_collation | ON                |
| rocksdb_strict_collation_check        | ON                |
| rocksdb_strict_collation_exceptions   |                   |
+---------------------------------------+-------------------+
6 rows in set (0.00 sec)
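
Garbled output of this kind is typical of UTF-8 bytes being decoded as latin1 somewhere between server and client. The following is a minimal Go sketch of that mechanism, not EdgelessDB code. The function names are hypothetical, and it uses plain ISO-8859-1 (one byte maps to one code point) as a simplification of the server's latin1 handling.

```go
package main

import "fmt"

// misreadAsLatin1 treats every UTF-8 byte of s as one ISO-8859-1 character
// and re-encodes the result as UTF-8. This mimics a layer that assumes
// latin1 while the data is actually UTF-8.
func misreadAsLatin1(s string) string {
	raw := []byte(s)
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

// repairLatin1 reverses misreadAsLatin1. It reports false if s contains a
// character above U+00FF, which cannot come from a single latin1 byte.
func repairLatin1(s string) (string, bool) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return "", false
		}
		out = append(out, byte(r))
	}
	return string(out), true
}

func main() {
	original := "你好小明"
	garbled := misreadAsLatin1(original)
	fmt.Printf("garbled:  %q\n", garbled)

	repaired, ok := repairLatin1(garbled)
	fmt.Printf("repaired: %s (ok=%v)\n", repaired, ok)
}
```

Each of the four Chinese characters occupies three bytes in UTF-8, so the misread string has twelve characters. If a garbled value round-trips cleanly through repairLatin1, a latin1/UTF-8 mismatch is the likely cause.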

But I want to use utf8mb4, so I changed the settings like this:

mysql> set character_set_client = 'utf8mb4';
Query OK, 0 rows affected (0.00 sec)

mysql> set character_set_connection = 'utf8mb4';
Query OK, 0 rows affected (0.00 sec)

mysql> set character_set_server = 'utf8mb4';
Query OK, 0 rows affected (0.00 sec)

mysql> set character_set_results = 'utf8mb4';
Query OK, 0 rows affected (0.00 sec)

mysql> set character_set_database = 'utf8mb4';
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8mb4                          |
| character_set_connection | utf8mb4                          |
| character_set_database   | utf8mb4                          |
| character_set_filesystem | binary                           |
| character_set_results    | utf8mb4                          |
| character_set_server     | utf8mb4                          |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)

mysql> SHOW VARIABLES like "%collation%";
+---------------------------------------+--------------------+
| Variable_name                         | Value              |
+---------------------------------------+--------------------+
| collation_connection                  | utf8mb4_general_ci |
| collation_database                    | utf8mb4_general_ci |
| collation_server                      | utf8mb4_general_ci |
| rocksdb_error_on_suboptimal_collation | ON                 |
| rocksdb_strict_collation_check        | ON                 |
| rocksdb_strict_collation_exceptions   |                    |
+---------------------------------------+--------------------+
6 rows in set (0.01 sec)

But Chinese characters are still garbled when printed. What should I do to fix this? As for changing EdgelessDB's charset from latin1 to utf8mb4: adding set character_set_database = 'utf8mb4'; to manifest.json or to mariadbbootstrap.go has no effect. Only a MySQL client in the REE can change latin1 to utf8mb4. Is it possible to change these settings in the code?

water5-cmd commented 2 years ago

I changed CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci to CHARACTER SET = utf8 COLLATE = utf8_general_ci in the table definition, and the garbling went away. So, does EdgelessDB support utf8mb4?
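
A note on the trade-off of this workaround: in MySQL and MariaDB, utf8 is an alias for utf8mb3, which stores at most three bytes per character. The sample value fits, but four-byte characters such as emoji do not. The sketch below (the helper name is hypothetical) checks whether a string fits within that limit. It does not explain why the utf8mb4 table was garbled in the first place.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// fitsInUTF8MB3 reports whether every character of s needs at most three
// bytes in UTF-8, which is the limit of the 3-byte utf8 (utf8mb3) charset.
func fitsInUTF8MB3(s string) bool {
	for _, r := range s {
		if utf8.RuneLen(r) > 3 {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"你好小明", "你好😀"} {
		fmt.Printf("%q fits in 3-byte utf8: %v\n", s, fitsInUTF8MB3(s))
	}
}
```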

thomasten commented 2 years ago

Hi, we can reproduce the issue and will investigate.

thomasten commented 2 years ago

Turns out this isn't a bug. The charset can be changed using set global:

set global character_set_client = 'utf8mb4';
set global character_set_connection = 'utf8mb4';
set global character_set_server = 'utf8mb4';
set global character_set_results = 'utf8mb4';
set global character_set_database = 'utf8mb4';
quit

Then reconnect and it should work.
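
For a Go client, the connection charset can also be requested per connection instead of globally. The sketch below assumes the client uses github.com/go-sql-driver/mysql, which this thread does not confirm. That driver accepts a charset parameter in its DSN. The helper name, credentials, and address are hypothetical, and any TLS settings the deployment needs are left out.

```go
package main

import "fmt"

// buildDSN returns a data source name in the go-sql-driver/mysql format
// that asks the driver to use utf8mb4 for the connection.
// Assumption: the client uses github.com/go-sql-driver/mysql.
func buildDSN(user, password, addr, dbName string) string {
	return fmt.Sprintf("%s:%s@tcp(%s)/%s?charset=utf8mb4", user, password, addr, dbName)
}

func main() {
	// Hypothetical credentials and address, for illustration only.
	dsn := buildDSN("appuser", "secret", "localhost:3306", "test")
	fmt.Println(dsn)
	// The string would then be passed to sql.Open("mysql", dsn) with the
	// driver imported; that step is omitted to keep the sketch stdlib-only.
}
```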

That said, we have changed the default charset of EdgelessDB to utf8mb4 on the current master branch, which seems like the more reasonable default.

thomasten commented 2 years ago

Starting with v0.3.0, the default charset is utf8mb4.