allwefantasy / spark-binlog

A library for querying Binlog with Apache Spark Structured Streaming, for Spark SQL, DataFrames and [MLSQL](https://www.mlsql.tech).
Apache License 2.0

Chinese cannot be interpreted correctly #21

Closed zhengqiangtan closed 4 years ago

zhengqiangtan commented 4 years ago

Excuse me, when I sync binlog data from MySQL, Chinese characters cannot be interpreted correctly in the production environment.

1. The MySQL environment and the create-table SQL look like this:

-- mysql env
show variables like 'character%';
character_set_client    utf8
character_set_connection    utf8
character_set_database  utf8mb4
character_set_filesystem    binary
character_set_results   utf8
character_set_server    utf8mb4
character_set_system    utf8
character_sets_dir  

CREATE TABLE `users` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `email` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT '',
  `mobile` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `first_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
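The mixed charsets above (utf8 columns inside a utf8mb4 table) are unlikely to be the cause: for Chinese characters in the Basic Multilingual Plane, MySQL's utf8 (a 3-byte subset of UTF-8) and utf8mb4 produce identical bytes. A quick check, using hypothetical sample characters:

```python
# MySQL's "utf8" (utf8mb3) covers UTF-8 sequences up to 3 bytes;
# utf8mb4 is full UTF-8. BMP Chinese characters encode to 3 bytes,
# so both column charsets store identical byte sequences for them.
for ch in "张伟中文":  # hypothetical sample characters
    encoded = ch.encode("utf-8")
    print(ch, encoded, len(encoded))
    assert len(encoded) == 3  # fits MySQL utf8 as well as utf8mb4
```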

2. When I query the delta table, it shows as follows:

+----------+
|first_name|
+----------+
|��������� |
+----------+

Note: The MySQL table displays normally. In the delta table, however, both historical rows and newly inserted Chinese text all show garbled characters. What should I do?
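Garbled output like this is typically mojibake: the binlog carries the column's UTF-8 bytes, but something on the read path decodes them with the wrong charset. A minimal Python sketch of the symptom (the sample string is hypothetical, not taken from the table above):

```python
# Chinese text as MySQL would store it in a utf8/utf8mb4 column:
raw = "张伟".encode("utf-8")  # b'\xe5\xbc\xa0\xe4\xbc\x9f'

# Decoding with the correct charset recovers the text.
print(raw.decode("utf-8"))  # 张伟

# Decoding with a wrong charset yields replacement characters,
# matching the ��������� seen in the first_name column.
print(raw.decode("ascii", errors="replace"))  # ������
```

Since the MySQL table itself displays correctly, the bytes in the binlog are fine; the fix belongs in the library's decoding step.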

allwefantasy commented 4 years ago

Fixed this issue in `fix-charset`.