apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [Zeta] Chinese field values garbled when synchronizing data from Hive to StarRocks #6636

Closed: matianhe3 closed this issue 2 months ago.

matianhe3 commented 5 months ago

### Search before asking

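### What happened

Chinese characters become `???`. This may be the same problem as #5244.

`广东` (Guangdong) -> `??`

However, when the job is run several times, the value is sometimes correct (`广东`). The behavior appears to be random.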
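One `?` per character is the typical signature of text being encoded with a charset that cannot represent CJK characters. In Java, `String.getBytes(Charset)` silently substitutes the charset's replacement byte (`?` for ISO-8859-1 and US-ASCII) for every unmappable character, so `广东` becomes exactly `??`. Whether this is what happens inside the Hive source or the StarRocks sink is an assumption, not something this report confirms. The minimal sketch here reproduces the effect; the class and member names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a charset without CJK support
// turns each Chinese character into a single '?'.
public class CharsetReplacementDemo {

    // The province value from the report (U+5E7F U+4E1C), written as Unicode
    // escapes so the demo does not depend on the encoding javac uses for this file.
    static final String PROVINCE = "\u5e7f\u4e1c";

    // Encodes and decodes with the same charset. String.getBytes(Charset)
    // replaces every character the charset cannot represent with the
    // charset's replacement bytes ('?' for ISO-8859-1 and US-ASCII),
    // so the original characters cannot be recovered afterwards.
    static String roundTrip(String value, Charset charset) {
        byte[] encoded = value.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // Environment-dependent: on JDKs before 18 this follows the OS locale.
        System.out.println("default charset: " + Charset.defaultCharset());

        // UTF-8 can represent CJK characters, so the round trip is lossless.
        System.out.println("UTF-8 preserved: "
                + roundTrip(PROVINCE, StandardCharsets.UTF_8).equals(PROVINCE)); // true

        // ISO-8859-1 cannot, so each character is replaced by '?'.
        System.out.println("ISO-8859-1 result: "
                + roundTrip(PROVINCE, StandardCharsets.ISO_8859_1)); // ??
    }
}
```

If this is the cause, the "random" behavior could come from worker nodes whose JVMs use different default charsets. Before JDK 18 the default charset is derived from the operating system locale (JDK 18 and later default to UTF-8, per JEP 400). A task scheduled onto a node with a non-UTF-8 locale would then lose the characters, while the same job on another node would not. This is only a hypothesis; comparing the `default charset` line printed on each node would help confirm or rule it out.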

### SeaTunnel Version

2.3.4

### SeaTunnel Config

```
env {
  execution.parallelism = 1
  job.mode = BATCH
  job.name = test
}

source {
  Hive {
    table_name = "bdm.test"
    metastore_uri = "thrift://192.168.16.9:9083"
    hdfs_site_path = "/opt/apache-seatunnel-2.3.4/config/hdfs-site.xml"
    hive_site_path = "/opt/apache-seatunnel-2.3.4/config/hive-site.xml"
    result_table_name = source
    read_partitions = ["dt="${dt}]
    delimiter = ","
  }
}

transform {
}

sink {
  StarRocks {
    source_table_name = [source]
    nodeUrls = ["192.168.17.203:8030", "192.168.17.204:8030", "192.168.17.205:8030"]
    base-url = "jdbc:mysql:loadBalance://192.168.17.203:9030,192.168.17.204:9030,192.168.17.205:9030"
    username = test
    password = ""
    database = ods
    table = test
    enable_upsert_delete = true
  }
}
```

### Running Command

```shell
seatunnel.sh -c test.conf
```

### Error Exception

StarRocks DDL:

```sql
CREATE TABLE `test` (
  `uniqueid` varchar(64) NOT NULL COMMENT "",
  `dt` date NOT NULL COMMENT "",
  `customerprovince` varchar(16) NULL COMMENT "",
  `customercity` varchar(32) NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`uniqueid`, `dt`)
PARTITION BY (`dt`)
DISTRIBUTED BY HASH(`uniqueid`)
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"enable_persistent_index" = "true",
"replicated_storage" = "true",
"compression" = "LZ4"
);
```

Hive DDL:

```sql
CREATE EXTERNAL TABLE if not exists `bdm`.`test`
(
    uniqueid string comment "",
    customerprovince string comment "",
    customercity string comment ""
)
partitioned by (dt string comment '')
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;
```


### Zeta or Flink or Spark Version

Zeta 2.3.4

### Java or Scala Version

_No response_

### Screenshots

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

Carl-Zhou-CN commented 4 months ago

@zhilinli123 Hi, please continue helping to look into this. See the related issue: https://github.com/apache/seatunnel/issues/5244