apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Flink SQL writes TIMESTAMP to Hudi: Hive queries return time +8h; with TIMESTAMP_LTZ, the Hive schema becomes bigint instead of timestamp #9864

Open li-ang-666 opened 1 year ago

li-ang-666 commented 1 year ago

When I use SQL with TIMESTAMP:

```sql
CREATE TABLE dwd_ratio_path_company (
  id DECIMAL(20, 0),
  company_id BIGINT,
  shareholder_id STRING,
  shareholder_entity_type SMALLINT,
  shareholder_name_id BIGINT,
  investment_ratio_total DECIMAL(24, 12),
  is_controller SMALLINT,
  is_ultimate SMALLINT,
  is_big_shareholder SMALLINT,
  is_controlling_shareholder SMALLINT,
  equity_holding_path STRING,
  create_time TIMESTAMP(3),
  update_time TIMESTAMP(3),
  is_deleted SMALLINT,
  op_ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'obs://hadoop-obs/hudi_ods/ratio_path_company015',
  'table.type' = 'MERGE_ON_READ',
  -- index
  'index.type' = 'BUCKET',
  'hoodie.bucket.index.num.buckets' = '128',
  -- write
  'write.tasks' = '4',
  'write.task.max.size' = '512',
  'write.batch.size' = '8',
  'write.log_block.size' = '64',
  'write.precombine' = 'true',
  'write.precombine.field' = 'op_ts',
  -- compaction
  'compaction.async.enabled' = 'true',
  'compaction.delta_commits' = '3',
  -- clean
  'clean.retain_commits' = '0',
  -- hive
  'hive_sync.enabled' = 'true',
  'hive_sync.db' = 'hudi_ods',
  'hive_sync.table' = 'ratio_path_company015',
  'hive_sync.metastore.uris' = 'thrift://10.99.202.153:9083',
  'hive_sync.mode' = 'hms',
  'hive_sync.support_timestamp' = 'true'
);

insert into dwd_ratio_path_company select * from ods_ratio_path_company;
```

This is what appears:

[screenshots: Hive query results show the timestamp values shifted by +8 hours]
li-ang-666 commented 1 year ago

If I use SQL with TIMESTAMP_LTZ:

```sql
CREATE TABLE dwd_ratio_path_company (
  id DECIMAL(20, 0),
  company_id BIGINT,
  shareholder_id STRING,
  shareholder_entity_type SMALLINT,
  shareholder_name_id BIGINT,
  investment_ratio_total DECIMAL(24, 12),
  is_controller SMALLINT,
  is_ultimate SMALLINT,
  is_big_shareholder SMALLINT,
  is_controlling_shareholder SMALLINT,
  equity_holding_path STRING,
  create_time TIMESTAMP_LTZ(3),
  update_time TIMESTAMP_LTZ(3),
  is_deleted SMALLINT,
  op_ts TIMESTAMP_LTZ(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'obs://hadoop-obs/hudi_ods/ratio_path_company016',
  'table.type' = 'MERGE_ON_READ',
  -- index
  'index.type' = 'BUCKET',
  'hoodie.bucket.index.num.buckets' = '128',
  -- write
  'write.tasks' = '4',
  'write.task.max.size' = '512',
  'write.batch.size' = '8',
  'write.log_block.size' = '64',
  'write.precombine' = 'true',
  'write.precombine.field' = 'op_ts',
  -- compaction
  'compaction.async.enabled' = 'true',
  'compaction.delta_commits' = '2',
  -- clean
  'clean.retain_commits' = '0',
  -- hive
  'hive_sync.enabled' = 'true',
  'hive_sync.db' = 'hudi_ods',
  'hive_sync.table' = 'ratio_path_company016',
  'hive_sync.metastore.uris' = 'thrift://10.99.202.153:9083',
  'hive_sync.mode' = 'hms',
  'hive_sync.support_timestamp' = 'true'
);

insert into dwd_ratio_path_company select * from ods_ratio_path_company;
```

This is what appears:

[screenshots: the synced Hive schema shows the timestamp columns as bigint]
li-ang-666 commented 1 year ago

For now I solved this with:

```sql
SELECT CAST(CONVERT_TZ(CAST(op_ts AS STRING), 'Asia/Shanghai', 'UTC') AS TIMESTAMP(3)) AS op_ts
```
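Spelled out against the first table definition, a minimal sketch of the full write path with the workaround applied to every timestamp column (an assumption: all three columns need the same Asia/Shanghai to UTC shift):

```sql
-- Hypothetical full insert: each TIMESTAMP(3) column is shifted from the
-- local time zone (Asia/Shanghai) back to UTC before Hudi writes it.
INSERT INTO dwd_ratio_path_company
SELECT
  id, company_id, shareholder_id, shareholder_entity_type, shareholder_name_id,
  investment_ratio_total, is_controller, is_ultimate, is_big_shareholder,
  is_controlling_shareholder, equity_holding_path,
  CAST(CONVERT_TZ(CAST(create_time AS STRING), 'Asia/Shanghai', 'UTC') AS TIMESTAMP(3)) AS create_time,
  CAST(CONVERT_TZ(CAST(update_time AS STRING), 'Asia/Shanghai', 'UTC') AS TIMESTAMP(3)) AS update_time,
  is_deleted,
  CAST(CONVERT_TZ(CAST(op_ts AS STRING), 'Asia/Shanghai', 'UTC') AS TIMESTAMP(3)) AS op_ts
FROM ods_ratio_path_company;
```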

danny0405 commented 1 year ago

Can you share your Hudi version info?

li-ang-666 commented 1 year ago

> Can you share your Hudi version info?

hive-3.1.1, hudi-0.14.0, flink-1.17.1, cdc-2.4, flink-mysql-connector-3.1.1-1.17

danny0405 commented 1 year ago

Since release 0.14.0, both timestamp(3) and timestamp(6) are synced as the TIMESTAMP type in Hive.
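A quick way to confirm this (a sketch against the table from the first example; the expected output is an assumption based on the statement above):

```sql
-- In Hive: on 0.14.0+, columns written as Flink TIMESTAMP(3) should be
-- reported as `timestamp` here, not `bigint`.
DESCRIBE hudi_ods.ratio_path_company015;
-- expected (abbreviated):
--   create_time   timestamp
--   update_time   timestamp
--   op_ts         timestamp
```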

li-ang-666 commented 1 year ago

> Since release 0.14.0, both timestamp(3) and timestamp(6) are synced as the TIMESTAMP type in Hive.

But it uses the UTC timezone, and TIMESTAMP_LTZ still ends up as bigint in Hive.
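Until the sync maps it to a proper Hive type, one hypothetical way to read such a bigint column from Hive, assuming it holds epoch milliseconds (Avro timestamp-millis):

```sql
-- from_unixtime() expects seconds, hence the integer division by 1000;
-- the result string is rendered in the Hive session time zone.
SELECT CAST(from_unixtime(op_ts DIV 1000) AS TIMESTAMP) AS op_ts
FROM hudi_ods.ratio_path_company016;
```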

linrongjun-l commented 1 year ago

> Since release 0.14.0, both timestamp(3) and timestamp(6) are synced as the TIMESTAMP type in Hive.

After I used Hudi 0.13.0 with timestamp(6) and synced to Hive, the Hive column is a TIMESTAMP type, but the TIMESTAMP cannot be changed to long when querying in Hive. Have you ever encountered this? How was it resolved?

danny0405 commented 1 year ago

> but the TIMESTAMP cannot be changed to long

What do you mean by 'changed to long'?

linrongjun-l commented 1 year ago

> but the TIMESTAMP cannot be changed to long

> What do you mean by 'changed to long'?

The following error occurred during the query:

```
Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritableV2 (state=,code=0)
```

hudi version: 0.13.1 hive version: 3.1.1

danny0405 commented 1 year ago

Did you ever change the Hive schema? Did you use long as the column type at some point?
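One way to check this from the Hive side (a sketch; `hudi_ods.your_table` is a placeholder for the actual synced table name):

```sql
-- Print the full table metadata; if the timestamp column is listed as
-- bigint here, the table was synced with a long-typed schema at some point.
DESCRIBE FORMATTED hudi_ods.your_table;
```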

GaoYaokun commented 1 year ago

I also encountered this issue when I used Flink CDC to write data from MySQL to Hudi and sync it to Hive. The Timestamp(6) field in Hive displays correctly as Timestamp, but when I query it through Hive, an error like this is reported:

```
SQL ERROR: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritableV2
```

And I didn't change any Hive schema; the synced Hive table is a brand-new table. How can this problem be solved?

hudi version: 0.13.1 flink version 1.16.1 hive version 3.1.2

danny0405 commented 1 year ago

@xicm can you help with this issue?

GaoYaokun commented 12 months ago

I saw #8867 and switched the Hudi version to 0.14.0. After that, the timestamp and date fields synced to Hive correctly, and querying the Hudi table through Hive also returns correct results.

hudi version: 0.14.0 flink version 1.16.1 hive version 3.1.2