apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Flink writes to Hudi: after syncing to Hive, why is the timestamp field bigint, and how can the synced Hive column keep the timestamp type? #9766

Closed sunmingqiaa closed 10 months ago

sunmingqiaa commented 10 months ago


Additional context

In the test case at https://hudi.apache.org/docs/flink-quick-start-guide#insert-data, I added the Hive sync options to the Hudi table. When querying in the Flink sql-client, the timestamp column shows values in the format '1970-01-01 00:00:01', but when querying the same value in Hive it comes back as a bigint number. How can I keep the data type consistent when this column is synced to Hive?

```sql
CREATE TABLE t1(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = '/tmp/hudi/t1',
  'table.type' = 'COPY_ON_WRITE',  -- If MERGE_ON_READ, hive query will not have output until the parquet file is generated
  'hive_sync.enable' = 'true',     -- Required. To enable hive synchronization
  'hive_sync.mode' = 'hms',        -- Required. Setting hive sync mode to hms, default jdbc
  'hive_sync.metastore.uris' = 'thrift://syq-121:9083',
  'hive_sync.jdbc_url' = 'jdbc:hive2://syq-121:10000',
  'hive_sync.table' = 'test_hudi',
  'hive_sync.support_timestamp' = 'true',
  'hive_sync.db' = 'default'
);
```


danny0405 commented 10 months ago

Before release 0.14.0, there was a sync option `hive_sync.support_timestamp`; when enabled, the TIMESTAMP(6) type was synced to Hive as TIMESTAMP. Since release 0.14.0, all timestamp types are synced as TIMESTAMP.
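For context on what the bigint in Hive holds: Flink's TIMESTAMP(3) is physically stored as epoch milliseconds, which is the raw value Hive shows when the column is not synced as TIMESTAMP. A minimal sketch of the conversion (the value 1000 is an assumption chosen to match the '1970-01-01 00:00:01' example from the question):

```python
from datetime import datetime, timezone

def millis_to_timestamp(epoch_millis: int) -> str:
    """Render an epoch-milliseconds bigint the way Flink displays a TIMESTAMP(3)."""
    dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")

# A bigint of 1000 ms corresponds to the '1970-01-01 00:00:01' value in the question.
print(millis_to_timestamp(1000))  # → 1970-01-01 00:00:01
```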

sunmingqiaa commented 10 months ago

> Before release 0.14.0, there was a sync option `hive_sync.support_timestamp`; when enabled, the TIMESTAMP(6) type was synced to Hive as TIMESTAMP. Since release 0.14.0, all timestamp types are synced as TIMESTAMP.

Thanks for your reply. With `hive_sync.support_timestamp` enabled, the field type in Hive is indeed TIMESTAMP. But when I select the value in Hive, I get this error:

```
Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
```

danny0405 commented 10 months ago

Hive does not recognize it as a timestamp correctly? Are you already on release 0.14.0?

linrongjun-l commented 9 months ago

> Before release 0.14.0, there was a sync option `hive_sync.support_timestamp`; when enabled, the TIMESTAMP(6) type was synced to Hive as TIMESTAMP. Since release 0.14.0, all timestamp types are synced as TIMESTAMP.

> Thanks for your reply. With `hive_sync.support_timestamp` enabled, the field type in Hive is indeed TIMESTAMP. But when I select the value in Hive, I get this error: `Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable`

I ran into the same problem. How did you solve it in the end?
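Until an upgrade is possible, one workaround sketch is to convert the raw bigint on the Hive side with built-in functions. This assumes the column stores epoch milliseconds (as Flink's TIMESTAMP(3) does) and reuses the `test_hudi` table name from the question:

```sql
-- Hypothetical workaround: render the raw bigint as a readable timestamp in Hive.
-- from_unixtime expects seconds, so divide the millisecond value by 1000.
SELECT uuid, name, from_unixtime(CAST(ts / 1000 AS BIGINT)) AS ts_readable
FROM test_hudi;
```

This only changes the query output; the proper fix, as noted above, is upgrading to 0.14.0 so the column is synced as TIMESTAMP.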

cumin1 commented 2 months ago

I just hit this problem as well. How did you solve it?

danny0405 commented 2 months ago

cc @xicm who is very experienced in Hive.

xicm commented 2 months ago

> Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

@sunmingqiaa can you share the full trace?

xicm commented 2 months ago

As Danny said, this is solved in release 0.14.0.