Open Ericliu249 opened 1 year ago
cc @lokeshj1703, can you take a look? This seems to be another duplicate issue: #7724
@danny0405 @lokeshj1703 I had the same problem when using Flink to ingest data (one MySQL table) into Hudi with metadata synced to Hive. Env versions as follows:
I have tried setting the param 'hive_sync.support_timestamp' = 'true' and declaring the type of the column update_time as timestamp(6), but it still shows up as a bigint value when querying from the Hive CLI.
Expecting a reply 😀
There is a known issue with timestamp type. Would suggest to try out #3391
@Ericliu249 Were you able to try out the suggested path? Are you still facing this issue?
Describe the problem you faced
We use Flink to ingest data into Hudi and the Hive Sync Tool to sync metadata to our Hive metastore. A field was declared as Timestamp type when writing to Hudi with Flink. In hoodie.properties, the type of the field is shown as long with logicalType timestamp-micros. When we check the Hive metastore backend DB, the TYPE_NAME for that field is stored as bigint. We verified this by querying with Trino, which returned numbers like 1595365357402000 instead of the 2020-07-22 16:46:11.038 +00:00 format seen in the parquet files.
We tried adding the configs 'hoodie.datasource.write.hive_style_partitioning' = 'true' and 'hoodie.datasource.hive_sync.support_timestamp' = 'true', but that didn't work.
This is similar to https://github.com/apache/hudi/issues/2509
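For reference, a long with logicalType timestamp-micros is conventionally a count of microseconds since the Unix epoch. The following is a minimal Python sketch for decoding such a value by hand, assuming UTC; the helper name micros_to_datetime is hypothetical and not part of Hudi, Hive, or Trino.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime (assumption: values are in UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros: int) -> datetime:
    """Interpret an integer as microseconds since the Unix epoch (UTC)."""
    return EPOCH + timedelta(microseconds=micros)


# Example bigint value of the kind returned by Trino in this report.
raw = 1595365357402000
print(micros_to_datetime(raw).isoformat())
```

This only illustrates that the bigint still carries the full timestamp; it does not change how the Hive metastore records the column type.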
Expected behavior
The data type in Hive metastore is timestamp when Flink writes a Timestamp field to Hudi.
Environment Description
Hudi version : 0.11.1
Flink version : 1.15.3
Hive version : 3.0
Hadoop version :
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No
Running on Kubernetes? (yes/no) : Yes
Additional context
The timestamp field coming from Kafka is a string in a format like 2020-07-22T16:46:11.038532Z. We used the expression TO_TIMESTAMP(replace(replace(updated, 'T', ' '), 'Z' ,'' )) to convert it from string to timestamp. The parquet file stores the timestamp field in a format like 2020-07-22 16:46:11.038 +00:00.
In hoodie.properties, the field's type is:
In Hive metastore:
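The string-to-timestamp conversion described under Additional context can be mirrored outside Flink to check what value should end up stored. The following is a rough Python sketch, not the Flink implementation: the function names are hypothetical, and it assumes the trailing Z means UTC, whereas Flink's TO_TIMESTAMP produces a timestamp without a time zone.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_kafka_timestamp(value: str) -> datetime:
    """Mimic replace(replace(updated, 'T', ' '), 'Z', '') followed by a parse."""
    cleaned = value.replace("T", " ").replace("Z", "")
    parsed = datetime.strptime(cleaned, "%Y-%m-%d %H:%M:%S.%f")
    # Assumption: the stripped 'Z' suffix meant UTC.
    return parsed.replace(tzinfo=timezone.utc)


def to_epoch_micros(ts: datetime) -> int:
    """Convert to microseconds since the Unix epoch using integer arithmetic."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


ts = parse_kafka_timestamp("2020-07-22T16:46:11.038532Z")
print(ts.isoformat(), to_epoch_micros(ts))
```

Comparing the resulting integer with the bigint seen from Hive or Trino is one way to confirm that the stored value is the same timestamp in epoch-microsecond form.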
Hudi configs in Flink job
Stacktrace