Open · JingFengWang opened this issue 1 year ago
You are right, the write path currently only supports the UTC timezone. cc @SteNicholas, do you think we should support local-timezone writes?
@JingFengWang Did you try setting the timestamp column of the table to the TIMESTAMP_LTZ type?
@JingFengWang, you could use the TIMESTAMP_LTZ type to solve the above problem. I have tested with TIMESTAMP_LTZ and it worked well. Meanwhile, I think we could support local-timezone writes to help users avoid this problem.
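For reference, a minimal sketch of what that suggestion looks like in a Flink SQL DDL; the table name, columns, and path are illustrative, not taken from this issue:

-- Declare the column as TIMESTAMP_LTZ so Flink treats values as
-- instants rather than local wall-clock timestamps.
CREATE TABLE hudi_sink (
  id INT,
  ts TIMESTAMP_LTZ(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_sink',
  'table.type' = 'MERGE_ON_READ'
);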
> @JingFengWang Did you try setting the timestamp column of the table to the TIMESTAMP_LTZ type?

Yes.
Flink upsert into Hudi with a timestamp type uses the UTC timezone. Currently we can work around this with a UDF, or extend Flink as in https://github.com/apache/flink/pull/23220, so that the Hudi timezone stays consistent with Hive. @danny0405
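For illustration, a rough sketch of the UDF-style workaround using Flink's built-in CONVERT_TZ; the table names and the Asia/Shanghai zone are assumptions, and the direction of the shift (and handling of fractional seconds, which CONVERT_TZ's default format does not parse) should be verified against your deployment:

-- Shift the wall-clock value before writing, so that the UTC-hardcoded
-- write path stores an instant that a local-timezone reader (e.g. Hive)
-- renders back as the original local time.
INSERT INTO hudi_ts_sink
SELECT id,
       CAST(CONVERT_TZ(CAST(ts AS STRING), 'Asia/Shanghai', 'UTC') AS TIMESTAMP(3)) AS ts
FROM source_tbl;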
> @JingFengWang, you could use the TIMESTAMP_LTZ type to solve the above problem. I have tested with TIMESTAMP_LTZ and it worked well.
This solution is not applicable in our usage scenario:
spark-sql> create table hudi_mor_all_datatype_2 (
> booleanh BOOLEAN,
> inth INT,
> longh LONG,
> floath FLOAT,
> doubleh DOUBLE,
> timestamph TIMESTAMP_LTZ,
> stringh STRING,
> decimalh DECIMAL(3, 2),
> listh ARRAY<INT>,
> structh STRUCT<strg STRING, intg INT>,
> maph MAP<STRING, INT>
> ) using hudi
> tblproperties (
> hoodie.metadata.enable = 'false',
> hoodie.datasource.hive_sync.enable = 'false',
> hoodie.datasource.meta.sync.enable = 'false',
> hoodie.datasource.write.hive_style_partitioning = 'false',
> hoodie.index.type = 'BUCKET',
> hoodie.bucket.index.num.buckets = '61',
> hoodie.bucket.index.max.num.buckets = '127',
> hoodie.bucket.index.min.num.buckets = '31',
> hoodie.bucket.index.merge.threshold = '0.2',
> hoodie.bucket.index.split.threshold = '0.2',
> hoodie.index.bucket.engine = 'SIMPLE',
> hoodie.finalize.write.parallelism = '40',
> hoodie.write.buffer.limit.bytes = '419430400',
> type = 'mor',
> primaryKey = 'inth',
> preCombineField = 'longh'
> )
> partitioned by (stringh);
Error in query:
DataType timestamp_ltz is not supported.(line 7, pos 13)
== SQL ==
create table hudi_mor_all_datatype_2 (
booleanh BOOLEAN,
inth INT,
longh LONG,
floath FLOAT,
doubleh DOUBLE,
timestamph TIMESTAMP_LTZ,
-------------^^^
stringh STRING,
decimalh DECIMAL(3, 2),
listh ARRAY<INT>,
spark-sql>
@JingFengWang Does Spark assume local datetime by default?
> @JingFengWang Does Spark assume local datetime by default?

Yes, Spark does not use the UTC time zone by default.
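For context, Spark 3.2 does not accept TIMESTAMP_LTZ as a DDL keyword because its plain TIMESTAMP type already carries local-session-timezone semantics, governed by spark.sql.session.timeZone, which is consistent with the error above; the zone value below is just an example:

-- spark-sql: the session time zone controls how TIMESTAMP values are
-- interpreted on write and rendered on read.
set spark.sql.session.timeZone=Asia/Shanghai;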
Fine, we may need to consider a general solution for both the Flink reader and writer of timestamps.
> Fine, we may need to consider a general solution for both the Flink reader and writer of timestamps.

OK, thanks!
Describe the problem you faced

Hudi 0.14.0, hudi-flink-bundle: when a COW/MOR table writes timestamp data, the values are still written in the UTC time zone even when read.utc-timezone=false is set. The timestamp time-zone conversion in AvroToRowDataConverters and RowDataToAvroConverters is hardcoded to the UTC time zone.

To Reproduce

Steps to reproduce the behavior:

Expected behavior

hudi-flink1.13-bundle supports writing timestamps in non-UTC time zones in a configurable way (see the sketch below).
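A sketch of the configurable behavior being asked for, reusing the existing read.utc-timezone option named above; today that option only takes effect on the read path, and the request is for the write path to honor it (or a write-side counterpart). The table name and path are illustrative:

-- Hypothetical end state: with read.utc-timezone = 'false', writes
-- would also use the local time zone instead of hardcoded UTC.
CREATE TABLE hudi_tbl (
  id INT,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///path/to/hudi_tbl',
  'table.type' = 'MERGE_ON_READ',
  'read.utc-timezone' = 'false'
);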
Environment Description
Hudi version : 0.14.0
Spark version : 3.2.0
Flink version : 1.13.2
Hive version : 1.11.1
Hadoop version : 3.x
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no
Related code location: the timestamp converters in AvroToRowDataConverters and RowDataToAvroConverters.