[SUPPORT] timestamp with logical type is timestamp-mills will cause data inconsistencies

Describe the problem you faced

a table with col ts type is timestamp and it is a precombineKey

background： flink streaming load and spark will sync to hive partitioned table every day.

question: when use spark to query the table, the result show ts is 55758-12-02 03:30:01.0, and if I use spark to query the table to sync other hive table, the data update record will lose, the new data has been load into log file, but the hive table only contain old value after sync. After compact, if I sync to hive again, the result is correct.

analysis:

commit instance, hoodie.properties all of them logical type are timestamp-mills
in spark code, when convert structType to avroType unable to distinguish accuracy type, will use timestamp-micros
so, when use spark mergeingfileIterator, base file use timestamp-micros, logfile use timestamp-mills, because avroschemastr is timestamp-mills

so, if ts long value is 1697609536683, base file will get 1697609536683000, log file is 1697609536683.

the spark timestampType look like can not distinguish mills and micros, if we direct conver structType to avroType, something data quality will happpend.

@YannByron @yihua @wzx140 @danny0405

To Reproduce

Steps to reproduce the behavior:

1. 2. 3. 4.

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

Hudi version : 0.13.1
Spark version : 3.2.0
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :
Running on Docker? (yes/no) :

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.

apache / hudi

[SUPPORT] timestamp with logical type is timestamp-mills will cause data inconsistencies #9884