getindata / kafka-connect-iceberg-sink

Apache License 2.0

Wrong Timestamp mapping for Iceberg sink #8

Open qooba opened 2 years ago

qooba commented 2 years ago

Summary

Currently, timestamp types (e.g. in Postgres) are wrongly mapped to bigint in Iceberg. The sink should keep the source type and use data_type='timestamp'.

How to Reproduce

1) Create a table in Postgres with a column datetime whose data type is, e.g., timestamp without time zone.
2) Feed it with example data.
3) The Debezium connector will produce an event:

...
"fields": [
    {
        "type": "int64",
        "optional": true,
        "name": "io.debezium.time.MicroTimestamp",
        "version": 1,
        "field": "datetime"
    }
]
...

where the type is "int64" with the additional annotation "io.debezium.time.MicroTimestamp" (see https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-temporal-types).

4) The Iceberg sink will ignore the annotation and create the datetime column as a bigint type:

describe table local.mytable_dbz.debeziumcdc_postgres_public_mystats_fv1

will give:

Row(col_name='datetime', data_type='bigint', comment='')
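
To illustrate the expected behaviour, here is a minimal Python sketch of a mapping that honours the annotation. It is not the sink's actual code: the helper names target_type and decode_value are hypothetical, and the sketch assumes only the field schema shape shown in the event above plus Debezium's epoch-based annotations (io.debezium.time.Timestamp in milliseconds, MicroTimestamp in microseconds, NanoTimestamp in nanoseconds).

```python
from datetime import datetime, timedelta

# Units per second for Debezium's epoch-based timestamp annotations.
UNITS_PER_SECOND = {
    "io.debezium.time.Timestamp": 1_000,
    "io.debezium.time.MicroTimestamp": 1_000_000,
    "io.debezium.time.NanoTimestamp": 1_000_000_000,
}

EPOCH = datetime(1970, 1, 1)  # naive, i.e. "timestamp without time zone"


def target_type(field_schema):
    """Hypothetical helper: pick a column type for one Debezium field schema."""
    if (field_schema.get("type") == "int64"
            and field_schema.get("name") in UNITS_PER_SECOND):
        return "timestamp"  # the annotation wins over the raw int64
    if field_schema.get("type") == "int64":
        return "bigint"     # plain int64 without a temporal annotation
    return field_schema.get("type")


def decode_value(field_schema, value):
    """Hypothetical helper: turn the raw int64 into a naive datetime."""
    if value is None:
        return None
    units = UNITS_PER_SECOND[field_schema["name"]]
    # Integer arithmetic avoids float rounding; nanoseconds are truncated
    # to microseconds, the finest resolution datetime supports.
    micros = value * 1_000_000 // units
    return EPOCH + timedelta(microseconds=micros)


field = {
    "type": "int64",
    "optional": True,
    "name": "io.debezium.time.MicroTimestamp",
    "version": 1,
    "field": "datetime",
}
print(target_type(field))
print(decode_value(field, 1_000_000))
```

With a mapping like this, the datetime column would be created as a timestamp rather than a bigint, and each raw value would be interpreted in the unit that the annotation names.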

gliter commented 1 year ago

@qooba PR https://github.com/getindata/kafka-connect-iceberg-sink/pull/30 might have fixed this issue as well. The change was released in version 0.3.0. Could you retest?

omri-shaiko commented 1 year ago

Hi @gliter, I see the same problem with version 0.4.0. I have a few tables with Datetime columns in MySQL, but when I look at them in AWS Glue, the type is bigint with the comment "org.apache.kafka.connect.data.Timestamp".

Do I need to configure something for it to be set up automatically as a Date?