ClickHouse / clickhouse-kafka-connect

ClickHouse Kafka Connector
Apache License 2.0
141 stars 39 forks source link

DateTime64(9) are inserted incorrectly (near unix epoch) when using KafkaConnect TimestampConverter to convert from string to org.apache.kafka.connect.data.Time #406

Closed greenlion closed 4 days ago

greenlion commented 1 month ago

I want to encode the following document as protobuf with schema and insert it into the database through kafka:

{ "Date":"2024-04-02 03:59:00.072229523","EventType":"QUOTE BID","Ticker":"AAPL","Price":0.00,"Quantity":0,"Exchange":"EDGX","Conditions":"00000008" }

Note that the Date field is a DateTime64(9) field encoded as a string.

The database schema is as follows: CREATE TABLE kafka_test.kafka_test ( DateDateTime64(9), EventTypeLowCardinality(String), TickerLowCardinality(String), PriceFloat64, QuantityInt32, ExchangeLowCardinality(String), ConditionsString ) ENGINE = MergeTree ORDER BY (Ticker, Date, EventType, Exchange) SETTINGS index_granularity = 8192

In order to convert from the string representation to org.apache.kafka.connect.data.Time I use a TimestampConverter in the config for the sink: ` { "name": "clickhouse-connect", "config": { "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector", "tasks.max": "1", "database": "kafka_test", "errors.retry.timeout": "60", "exactlyOnce": "false", "hostname": "10.249.176.81", "port": "8123", "ssl": "false", "username": "default", "password": "", "topics": "kafka_test", "transforms": "TimestampConverter", "transforms.TimestampConverter.target.type": "Timestamp", "transforms.TimestampConverter.field": "Date", "transforms.TimestampConverter.format": "yyyy-MM-dd HH:mm:ss.SSSSSSSSS", "transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value", "value.converter": "io.confluent.connect.protobuf.ProtobufConverter", "value.converter.schemas.enable": "true", "value.converter.schema.registry.url": "http://schema-registry:8081", "clickhouseSettings": "" } } I insert the protobuf documents into kafka with schema like so: cat /data/AAPL.json|kafka-protobuf-console-producer --bootstrap-server broker:9092 --property schema.registry.url=http://schema-registry:8081 --topic kafka_test --property value.schema='syntax = "proto3"; message TAQRecord { string Date=1; string EventType=2; string Ticker=3; double Price=4; int32 Quantity=5; string Exchange=6; string Conditions=7; }'

Steps to reproduce

  1. Start the connector with the given configuration (update the hostname and credentials)
  2. Insert the sample document into kafka encoded with protobuf and the given schema
  3. Examine the record in the kakfa_test table to see the date is wrong

The record in ClickHouse is munged. The date is wrong: Row 1: ────── Date: 1969-12-31 19:28:32.031040032 EventType: QUOTE BID Ticker: AAPL ...

Expected behaviour

Dates are inserted correctly.

greenlion commented 2 weeks ago

Any updates on this? Am I doing something wrong with the connector config or is there a bug in the connector around timestamp handling? We are currently blocked on this.

Paultagoras commented 2 weeks ago

Any updates on this? Am I doing something wrong with the connector config or is there a bug in the connector around timestamp handling? We are currently blocked on this.

Hi! I'll take a look at this today and see if we can resolve it for you folks.

Paultagoras commented 2 weeks ago

@greenlion I'm definitely able to reproduce this, just tracking down the how to resolve it for you folks

Paultagoras commented 2 weeks ago

@greenlion Alright so using a UNIX timestamp, I see that same behavior when I set the precision higher than 3 (it seems that's the default for DateTime64) - while I'm looking at other options, do you folks need that higher level precision? (3=> milliseconds, 6 => Microseconds, 9=> Nanoseconds)

greenlion commented 2 weeks ago

We need nanosecond resolution (9) for our timestamps as they represent stock trades with a precise timestamp.

On Mon, Jul 8, 2024, 2:00 AM Paultagoras @.***> wrote:

@greenlion https://github.com/greenlion Alright so using a UNIX timestamp, I see that same behavior when I set the precision higher than 3 (it seems that's the default for DateTime64) - while I'm looking at other options, do you folks need that higher level precision? (3=> milliseconds, 6 => Microseconds, 9=> Nanoseconds)

— Reply to this email directly, view it on GitHub https://github.com/ClickHouse/clickhouse-kafka-connect/issues/406#issuecomment-2213092871, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADRTNXINM5KS65F62X6EXDZLITJLAVCNFSM6AAAAABJR6YTQOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJTGA4TEOBXGE . You are receiving this because you were mentioned.Message ID: @.***>

Paultagoras commented 1 week ago

@greenlion so I think this is happening because of a bug with the converter (and how Date works in Java) - I've created a PR to try and address this (by allowing you to pass the String value and set the formatter pattern), will need to have it approved before I can release an updated version though.

greenlion commented 1 week ago

Ok, thanks for the update.

On Wed, Jul 10, 2024, 2:43 AM Paultagoras @.***> wrote:

@greenlion https://github.com/greenlion so I think this is happening because of a bug with the converter (and how Date works in Java) - I've created a PR to try and address this (by allowing you to pass the String value and set the formatter pattern), will need to have it approved before I can release an updated version though.

— Reply to this email directly, view it on GitHub https://github.com/ClickHouse/clickhouse-kafka-connect/issues/406#issuecomment-2219684067, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADRTNSEEOVUEGZDHMBIG33ZLTJZDAVCNFSM6AAAAABJR6YTQOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJZGY4DIMBWG4 . You are receiving this because you were mentioned.Message ID: @.***>

Paultagoras commented 14 hours ago

@greenlion Just in case you didn't see it, we added the datetime format change to the recent release - See dateTimeFormats in https://clickhouse.com/docs/en/integrations/kafka/clickhouse-kafka-connect-sink#configuration-options for more details 🙂

Please let me know if you run into more issues!

greenlion commented 13 hours ago

Awesome, thank you so much! I tested the new option and it works great.

On Mon, Jul 22, 2024, 3:20 PM Paultagoras @.***> wrote:

@greenlion https://github.com/greenlion Just in case you didn't see it, we added the datetime format change to the recent release - See dateTimeFormats in https://clickhouse.com/docs/en/integrations/kafka/clickhouse-kafka-connect-sink#configuration-options for more details 🙂

Please let me know if you run into more issues!

— Reply to this email directly, view it on GitHub https://github.com/ClickHouse/clickhouse-kafka-connect/issues/406#issuecomment-2243651991, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADRTNUEPIQT62EERPH6LQDZNVLPDAVCNFSM6AAAAABJR6YTQOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENBTGY2TCOJZGE . You are receiving this because you were mentioned.Message ID: @.***>