confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
118 stars 1.04k forks source link

TimestampExtractor exception for known good value #9978

Open georgespingos opened 1 year ago

georgespingos commented 1 year ago

Describe the bug

Trying to define a custom timestamp for a stream fails when executing a query against the said stream for ksqlDB 0.27 using a matching version for the cli. Oddly enough the exact same stream definition against the exact same data works great when executed using the 0.23 version.

To Reproduce Steps to reproduce the behavior, include:

  1. The version of KSQL: ksqlDB 0.27
  2. Sample source data: { "traceId": "oHrlxIWeSTyKXhuhovgN3w==", "spanId": "pCESb4CxjuY=", "operationName": "HTTP GET /", "startTime": "2023-06-20T15:45:42.124831Z", "duration": "0.042437s", "tags": [ { "key": "http.method", "vStr": "GET" }, { "key": "http.url", "vStr": "/jquery-3.1.1.min.js" }, { "key": "component", "vStr": "net/http" }, { "key": "http.status_code", "vType": "INT64", "vInt64": "200" }, { "key": "otel.library.name", "vStr": "go.opentelemetry.io/otel/sdk/tracer" }, { "key": "span.kind", "vStr": "server" }, { "key": "internal.span.format", "vStr": "proto" } ], "process": { "serviceName": "frontend" } }
  3. Any SQL statements you ran: CREATE STREAM STR_SPANS_MODEL (traceId VARCHAR, spanId VARCHAR, operationName VARCHAR, startTime VARCHAR, tags ARRAY<STRUCT< KEY VARCHAR, vStr VARCHAR, vINT64 BIGINT, vBool BOOLEAN, VFloat64 DOUBLE, vBinary BYTES, vType VARCHAR> >, duration VARCHAR, references ARRAY<STRUCT< traceId VARCHAR, spanId VARCHAR> >, process STRUCT < serviceName VARCHAR, tags ARRAY<STRUCT< KEY VARCHAR, vStr VARCHAR>> >, logs STRUCT < timestamp VARCHAR, fields ARRAY<STRUCT< KEY VARCHAR, vStr VARCHAR>> >) WITH ( kafka_topic='jaeger-spans', value_format='JSON', timestamp='startTime', timestamp_format='yyyy-MM-dd''T''HH:mm:ss.[SSSSSS][SSS]z' );

Expected behavior Stream is successfully created and subsequent queries against that stream return results.

Actual behaviour Stream is successfully created but any queries against that stream return the following error after the timeout value:

org.apache.kafka.streams.errors.StreamsException: Fatal user code error in TimestampExtractor callback for record ConsumerRecord(topic = jaeger-spans, partition = 3, leaderEpoch = null, offset = 31385384834, CreateTime = 1687275410316, serialized key size = 32, serialized value size = 801, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = [ 'grdJ2XJNTuy0oc4lsIpTaQ==' | 'HGLEUfj0a1c=' | 'Kafka Receive' | '2023-06-20T15:36:49.861107400Z' | [Struct{KEY=otel.library.name,VSTR=DONAUv2}, Struct{KEY=CorrelationId,VSTR=5566210c-e4e8-421e-b45d-1610355dda63}, Struct{KEY=trading.sport,VSTR=4}, Struct{KEY=trading.fixtureId,VSTR=14270714}, Struct{KEY=kafka.topic,VSTR=tradingv2.v1.0.heartbeats.donau.request}, Struct{KEY=kafka.partition,VSTR=10}, Struct{KEY=span.kind,VSTR=consumer}, Struct{KEY=internal.span.format,VSTR=proto}] | '0.000227s' | [Struct{TRACEID=grdJ2XJNTuy0oc4lsIpTaQ==,SPANID=44EFzv8qPxs=}] | Struct{SERVICENAME=DONAUv2,TAGS=[Struct{KEY=hostname,VSTR=ATVD1WWDON126}, Struct{KEY=service.instance.id,VSTR=9ef28a34-07b9-40bf-a95c-7d5e1405e8a6}]} | null ]).

Being banging my head against this for 2 days now and I can't think of anything else besides rolling back to the previously known good version (0.23).

Cheers.

suhas-satish commented 1 year ago

@georgespingos , can you try with ksqlDB 0.29 which is the latest release version? If you decide to rollback to previously good 0.23 version, please do let us know if it still works. We keep upgrading dependency library versions all the time and its possible to have introduced small bugs despite our thorough automated testing tooling. We rely on the community to bring these to our attention.