manpreet1992 opened this issue 3 years ago (status: open)
Hi @manpreet1992 , from the example data you shared, your keys are not JSON integers but rather JSON objects containing an integer field. This means your stream must be declared as
CREATE STREAM s99 (my_key STRUCT<c4 INTEGER> KEY, c1 VARCHAR, c2 INTEGER) WITH (kafka_topic='new', format='json', PARTITIONS=4, REPLICAS=3);
instead. This should resolve the deserialization issues with your keys.
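For illustration, here is the difference between the two key shapes (a sketch; the field name c4 comes from your declaration and the values are invented). With c4 INTEGER KEY the key bytes must be a bare JSON number:
1
With my_key STRUCT<c4 INTEGER> KEY the key bytes are a JSON object containing that number:
{"c4": 1}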
I'm going to mark this as closed for now. Feel free to reopen if your issue is still not resolved.
I would like to re-open this issue, as it is something I've often encountered with the ksql-test-runner. My typical use case is to generate test data from an Avro topic as described here. I will present the following example.
ksql> show streams;
Stream Name | Kafka Topic | Key Format | Value Format | Windowed
-------------------------------------------------------------------------------------------------------
ADMISSIONS_ID_CHECKS_USERS | admissions.public.id_checks_users | AVRO | AVRO | false
-------------------------------------------------------------------------------------------------------
ksql> describe ADMISSIONS_ID_CHECKS_USERS;
Name : ADMISSIONS_ID_CHECKS_USERS
Field | Type
-------------------------------------------------
ROWKEY | BIGINT (key)
ID | BIGINT
ID_CHECK_ID | BIGINT
USER_ID | BIGINT
-------------------------------------------------
This represents my schema. By executing a dump as described above, I produce the following input.json:
kcat -C -e -J -E \
-b "$CC_BOOTSTRAP_SERVERS" \
-t admissions.public.id_checks_users \
-X security.protocol=SASL_SSL \
-X sasl.mechanisms=PLAIN \
-X sasl.username="$CC_API_KEY" \
-X sasl.password="$CC_API_SECRET" \
-s avro \
-r "$SCHEMA_REGISTRY" | \
jq --slurp \
"{inputs:[.[] | select(.key == $ID_CHECKS_USERS_ID) | {topic: .topic, timestamp: .ts, key: .key, value: .payload}]}" \
> $PWD/input.json
And the corresponding input.json
{
  "inputs": [
    {
      "topic": "admissions.public.id_checks_users",
      "timestamp": 1646643836312,
      "key": 82549,
      "value": {
        "id": 82549,
        "id_check_id": {
          "long": 381
        },
        "user_id": {
          "long": 369290
        }
      }
    }
  ]
}
Since the underlying Avro schema allows null values (all ksqlDB fields are nullable, to my knowledge), the union-wrapped form above, e.g. "id_check_id": {"long": 381}, is the normal JSON representation Avro produces for nullable fields. Anyhow, when creating the statements.sql below and preparing the output.json, all events get ignored due to the exception further down.
CREATE STREAM admissions_id_checks_users (
    id BIGINT KEY,
    id_check_id BIGINT,
    user_id BIGINT
) WITH (KAFKA_TOPIC = 'admissions.public.id_checks_users', KEY_FORMAT = 'KAFKA', VALUE_FORMAT = 'JSON');

CREATE STREAM admissions_v2_id_checks_users
    WITH (KAFKA_TOPIC = 'admissions-v2.id_checks_users', KEY_FORMAT = 'KAFKA', VALUE_FORMAT = 'JSON') AS
SELECT *
FROM admissions_id_checks_users
EMIT CHANGES;
Above is the statements.sql I'm trying to test. Running it through the test runner (invocation sketched below) fails with the following exception.
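A minimal sketch of the invocation, assuming the file names used above and the documented ksql-test-runner flags:
ksql-test-runner -s statements.sql -i input.json -o output.json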
org.apache.kafka.common.errors.SerializationException: Failed to deserialize value from topic: admissions.public.id_checks_users. Can't convert type. sourceType: ObjectNode, requiredType: BIGINT, path: $.ID_CHECK_ID
at io.confluent.ksql.serde.json.KsqlJsonDeserializer.deserialize(KsqlJsonDeserializer.java:145)
at io.confluent.ksql.serde.connect.ConnectFormat$StructToListDeserializer.deserialize(ConnectFormat.java:234)
at io.confluent.ksql.serde.connect.ConnectFormat$StructToListDeserializer.deserialize(ConnectFormat.java:213)
at io.confluent.ksql.serde.GenericDeserializer.deserialize(GenericDeserializer.java:59)
at io.confluent.ksql.logging.processing.LoggingDeserializer.tryDeserialize(LoggingDeserializer.java:61)
at io.confluent.ksql.logging.processing.LoggingDeserializer.deserialize(LoggingDeserializer.java:48)
at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:60)
at io.confluent.ksql.serde.tracked.TrackedDeserializer.deserialize(TrackedDeserializer.java:53)
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:58)
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:66)
at org.apache.kafka.streams.processor.internals.RecordQueue.updateHead(RecordQueue.java:176)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:112)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:304)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:960)
at org.apache.kafka.streams.TopologyTestDriver.enqueueTaskRecord(TopologyTestDriver.java:568)
at org.apache.kafka.streams.TopologyTestDriver.pipeRecord(TopologyTestDriver.java:552)
at org.apache.kafka.streams.TopologyTestDriver.pipeRecord(TopologyTestDriver.java:842)
at org.apache.kafka.streams.TestInputTopic.pipeInput(TestInputTopic.java:115)
at org.apache.kafka.streams.TestInputTopic.pipeInput(TestInputTopic.java:163)
at io.confluent.ksql.test.tools.TestExecutor.processSingleRecord(TestExecutor.java:502)
at io.confluent.ksql.test.tools.TestExecutor.pipeRecordsFromProvidedInput(TestExecutor.java:475)
at io.confluent.ksql.test.tools.TestExecutor.buildAndExecuteQuery(TestExecutor.java:194)
at io.confluent.ksql.test.tools.KsqlTestingTool.executeTestCase(KsqlTestingTool.java:141)
at io.confluent.ksql.test.tools.KsqlTestingTool.runWithTripleFiles(KsqlTestingTool.java:131)
at io.confluent.ksql.test.tools.KsqlTestingTool.main(KsqlTestingTool.java:56)
Caused by: io.confluent.ksql.serde.json.KsqlJsonDeserializer$CoercionException: Can't convert type. sourceType: ObjectNode, requiredType: BIGINT, path: $.ID_CHECK_ID
at io.confluent.ksql.serde.json.KsqlJsonDeserializer.enforceFieldType(KsqlJsonDeserializer.java:169)
at io.confluent.ksql.serde.json.KsqlJsonDeserializer.deserialize(KsqlJsonDeserializer.java:128)
... 24 more
I don't really believe that the proper course of action is to change all my test schemas to use STRUCT<long BIGINT>, which would make the test runner pretty unusable from a conformity standpoint. Of course I can mangle my data until it passes (a jq sketch for that is below) or just perform the inserts manually, but I believe the type system when testing JSON (as Avro, to my knowledge, can't be used with the test runner) should be more lenient. Any suggestions would be welcome.
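For reference, one way to do that mangling is to strip the Avro union wrappers in the jq step so the values become the plain JSON the stream declaration expects. This is only a sketch (it needs jq 1.6+ for walk) and it assumes that every single-key object whose key is an Avro primitive type name is a union wrapper, which may not hold for every schema:
# Same kcat dump as above, piped through a jq filter that unwraps union-encoded values.
kcat -C -e -J -E \
  -b "$CC_BOOTSTRAP_SERVERS" \
  -t admissions.public.id_checks_users \
  -X security.protocol=SASL_SSL \
  -X sasl.mechanisms=PLAIN \
  -X sasl.username="$CC_API_KEY" \
  -X sasl.password="$CC_API_SECRET" \
  -s avro \
  -r "$SCHEMA_REGISTRY" | \
jq --slurp --argjson id "$ID_CHECKS_USERS_ID" '
  # Turn Avro union encodings such as {"long": 381} into plain values.
  def unwrap: walk(
    if type == "object" and length == 1
       and (keys[0] | IN("boolean", "int", "long", "float", "double", "bytes", "string"))
    then keys[0] as $k | .[$k]
    else .
    end);
  {inputs: [.[]
    | select(.key == $id)
    | {topic: .topic, timestamp: .ts, key: .key, value: (.payload | unwrap)}]}' \
> "$PWD/input.json"
With the wrappers removed, id_check_id and user_id come through as plain numbers (e.g. "id_check_id": 381), which matches the BIGINT columns in the stream declaration above.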
Sure, we can reopen this issue.
Describe the bug
We were trying to create a JSON stream with one column taken from the Kafka message key and the others from the message value; stream creation was successful. But when we executed "select *" on the stream, the response was empty and we found a serialization exception in the ksql logs.
To Reproduce
Steps to reproduce the behavior:
CREATE STREAM s99 (c4 INTEGER KEY, c1 VARCHAR, c2 INTEGER) WITH (kafka_topic='new', format='json', PARTITIONS=4, REPLICAS=3);
SET 'auto.offset.reset'='earliest'; SELECT * FROM s99 EMIT CHANGES;
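For a self-contained reproduction, one way to publish a record whose key is a JSON object (a sketch: the broker address, key/value payloads, and the kcat produce invocation are illustrative assumptions, not taken from the original report):
printf '{"c4": 1}|{"c1": "hello", "c2": 2}\n' | \
  kcat -P -b localhost:9092 -t new -K '|'
With a key of this shape, the c4 INTEGER KEY declaration cannot deserialize the key, so the push query returns nothing, matching the behaviour described above.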
Expected behavior
The push query returns the records from the topic, with c4 populated from the message key.
Actual behaviour
The query returns no rows, and a serialization exception for the key appears in the ksql processing log.
Additional context
@apurvam