exasol / kafka-connector-extension

Exasol Kafka Extension for accessing Apache Kafka
MIT License
4 stars 7 forks source link

Import values from two level nested Avro records #43

Closed morazow closed 1 year ago

morazow commented 3 years ago

Situation

Sometimes Avro records are encoded using top level record, for example payload, with all the data fields. Currently, these records are imported into a single table column as JSON strings.

This is usual Avro record in Kafka (in JSON representation)

{
  "type": "record",
  "name": "KafkaExasolAvroRecord",
  "fields": [
    { "name": "product", "type": "string" },
    { "name": "price", "type": { "type": "bytes", "precision": 4, "scale": 2, "logicalType": "decimal" }}
    { "name": "sale_time", "type": { "type": "long", "logicalType": "timestamp-millis" }}
  ]
}

which could map to this Exasol table:

CREATE OR REPLACE TABLE <schema_name>.<table_name> (
    PRODUCT     VARCHAR(500),
    PRICE       DECIMAL(4, 2),
    SALE_TIME   TIMESTAMP,

    KAFKA_PARTITION DECIMAL(18, 0),
    KAFKA_OFFSET DECIMAL(36, 0)
);

the last two columns are only for metadata keeping.

But unfortunately, sometimes data is stored with one more top level field, as payload.

{
   "type":"record",
   "name":"KafkaExasolAvroRecord",
   "fields":[
      {  "name":"payload",
         "type":{
            "name": "PayloadRecord",
            "type": "record",
            "fields": [
              { "name":"product", "type":"string" },
              { "name":"price", "type":{ "type":"bytes", "precision":4, "scale":2, "logicalType":"decimal" }}
              { "name":"sale_time", "type":{ "type":"long", "logicalType":"timestamp-millis" }}
            ]
         }
      }
   ]
}

This will map to single column in an Exasol table and imported into VARCHAR column as JSON string. It is still possible to use Exasol JSON functions to extract fields. However, it can cause issues if JSON string exceeds 2M characters which limit of Exasol VARCHAR type.

It would be nice feature to be able to specify the nested fields to be imported as separate columns.

Acceptance Criteria

redcatbear commented 1 year ago

Shelving because not in roadmap focus.