farmdawgnation / registryless-avro-converter

An avro converter for Kafka Connect without a Schema Registry
Apache License 2.0

Incoming Avro message doesn't match with the expected format #9

Closed · fabiotc closed this 5 years ago

fabiotc commented 5 years ago

Hi! First of all, thanks for this project.

We have a Kafka setup without a Schema Registry, but we'd like to configure Kafka Connect sinks that use Avro with .avsc files.

Expected Behavior

Register a connector that uses registryless-avro-converter as the value converter, so that incoming messages are deserialized into generic records based on the schema in the .avsc file.

Actual Behavior

When producing a message:

java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
at me.frmr.kafka.connect.RegistrylessAvroConverter.toConnectData(RegistrylessAvroConverter.java:126)

That happens whether I use an Avro producer or a regular Kafka producer. It seems the Avro content doesn't match the format and magic bytes the converter expects.
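
For what it's worth, here's a minimal sketch (plain Avro APIs only, not RAC's actual code, and the class name is made up) of why I think we hit that exception: DataFileReader refuses any payload that doesn't start with the object container file header.

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.SeekableByteArrayInput;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    import java.io.ByteArrayOutputStream;

    public class NotADataFileRepro {
      public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"sample\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

        // Encode a single record as "raw" Avro binary, i.e. without the
        // container file header (magic bytes + embedded schema).
        GenericRecord record = new GenericData.Record(schema);
        record.put("id", "some-string");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();

        // DataFileReader only accepts the object container file format,
        // so this throws java.io.IOException: Not a data file.
        DataFileReader<GenericRecord> reader = new DataFileReader<>(
            new SeekableByteArrayInput(out.toByteArray()),
            new GenericDatumReader<GenericRecord>(schema));
      }
    }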

Steps to Reproduce the Problem

  1. Use a regular .avsc file:

        {
          "type": "record",
          "name": "sample",
          "fields": [
            { "name": "id", "type": "string" }
          ]
        }
  2. Register a connector, like this:

        {
          "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
          "tasks.max": "1",
          "topics": "sample-topic-avro",
          "type.name": "service-avro",
          "value.converter": "me.frmr.kafka.connect.RegistrylessAvroConverter",
          "value.converter.schema.path": "/path/schema.avsc",
          "connection.url": "jdbc:postgresql://postgres:5432/projections",
          "connection.user": "postgres",
          "connection.password": "secret"
        }
  3. Produce a message:

        ./bin/kafka-avro-console-producer \
            --broker-list kafka:9092 --topic sample-topic-avro \
            --property value.schema='{"type":"record","name":"sample","fields":[{"name":"id","type":"string"}]}'

     and type the record as JSON:

        {"id": "some-string"}



Specifications

  - Confluent Platform: 5.3.0 
  - Java 8
farmdawgnation commented 5 years ago

Thanks for the bug report. This certainly should not be happening. We're actively using this in production without issues on our end, so I have a few things I'd like you to try for me before I dig much further.

First, can you give me a binary dump of the Kafka key and value (as separate files) that get produced when you run kafka-avro-console-producer?
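
If it helps, one way to grab those bytes untouched is a plain Java consumer with ByteArrayDeserializer. This is just a sketch; the topic and bootstrap servers are the ones from your example, and the file names are arbitrary.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.io.FileOutputStream;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class DumpRawRecords {
      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "raw-dump");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
          consumer.subscribe(Collections.singletonList("sample-topic-avro"));
          int i = 0;
          // One poll may come back empty while the group rebalances; rerun if needed.
          for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(10))) {
            // Write key and value bytes as-is, one pair of files per record.
            if (record.key() != null) {
              try (FileOutputStream k = new FileOutputStream("record-" + i + ".key.bin")) {
                k.write(record.key());
              }
            }
            try (FileOutputStream v = new FileOutputStream("record-" + i + ".value.bin")) {
              v.write(record.value());
            }
            i++;
          }
        }
      }
    }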

farmdawgnation commented 5 years ago

Ah, one more thing: can you confirm you're using RAC 1.8.0, which is compiled against CP 5.3.0? I would like to rule out any binary interface issues.

fabiotc commented 5 years ago

Hi @farmdawgnation, thanks for the response back! I'm using RAC 1.8.0.

But I've now realized that my test environment has an instance of Schema Registry running, so the message produced by kafka-avro-console-producer is serialized with the registry's features in mind. In the end, the value received by the converter (RAC) is just the encoded record itself (in my previous example, something like H722b81be-96cb-4be5-9863-3ff3a2dfe476) without the attached schema and those special "magic" bytes.
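
If that's right, then producing the value as a complete Avro object container file (which, judging by the stack trace, is what RAC's DataFileReader expects) should work. A rough, untested sketch using a plain Java producer with ByteArraySerializer, reusing the topic and broker from my example above:

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.io.ByteArrayOutputStream;
    import java.util.Properties;

    public class ContainerFileProducer {
      public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"sample\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", "some-string");

        // Write the record as a full Avro object container file
        // (magic bytes + embedded schema + data block), not the
        // Schema Registry wire format that kafka-avro-console-producer uses.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
          writer.create(schema, out);
          writer.append(record);
        }

        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
          producer.send(new ProducerRecord<>("sample-topic-avro", out.toByteArray())).get();
        }
      }
    }

That would at least rule out the console producer's wire format as the culprit on my side.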

How does that sound to you?

farmdawgnation commented 5 years ago

Hey @fabiotc - sorry for the delay in replying. The email notification must have gotten lost in my inbox. That sounds like a plausible explanation to me. I'll close this for now. We can reopen if it appears to be a real bug. :)