confluentinc / kafka-rest

Confluent REST Proxy for Kafka
https://docs.confluent.io/current/kafka-rest/docs/index.html

Support string keys with Avro values #210

Open criccomini opened 8 years ago

criccomini commented 8 years ago

We currently use Avro for the values of our messages. It appears that we're being forced to use Avro for the key as well, since the REST proxy seems to tie the two together. We would much prefer to have our keys just be basic strings. I want to:

  1. Confirm that this currently isn't possible.
  2. Propose that it be supported, unless someone has an argument against it.
ewencp commented 8 years ago

@criccomini Correct, not supported right now. It definitely complicates things quite a bit in the implementation. Content-Type becomes confusing since it would now be mixed between Avro and something else (and if it's strings, ints, whatever, it's not even one of our existing supported types). We'd have to factor serialization out of the producers/consumers, or we'd end up with a combinatorial explosion in the number of producer instances needed to do an arbitrary mix & match of types.

I totally get the request, but I think especially the Content-Type issues have a lot of details that need to be worked out to make this practical.

criccomini commented 8 years ago

Yea, I agree. The only thing I could come up with was some sort of string+avro Content-Type thing as well.

ewencp commented 8 years ago

Right, and given how routing/content negotiation works, and that deserialization is automatic via Jackson and tied to the message type, I'm not sure how this would work. Routing/content negotiation we could probably just do manually; the Jackson deserialization seems like the hardest part to resolve, since it happens via Jersey invoking the MessageBodyProvider before any kafka-rest app code is ever executed.

criccomini commented 8 years ago

I'm undecided whether it's hacky to focus just on string key support vs. supporting arbitrary key/value serdes.

ewencp commented 8 years ago

Yeah, I think arbitrary combos is unlikely to be useful in practice. If you're trying to use JSON keys and Avro values, you have bigger problems. Once you start with string, though, int/long also make sense, since those are common key types as well.

criccomini commented 8 years ago

I agree. If that's the case, I wonder if some different approach might be palatable just for the string-key use case.

criccomini commented 8 years ago

(e.g. URL param, query string param, X-header (oh god no), etc)

ewencp commented 8 years ago

@criccomini Query param doesn't seem awful. But making a bunch of them for different types doesn't seem ideal. (And of course that doesn't really affect the fact that we need to reorganize where serialization is happening to make any of this work.)

Here's another idea for how to accomplish this: move it into the serializers instead, specifically the Avro ones. Basically an opt out for primitive types such that they get serialized directly without the magic byte + schema ID (and therefore also opting out of any schema registry integration/compatibility checking) and just get the raw serialized form. Which also means somehow knowing the type and configuring it for the deserializer. Serializers can tell if they are being used for a key, so this could also be restricted only to work for keys. I think the main drawback is that we'd effectively be configuring it for the entire proxy instead of per-request. So it'd be a site-wide agreement that primitive keys are included "bare".
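
A rough sketch of the framing trade-off being described here, assuming the standard Confluent wire format of a magic byte followed by a 4-byte schema ID (illustrative Python only, not kafka-rest code):

import struct

def confluent_framed(schema_id: int, avro_payload: bytes) -> bytes:
    # Confluent wire format: magic byte 0x00 + big-endian 4-byte schema ID + Avro-encoded body
    return b"\x00" + struct.pack(">I", schema_id) + avro_payload

def bare_string_key(key: str) -> bytes:
    # The proposed opt-out: raw UTF-8 bytes, no framing, no schema registry integration
    return key.encode("utf-8")

# Avro encodes the string "mykey" as a length prefix (zigzag-encoded 5 = 0x0a) plus the UTF-8 bytes
print(confluent_framed(21, b"\x0amykey").hex())  # 00000000150a6d796b6579
print(bare_string_key("mykey").hex())            # 6d796b6579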

To be honest, I understand why people want this and it can be convenient, but I'm not sure it's a good idea to enable people to do this. It leaves you literally zero options for changing or adding formats for keys. Including the framing is really important for extensibility/compatibility. I understand that in a lot of cases you can reasonably assume the format of the key is fixed forever (or you're willing to pay the cost of figuring out the migration to multiple topics so you can add a new format in the new topic), but that isn't always the case, and I'd even say that developer foresight wrt this issue isn't particularly good. I'd much prefer encouraging use of a format that gives you the ability to make changes and address any usability issues there if at all possible.

blootsvoets commented 7 years ago

As an example of using the primitive Avro types, use request data: {"key_schema": "\"string\"", "value_schema_id": 12345, "records": [{"key": "mykey", "value": {...}}]}

Although it increases the overhead of the REST proxy slightly (a possible extra round trip to the schema registry), the request is still very simple to construct.
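
A sketch of that request against the v2 Avro produce endpoint (the proxy URL, topic name, schema ID, and value payload are placeholders; assumes the application/vnd.kafka.avro.v2+json content type):

import requests

payload = {
    "key_schema": "\"string\"",   # primitive Avro string schema for the key
    "value_schema_id": 12345,     # ID of a value schema already in the schema registry
    "records": [{"key": "mykey", "value": {"field": "value"}}],  # value must match that schema
}

response = requests.post(
    "http://localhost:8082/topics/my-topic",   # placeholder proxy URL and topic
    headers={"Content-Type": "application/vnd.kafka.avro.v2+json"},
    json=payload,
)
print(response.status_code, response.text)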

joewood commented 7 years ago

A possible (more RESTful) approach would be to use multipart HTTP response. In this case, two parts - one for key and one for payload.

ValentinTrinque commented 6 years ago

Any news about this?

PuszekSE commented 5 years ago

I'm guessing there's still no updates on that one, right?

peoplemerge commented 5 years ago

+1

cornercoding commented 5 years ago

+1

cecchisandrone commented 5 years ago

+1

plinioj commented 5 years ago

+1

mente commented 4 years ago

FYI it's not possible to use KSQL with kafka rest proxy then. KSQL doesn't support avro key format, rest proxy doesn't support non-avro key format. Check and mate.

Are there any possible workarounds? The binary key format, asking KSQL to hurry up with Avro keys...

UPDATE

The workaround that worked for us is using value_format=json in KSQL, telling the REST proxy to use the binary format, and then base64-decoding/JSON-decoding in the application. Maybe it can help someone.
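
A minimal sketch of the decoding step in that workaround, assuming records are fetched with the binary embedded format (the record contents here are made-up placeholders):

import base64
import json

# Example record shape as returned by the proxy's binary consumer format (placeholder data)
record = {"key": "bXlrZXk=", "value": "eyJmaWVsZCI6IDF9"}

key = base64.b64decode(record["key"]).decode("utf-8")    # -> "mykey"
value = json.loads(base64.b64decode(record["value"]))    # -> {"field": 1}
print(key, value)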

makarova commented 4 years ago

+1

brandonwittwer commented 4 years ago

+1 damnit

ghost commented 4 years ago

+1

apohrebniak commented 4 years ago

+1

aaugusta commented 4 years ago

Yeah, I think arbitrary combos is unlikely to be useful in practice.

Hard disagree on this. If you use an Avro key in a persistent state store and you upgrade the schema on that key, then all the data in your store will effectively disappear: the schema ID is serialized in the initial bytes of the key, so lookups made with the new schema no longer match the bytes written under the old one. String keys are an effective strategy to avoid these headaches. Also, Avro keys can’t be used with range() operations on stores.
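
A toy illustration of that failure mode, assuming Confluent-framed key bytes (magic byte plus schema ID) and a byte-keyed store; purely illustrative:

import struct

def framed_key(schema_id: int, avro_body: bytes) -> bytes:
    # magic byte + big-endian schema ID + Avro-encoded key
    return b"\x00" + struct.pack(">I", schema_id) + avro_body

store = {framed_key(7, b"\x0amykey"): "some state"}  # written while the key schema had ID 7
lookup = framed_key(8, b"\x0amykey")                 # same logical key after upgrading to ID 8
print(store.get(lookup))                             # -> None: the old entry is no longer reachable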

clande commented 4 years ago

+1 I read through this. The reasons for not modifying the endpoint make sense from an architectural-purity point of view, but there are a lot of people asking for this with solid reasons for needing it. Something a little more pragmatic seems to be in order.

How about creating another endpoint that only supports an unencoded key? The payload would be encoded per the header used in the normal topics endpoint.

dainiusjocas commented 3 years ago

+1 Being able to specify different key and value formats would really be useful, just like with Kafka Connect, where you can specify key.converter and value.converter: https://docs.confluent.io/current/schema-registry/connect.html

mnowaczyk commented 3 years ago

This is completely absurd. You're in fact shipping two incompatible systems (KSQL and this). Who on earth does something like that?

rigelbm commented 3 years ago

FYI: Support for different key and value formats has been merged at https://github.com/confluentinc/kafka-rest/pull/797.

slominskir commented 3 years ago

@rigelbm - Is there any documentation on the new features? Is it included in the newest release? Specifically, how do we consume from a topic that has a String key and an Avro value? I didn't find instructions in the latest docs here: https://docs.confluent.io/platform/current/kafka-rest/api.html

Hubbitus commented 2 years ago

And what about consuming? If I understand correctly, the PR only addresses the V3 API for producing messages.

How can I, for example, read a key in string format and a value in Avro?

PuszekSE commented 2 years ago

FYI: Support for different key and value formats has been merged at #797.

I've looked into the new API (https://docs.confluent.io/platform/current/kafka-rest/api.html#records-v3), and it seems that string as a key format is specifically still not supported directly?

Embedded formats: json, binary, avro, protobuf and jsonschema. If data is provided as a string, it's treated as a BASE64 representation of binary data.

If I have missed something, I'd really appreciate a link to the respective docs :)

raphaelauv commented 2 years ago

It works with the v3 records endpoint.

The Avro schema of the key:

{ "type": "string" }

import requests

# Placeholder: rest_proxy was left undefined in the original example; point it at your REST Proxy
rest_proxy = "http://localhost:8082"

headers = {
    'Content-Type': 'application/json',
}

# Key data is given as a plain JSON string; the proxy serializes it with the Avro key
# schema already registered under the tata-key subject (see the response below).
data = {
    "key": {
        "data": "AAAAAA"
    }
}

response = requests.post(
    f"{rest_proxy}/v3/clusters/toto/topics/tata/records",
    headers=headers,
    json=data)

print(response.reason)
print(response.text)

gives

OK
{"cluster_id":"toto","topic_name":"tata","partition_id":1,"offset":4,"timestamp":"2022-06-19T17:32:10.168Z","key":{"type":"AVRO","subject":"tata-key","schema_id":1,"schema_version":2,"size":12}}
NorDroN commented 1 year ago

Any update on this?

bcappoen commented 1 year ago

Hi everyone, is there any update on this topic? Currently, I'm blocked trying to consume from a topic that has a String key and an Avro value. Thanks.

joseboretto commented 4 months ago

Working example of an Avro value with a primitive Avro key.

curl --location --request POST 'https://my-kafka-rest/v3/clusters/58OPpatjQHOg2UOBthv36Q/topics/myTopicName/records' \
--header 'Content-Type: application/json' \
--data '{
    "key": {
        "data": "IT_B2B_75938617"
    },
    "value": {
        "schema_id": 1318,
        "data": {
            "platform": "IT",
            "market": "B2C",
            "modelId": {
                "int": 87085186
            },
            "mmId": {
                "long": 577413910573
            },
            "offers": [
                {
                    "omOfferId": {
                        "string": "2283bf46-b000-4686-8b6c-c83642868723"
                    },
                    "productId": {
                        "long": 131117308
                    },
                    "isBestOffer": true,
                    "score": 1.35919
                }
            ]
        }
    }
}'

This format solves the problem "Bad Request: Expected start-union. Got VALUE_NUMBER_INT", because Avro's JSON encoding requires union (e.g. nullable) fields to be wrapped in an object keyed by the branch type, as in {"int": 87085186}. See https://stackoverflow.com/questions/27485580/how-to-fix-expected-start-union-got-value-number-int-when-converting-json-to-av

Docs: https://docs.confluent.io/platform/current/kafka-rest/api.html#records-v3