AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

supplying kafka key in toConfluentAvro #6

Closed pateusz closed 6 years ago

pateusz commented 6 years ago

Hey. Is there any way to specify the Kafka key when using the toConfluentAvro method? As far as I can see, this method converts the DataFrame into one with only a 'value' column, which makes it impossible to set the Kafka key from one of the input columns. The same happens in fromConfluentAvro: the key column isn't preserved. Is there any workaround for this?
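For context, a minimal sketch of the setup in question, with placeholder broker and topic names: Spark's Kafka source exposes both key and value columns, so a conversion that returns only a value column drops the key.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("abris-key-example").getOrCreate()

// Spark's Kafka source exposes key, value, topic, partition, offset, etc.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "input-topic")
  .load()

// Both columns are available at this point; the issue is that the
// Avro conversion described above returns only a 'value' column.
val keyed = df.select(col("key"), col("value"))
```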

felipemmelo commented 6 years ago

Hi there. For now you're right, we currently support only the value column. The key will be supported in about 2 weeks; we'll post back here as soon as it's committed. Thanks for the question.

felipemmelo commented 6 years ago

Hi there, pinging to let you know that the key column can now be preserved after consuming from Kafka. News about using it to send messages to Kafka is coming soon.

felipemmelo commented 6 years ago

Hi there, full support is now provided for Avro SerDe on keys and values in DataFrames retrieved from Kafka.

https://github.com/AbsaOSS/ABRiS#writingreading-keys-and-values-as-avro-from-kafka
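As a rough sketch of reading both columns as Avro: the method names below follow the library's later expression-style configuration API (4.x-era from_avro plus AbrisConfig) and may differ from the README version linked above; the topic name and registry URL are placeholders.

```scala
import org.apache.spark.sql.functions.col
import za.co.absa.abris.avro.functions.from_avro
import za.co.absa.abris.config.AbrisConfig

// Separate configs for key and value; Confluent's subject naming
// strategy distinguishes the two via the isKey flag.
val keyConfig = AbrisConfig.fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andTopicNameStrategy("my-topic", isKey = true)
  .usingSchemaRegistry("http://localhost:8081")

val valueConfig = AbrisConfig.fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andTopicNameStrategy("my-topic")
  .usingSchemaRegistry("http://localhost:8081")

// Deserialize both columns in a single select.
val decoded = df.select(
  from_avro(col("key"), keyConfig).as("key"),
  from_avro(col("value"), valueConfig).as("value")
)
```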

pateusz commented 6 years ago

Thanks, I appreciate it a lot. One more question: as far as I can tell, I'm now able to read a Confluent Avro value together with a plain key. Is it possible to write a Confluent Avro record in the same way?

felipemmelo commented 6 years ago

Do you mean having a plain key and an Avro payload in Confluent format?

pateusz commented 6 years ago

Exactly this

felipemmelo commented 6 years ago

Not yet, but since you're asking about it we now have a use case. We'll add it and let you know. Thanks for the help.

felipemmelo commented 6 years ago

Hi @pateusz, pinging to let you know about the plain key feature. You can read more about it here: https://github.com/AbsaOSS/ABRiS#writingreading-values-as-avro-and-plain-keys-as-string-tofrom-kafka

Regards.
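A rough sketch of that combination on the read side, using the same placeholder names and later-style configuration API as in the earlier sketch: the key only needs a cast from Kafka's binary representation, while the value goes through Avro deserialization.

```scala
import org.apache.spark.sql.functions.col
import za.co.absa.abris.avro.functions.from_avro
import za.co.absa.abris.config.AbrisConfig

val valueConfig = AbrisConfig.fromConfluentAvro
  .downloadReaderSchemaByLatestVersion
  .andTopicNameStrategy("my-topic")
  .usingSchemaRegistry("http://localhost:8081")

// The key stays plain: cast the raw bytes to string. Only the value
// is decoded against the registry schema.
val result = df.select(
  col("key").cast("string").as("key"),
  from_avro(col("value"), valueConfig).as("value")
)
```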

OneCricketeer commented 6 years ago

Rather than adding new methods for each combination of key and value types (for example, say I have Avro keys and integer values), what about a UDF that you can apply at will, like

df.select(from_confluent_avro(col("key")), col("value").cast("int"))

Similar to the existing from_json function.
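For comparison, Spark's from_json already works this way as a plain column expression, composing freely with casts and other functions in a single select (the schema and column names below are illustrative):

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)
))

// Given a DataFrame with string columns 'key' and 'value':
// parse the key as JSON and cast the value, all in one select.
df.select(
  from_json(col("key"), schema).as("key"),
  col("value").cast("int").as("value")
)
```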

felipemmelo commented 6 years ago

Hi @cricket007, sorry for the late reply. Yep, that is definitely coming, since Spark itself has added these functions. We'll be updating the whole API to comply with that standard.
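For reference, the Spark additions alluded to are the from_avro and to_avro expressions introduced in Spark 2.4; they take an Avro schema as a JSON string and do not integrate with a Schema Registry, which is the gap ABRiS covers. A minimal sketch:

```scala
// In Spark 2.4 these live in the org.apache.spark.sql.avro package object;
// from Spark 3.0 they moved to org.apache.spark.sql.avro.functions.
import org.apache.spark.sql.avro.functions.{from_avro, to_avro}
import org.apache.spark.sql.functions.col

val avroSchema =
  """{"type":"record","name":"Value","fields":[{"name":"id","type":"int"}]}"""

// Decode a binary Avro column, then re-encode it.
val decoded = df.select(from_avro(col("value"), avroSchema).as("value"))
val reencoded = decoded.select(to_avro(col("value")).as("value"))
```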

OneCricketeer commented 6 years ago

@felipemmelo Would you like me to create a new issue for tracking that request?

felipemmelo commented 6 years ago

Hi @cricket007, yes please, if you can, so that we keep it documented. Thanks a lot!

OneCricketeer commented 6 years ago

@felipemmelo Done! #16