Azure-Samples / streaming-at-scale

How to implement a streaming at scale solution in Azure
MIT License

eventhubs-databricks-eventhubs #28

Open tessmichi opened 5 years ago

tessmichi commented 5 years ago

How should I write the data back to eventhubs? When I read it in from the input eventhubs it's in binary format, so should I write it back to the output eventhubs in binary format as well? Tagging @jcocchi, or please let me know who else I should tag!

yorek commented 5 years ago

Data in EventHub is stored in binary format, but you should send it as plain text (if you're sending JSON). It will be stored in binary format automatically.
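To make the text-versus-binary point concrete, here is a minimal plain-Python sketch of the round trip. It assumes the body is UTF-8 encoded JSON; the helper names `to_body` and `from_body` are made up for illustration and are not part of any Event Hubs library.

```python
import json


def to_body(event: dict) -> bytes:
    """Serialize an event to JSON text, then to the bytes an event body holds."""
    # Assumption: UTF-8 encoding of the JSON text.
    return json.dumps(event).encode("utf-8")


def from_body(body: bytes) -> dict:
    """Decode a binary event body back into a Python dict."""
    return json.loads(body.decode("utf-8"))


event = {"deviceId": "sensor-1", "value": 21.5}
body = to_body(event)      # what ends up stored: bytes
restored = from_body(body)  # what a reader recovers after decoding
```

So the bytes you see on read are just the encoded form of the text that was sent; decoding them returns the original JSON.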

algattik commented 5 years ago

You need to put the body in a column called 'body'

https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/structured-streaming-eventhubs-integration.md#creating-an-eventhubs-sink-for-streaming-queries

To generate JSON from a struct: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#to_json-org.apache.spark.sql.Column-
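As a rough sketch of the two points above (a `body` column, JSON built from a struct), something like the following might work in PySpark. It assumes `df` is a streaming DataFrame and that the azure-event-hubs-spark connector is attached to the cluster; the function names, `eh_conf`, and `checkpoint_path` are placeholders I made up, not names from the connector.

```python
def decode_body(df):
    """Turn the binary 'body' column read from Event Hubs into a string column."""
    return df.selectExpr("CAST(body AS STRING) AS body")


def with_json_body(df):
    """Pack every column of df into one JSON string column named 'body'."""
    # Spark SQL equivalent of to_json(struct(...)) from the functions API.
    return df.selectExpr("to_json(struct(*)) AS body")


def write_to_eventhubs(df, eh_conf, checkpoint_path):
    """Start a streaming query that writes df to the configured event hub."""
    return (
        with_json_body(df)
        .writeStream
        .format("eventhubs")          # sink provided by the connector
        .options(**eh_conf)           # e.g. {"eventhubs.connectionString": ...}
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

Usage would look like `write_to_eventhubs(processed_df, eh_conf, "/some/checkpoint/dir")`. What exactly goes into `eh_conf` (including whether the connection string has to be encrypted first) depends on the connector version, so check the linked connector docs.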

mpfishe2 commented 4 years ago

I have an example of reading from Event Hub and then writing back to an Event Hub using Structured Streaming here: https://github.com/mpfishe2/az-databricks-realtime-alert-system/blob/master/Real-Time%20Alerting.ipynb. It's a simple example.