Open seb-emmot opened 1 year ago
I am trying to build a Spark Structured Streaming application that ingests data from Azure Event Hubs and persists it to a Delta table in Databricks. I'm using the AvailableNow trigger. According to the Spark documentation, this trigger should process all data available from the source in multiple batches: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
Bug Report:
It seems that support for the 'AvailableNow' trigger might not be implemented?
My code:
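import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}
import org.apache.spark.sql.streaming.Trigger

// Connection settings for the Event Hub
val connectionString = ConnectionStringBuilder(namespace_str)
  .setEventHubName("myhubname")
  .build

val ehConf = EventHubsConf(connectionString)
  .setConsumerGroup("myconsumergroup")
  .setMaxEventsPerTrigger(1000)

// Read from Event Hubs as a stream
val inStream = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

// Append to a Delta table using the AvailableNow trigger
val outStream = inStream.writeStream
  .outputMode("append")
  .format("delta")
  .option("checkpointLocation", checkpointLocation)
  .trigger(Trigger.AvailableNow)
  .toTable("mytablename")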
I previously asked a related question on Stack Overflow (using PySpark, though): https://stackoverflow.com/questions/74025485/is-spark-streaming-availablenow-trigger-compatible-with-azure-event-hub
Hi, I am facing the same issue. Is there a fix for this, @yamin-msft @hmlam? If so, when will this feature be available?