Azure / azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Apache License 2.0

Spark streaming AvailableNow trigger terminates after first batch #656

Open seb-emmot opened 1 year ago

seb-emmot commented 1 year ago

I am trying to build a Spark Structured Streaming application that ingests data from Azure Event Hubs and persists it to a Delta table in Databricks. I'm using the AvailableNow trigger. According to the Structured Streaming guide (https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers), this trigger should process all data available from the source, possibly in multiple batches, and then stop.
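To make the expected behaviour concrete, here is a minimal, Spark-free sketch of what the guide describes: the backlog that exists when the query starts is split into batches capped by the per-trigger limit, and the query stops once the last one is done. This is only a model of the documented semantics, not Spark or connector code; `AvailableNowSketch`, `planBatches`, and the backlog of 3500 events are hypothetical names and numbers chosen for illustration.

```scala
// Hypothetical model of the expected AvailableNow behaviour; not Spark or connector code.
object AvailableNowSketch {

  /** Splits the backlog [startOffset, endOffset) into batches of at most maxEventsPerTrigger events. */
  def planBatches(startOffset: Long, endOffset: Long, maxEventsPerTrigger: Long): List[(Long, Long)] = {
    require(maxEventsPerTrigger > 0, "maxEventsPerTrigger must be positive")
    // endOffset models the snapshot taken when the query starts; events arriving later are ignored.
    Iterator
      .iterate(startOffset)(_ + maxEventsPerTrigger)
      .takeWhile(_ < endOffset)
      .map(from => (from, math.min(from + maxEventsPerTrigger, endOffset)))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // Assumed backlog of 3500 events with a cap of 1000 events per batch.
    val batches = planBatches(startOffset = 0L, endOffset = 3500L, maxEventsPerTrigger = 1000L)
    batches.zipWithIndex.foreach { case ((from, until), id) =>
      println(s"batch $id: offsets [$from, $until)")
    }
  }
}
```

Under these assumptions the model plans four batches. The behaviour reported in this issue would correspond to only the first of them being processed before the query terminates.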

Bug Report:

The query terminates after the first batch, so it seems that support for the AvailableNow trigger might not be implemented in this connector?

My code:

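import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}
import org.apache.spark.sql.streaming.Trigger

// namespace_str and checkpointLocation are assumed to be defined elsewhere.
val connectionString = ConnectionStringBuilder(namespace_str)
  .setEventHubName("myhubname")
  .build

// Cap each micro-batch at 1000 events.
val ehConf = EventHubsConf(connectionString)
  .setConsumerGroup("myconsumergroup")
  .setMaxEventsPerTrigger(1000)

val inStream = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

// AvailableNow should drain the backlog in multiple batches and then stop.
val outStream = inStream.writeStream
  .outputMode("append")
  .format("delta")
  .option("checkpointLocation", checkpointLocation)
  .trigger(Trigger.AvailableNow())
  .toTable("mytablename")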

I previously asked a related question on Stack Overflow (using PySpark, though): https://stackoverflow.com/questions/74025485/is-spark-streaming-availablenow-trigger-compatible-with-azure-event-hub

dilisha commented 1 year ago

Hi, I am facing the same issue. Is there a fix for this, @yamin-msft @hmlam? If so, when will this feature be available?