Azure / azure-event-hubs-node

Node client library for Azure Event Hubs https://azure.microsoft.com/services/event-hubs
MIT License
50 stars 45 forks

Reading captured events from blob storage #103

Closed. davidwmartines closed this issue 6 years ago

davidwmartines commented 6 years ago

Is your feature request related to a problem? Please describe.

A consuming app may need to consume all events that have ever been sent to a hub. Events older than the maximum retention period (e.g. 7 days) are no longer available from the hub and can be accessed only from the capture store (Blob or Data Lake), assuming capture was enabled.

Describe the solution you'd like

An EPH-like consumer that reads from capture storage. Or possibly an option for EPH to read events from the capture store and switch over to the hub once all events from the capture store have been read.

This may need to be a separate library altogether.

Describe alternatives you've considered

I have implemented a consumer which can read events from blob storage, decode the Avro files, and stream out the events in the same manner as an EPH-based consumer. I can track the offset that is stored in the captured events, and when the captured events have all been read I can start an EPH consumer and use the initialOffset to resume reading events from where the capture left off. However, for robustness, the capture consumer needs checkpointing so that it can handle errors and restarts. It also needs consumer group "lease management" so that it can handle concurrent consumption of partitions.
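The replay-then-hand-over idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the consumer described above: `InMemoryCheckpointStore` and `replayCapturedPartition` are hypothetical names, nothing here comes from azure-event-hubs-node, and the records are assumed to be already decoded from the capture Avro files, ordered by sequence number, and carrying the `SequenceNumber` and `Offset` fields of the capture schema.

```javascript
// Hypothetical sketch: none of these names belong to azure-event-hubs-node.
// A real checkpoint store would persist to durable storage, not to memory.
class InMemoryCheckpointStore {
  constructor() {
    this.checkpoints = new Map();
  }
  get(partitionId) {
    return this.checkpoints.get(partitionId) || null;
  }
  set(partitionId, checkpoint) {
    this.checkpoints.set(partitionId, { ...checkpoint });
  }
}

// Replays decoded capture records for one partition, skipping anything at or
// before the stored checkpoint, and returns the last position processed.
// Assumes `records` is sorted by SequenceNumber.
function replayCapturedPartition(partitionId, records, store, onEvent) {
  const checkpoint = store.get(partitionId);
  let last = checkpoint;
  for (const record of records) {
    if (checkpoint && record.SequenceNumber <= checkpoint.sequenceNumber) {
      continue; // already handled before a restart
    }
    onEvent(record);
    last = { offset: record.Offset, sequenceNumber: record.SequenceNumber };
    store.set(partitionId, last); // checkpoint after every event, for simplicity
  }
  return last;
}

// Example with made-up records for partition "0".
const checkpointStore = new InMemoryCheckpointStore();
const captured = [
  { SequenceNumber: 0, Offset: "0", Body: "a" },
  { SequenceNumber: 1, Offset: "120", Body: "b" },
  { SequenceNumber: 2, Offset: "240", Body: "c" },
];
const seen = [];
const handover = replayCapturedPartition("0", captured, checkpointStore, (e) =>
  seen.push(e.Body)
);
// handover.offset is the value that would be passed as the initial offset
// when starting the live consumer for this partition.
```

Calling `replayCapturedPartition` a second time with the same store delivers no events, which is the restart behaviour the checkpoint is meant to provide. Checkpointing after every event keeps the sketch short; a real consumer would more likely checkpoint in batches.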

Would the lease management strategy used by EPH work for capture storage as well? Could the LeaseManager from the library be used externally in a capture consumer implementation? Would it make sense to extend EPH so that it can read from capture sources in addition to the AMQP/Event Hub source?
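To make the question concrete, here is a minimal sketch of the ownership rule a capture consumer would need per partition. It is not the library's LeaseManager and does not reflect how EPH implements leases; `InMemoryLeaseStore`, `tryAcquire`, and `release` are hypothetical names, and an in-memory map stands in for what would have to be a shared store with atomic updates.

```javascript
// Hypothetical sketch: an expiring, single-owner lease per partition.
// Not the azure-event-hubs-node LeaseManager; names are illustrative only.
class InMemoryLeaseStore {
  constructor(leaseDurationMs) {
    this.leaseDurationMs = leaseDurationMs;
    this.leases = new Map();
  }

  // Acquires or renews the lease. Fails only while another owner holds an
  // unexpired lease on the same partition.
  tryAcquire(partitionId, owner, now = Date.now()) {
    const lease = this.leases.get(partitionId);
    const heldByOther = lease && lease.owner !== owner && lease.expiresAt > now;
    if (heldByOther) {
      return false;
    }
    this.leases.set(partitionId, {
      owner,
      expiresAt: now + this.leaseDurationMs,
    });
    return true;
  }

  // Releases the lease, but only if the caller is the current owner.
  release(partitionId, owner) {
    const lease = this.leases.get(partitionId);
    if (lease && lease.owner === owner) {
      this.leases.delete(partitionId);
    }
  }
}

// Example: two consumers competing for partition "0" with a 1000 ms lease.
const leaseStore = new InMemoryLeaseStore(1000);
const firstAcquire = leaseStore.tryAcquire("0", "consumer-a", 0);
const blockedAcquire = leaseStore.tryAcquire("0", "consumer-b", 500);
const renewal = leaseStore.tryAcquire("0", "consumer-a", 500);
const takeover = leaseStore.tryAcquire("0", "consumer-b", 1500);
```

A consumer that holds the lease would renew it periodically while replaying capture files for that partition, and another instance could take over once the lease expires, resuming from the last checkpoint.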

Additional context

See "Reviving historical data: combine batch & streaming approaches" at http://pyrostore.io/blog/2017/06/07/challenges-of-record-replay-in-event-streaming-architectures.html (bottom of page).

ShubhaVijayasarathy commented 6 years ago

You can refer to the sample here - https://github.com/djrosanova/CaptureProcessor

davidwmartines commented 6 years ago

Thanks for the link @ShubhaVijayasarathy. I've already implemented something like that, with the capability to read capture files from blob storage and decode Avro. The problem is with checkpointing and consumer lease management. After thinking more about the problem, I am going to close this issue, since I'm sure it's out of scope for this library.