Azure / azure-sdk

This is the Azure SDK parent repository and mostly contains documentation around guidelines and policies as well as the releases for the various languages supported by the Azure SDK.
http://azure.github.io/azure-sdk
MIT License
487 stars 297 forks source link

Board Review: azure-eventhub receive_batch #985

Closed YijunXieMS closed 4 years ago

YijunXieMS commented 4 years ago

The Basics

About this client library

Artifacts required (per language)

Python

Java

Champion Scenarios

A champion scenario is a use case that the consumer of the client library is commonly expected to perform. Champion scenarios are used to ensure the developer experience is exemplary for the common cases. You need to show the entire code sample (including error handling, as an example) for the champion scenarios.

adrianhall commented 4 years ago

Pre-assigning to @johanste for initial review. He will determine if a full board review is needed.

johanste commented 4 years ago

One architectural cross-cutting question; under what circumstances (if any) do we call the event handler with an empty list of events.

johanste commented 4 years ago

This should be a quick review (hopefully)

adrianhall commented 4 years ago

Notes from Arch Board: https://msit.microsoftstream.com/video/4dfba91e-4092-42a3-acfd-3db666af8eef

Java Review

Python Review:

JamesBirdsall commented 4 years ago

A very important reason why the client should never checkpoint automatically: some customers don’t use checkpoints. At all. When they open receivers, they start at LATEST, or some fixed time before NOW, and go from there. It took us a while to learn that lesson, and I want to be sure it doesn’t get lost in the transition to track 2.

The semantics of the heartbeat were simple in track 1 because track 1 did not have batching. EventProcessorHost effectively has a receive loop in it, and the track 1 receive call returns as soon as at least one message is available. EPH takes whatever the receive call returned and passes it to the user’s callback, then goes back to the top of the loop. If the receive call times out, it returns null, and the loop either translates that into an empty message list for the callback (if heartbeat is on) or skips the callback (if heartbeat is off). So with a receive timeout of one minute, if heartbeat is on, the result is straightforward: the maximum time between one call of the callback returning and the next starting is one minute. Reproducing that semantic for the single-event callback should be simple because the semantics of receiving one event at a time are about the same as the track 1 receive. Reproducing it for the batch callback is going to be complicated.

annatisch commented 4 years ago

Final design for Python:

receive_batch(
    on_event_batch,  # type: Callable[PartitionContext, List[EventData]]
    max_batch_size=300,  # type: int
    max_wait_time=None,  # type: Optional[int]
    **kwargs
)

Semantics are as follows:

Unsupported scenarios:

srnagar commented 4 years ago

Final API for Java:

EventProcessorClientBuilder.java will have the following APIs added:

// Single event receive with hearbeat
public EventProcessorClientBuilder processEvent(Consumer<EventContext> processEvent, 
        Duration maxWaitTime) {}

// Batch receive
public EventProcessorClientBuilder processEventBatch(Consumer<List<EventContext>> processEventBatch, 
        int maxBatchSize) {}
public EventProcessorClientBuilder processEventBatch(Consumer<List<EventContext>> processEventBatch, 
        int maxBatchSize, Duration maxWaitTime) {}

The semantics will be the same as Python.

API View - https://apiview.dev/Assemblies/Review/7d5c89c868424d45bbd5319d5f5e96b8?diffRevisionId=0d96e4d2e8bc436a8ef3f0aacc49672c

cc: @JonathanGiles

bterlson commented 4 years ago

These proposed changes LGTM! :shipit:

johanste commented 4 years ago

+1 - :shipit:

KrzysztofCwalina commented 4 years ago

Looks good!