Closed YijunXieMS closed 4 years ago
Pre-assigning to @johanste for initial review. He will determine if a full board review is needed.
One cross-cutting architectural question: under what circumstances (if any) do we call the event handler with an empty list of events?
This should be a quick review (hopefully)
Notes from Arch Board: https://msit.microsoftstream.com/video/4dfba91e-4092-42a3-acfd-3db666af8eef
Java Review:
receiveBatch - convenience method.
updateCheckpoint() - automatically called? It's because the app may not have finished processing the events. (async)
Python Review:
receive_batch (receive_in_batches perhaps) - not a better name.
enable_callback_when_no_event - Can't we use max_wait_time of None?
max_wait_time - appropriate name? Maybe max_processing_delay?
A very important reason why the client should never checkpoint automatically: some customers don’t use checkpoints. At all. When they open receivers, they start at LATEST, or some fixed time before NOW, and go from there. It took us a while to learn that lesson, and I want to be sure it doesn’t get lost in the transition to track 2.
The semantics of the heartbeat were simple in track 1 because track 1 did not have batching. EventProcessorHost effectively has a receive loop in it, and the track 1 receive call returns as soon as at least one message is available. EPH takes whatever the receive call returned and passes it to the user’s callback, then goes back to the top of the loop. If the receive call times out, it returns null, and the loop either translates that into an empty message list for the callback (if heartbeat is on) or skips the callback (if heartbeat is off). So with a receive timeout of one minute, if heartbeat is on, the result is straightforward: the maximum time between one call of the callback returning and the next starting is one minute. Reproducing that semantic for the single-event callback should be simple because the semantics of receiving one event at a time are about the same as the track 1 receive. Reproducing it for the batch callback is going to be complicated.
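To make the track 1 heartbeat behaviour concrete, here is a minimal sketch of the loop described above. It is not EventProcessorHost code: `receive_once` and `run_loop` are hypothetical names, and a `queue.Queue` stands in for the partition receiver.

```python
import queue
from typing import Callable, List, Optional


def receive_once(source: "queue.Queue[str]", timeout: float) -> Optional[List[str]]:
    """Hypothetical stand-in for the track 1 receive call.

    Returns as soon as at least one message is available, or None if the
    timeout elapses first.
    """
    try:
        first = source.get(timeout=timeout)
    except queue.Empty:
        return None
    messages = [first]
    # Take whatever else is already available, without waiting any further.
    while True:
        try:
            messages.append(source.get_nowait())
        except queue.Empty:
            return messages


def run_loop(
    source: "queue.Queue[str]",
    callback: Callable[[List[str]], None],
    receive_timeout: float,
    heartbeat: bool,
    iterations: int,
) -> None:
    """Assumed shape of the receive loop; bounded so the sketch terminates."""
    for _ in range(iterations):
        received = receive_once(source, receive_timeout)
        if received is not None:
            callback(received)
        elif heartbeat:
            # Timeout with heartbeat on: the callback sees an empty list.
            callback([])
        # Timeout with heartbeat off: skip the callback entirely.
```

With heartbeat on, the gap between one callback returning and the next starting is bounded by `receive_timeout`, which is the property the batch callback would need to reproduce.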
Final design for Python:
receive_batch(
on_event_batch, # type: Callable[[PartitionContext, List[EventData]], None]
max_batch_size=300, # type: int
max_wait_time=None, # type: Optional[int]
**kwargs
)
Semantics are as follows:
By default, max_wait_time is set to None, in which case we will wait indefinitely for events to arrive without calling the callback. When one or more events arrive we will call the callback immediately, without waiting to reach the maximum batch size. This behaviour will also apply if max_wait_time is set to 0. The priority here is performance and low message latency.
If a max_wait_time value is set, the processor will wait at most the specified number of seconds for the message batch to fill to the maximum size. It calls the callback as soon as the batch is full, or when the interval elapses, regardless of how many messages we have. If we have received no messages in this interval, the callback will be called with an empty list. This setting provides users with both a regular heartbeat/checkpointing mechanism and the ability to prioritize full batches over low message latency.
The max_wait_time parameter will also be added to the single event receive, with the same behaviour and same default.
Unsupported scenarios:
Final API for Java:
EventProcessorClientBuilder.java will have the following APIs added:
// Single event receive with heartbeat
public EventProcessorClientBuilder processEvent(Consumer<EventContext> processEvent,
Duration maxWaitTime) {}
// Batch receive
public EventProcessorClientBuilder processEventBatch(Consumer<List<EventContext>> processEventBatch,
int maxBatchSize) {}
public EventProcessorClientBuilder processEventBatch(Consumer<List<EventContext>> processEventBatch,
int maxBatchSize, Duration maxWaitTime) {}
The semantics will be the same as Python.
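As a rough illustration of the shared semantics, the sketch below shows how one batch could be assembled before the callback is invoked. It is not the SDK implementation: `collect_batch` is a hypothetical helper, a `queue.Queue` stands in for the partition receiver, and only the `max_batch_size` and `max_wait_time` parameter names come from the design above.

```python
import queue
import time
from typing import List, Optional


def collect_batch(
    source: "queue.Queue[str]",
    max_batch_size: int = 300,
    max_wait_time: Optional[float] = None,
) -> List[str]:
    """Return the list that would be handed to the batch callback (hypothetical helper)."""
    if not max_wait_time:
        # None or 0: block until at least one event arrives, then return
        # whatever is immediately available (low latency, never empty).
        batch = [source.get()]
        while len(batch) < max_batch_size:
            try:
                batch.append(source.get_nowait())
            except queue.Empty:
                break
        return batch

    # A wait time is set: keep filling until the batch is full or the
    # deadline passes. The result may be empty, which acts as the heartbeat.
    deadline = time.monotonic() + max_wait_time
    batch = []
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(source.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

Note the trade-off this makes visible: with a wait time set, a partially filled batch is held back until the deadline, so latency is traded for fuller batches and a guaranteed periodic callback.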
cc: @JonathanGiles
These proposed changes LGTM! :shipit:
+1 - :shipit:
Looks good!
The Basics
About this client library
Artifacts required (per language)
Python
Java
Champion Scenarios
A champion scenario is a use case that the consumer of the client library is commonly expected to perform. Champion scenarios are used to ensure the developer experience is exemplary for the common cases. You need to show the entire code sample (including error handling, as an example) for the champion scenarios.
max_batch_size and max_wait_time.