Netflix / eureka

AWS Service registry for resilient mid-tier load balancing and failover.
Apache License 2.0

2.x State information propagation in the interest channel to the client #384

Closed tbak closed 9 years ago

tbak commented 9 years ago

This is a proposal for state information propagation in the interest channel to the client.

Problem to solve

As the interest stream subscription is reactive, there is no way for the client to know how much relevant data is available when the subscription starts. If the client is a load balancer, it is critical to wait until the available pool of servers is loaded before application requests are allowed, to avoid overloading the single server that happens to be first in the list. It may also be advantageous to know about bigger changes in the system topology (such as a scale-up), and to do system reconfiguration only when all or a majority of the new servers have been received via the change notification stream.

Proposed solution

Generate buffering sentinels after each last known item in the change notification stream:

Figure 1. Buffer sentinels in change notification stream

As buffer sentinels are not optimal from the transport perspective, the internal implementation uses batching markers, which are transmitted over the wire together with the regular change notification data. This concept is depicted in the figure below:

Figure 2. Batching markers in change notification stream

There are two possible sources of batching hints:

The concept is depicted in this figure:

Figure 3. Batching markers implementation

One more level of complexity is added by the way interests are handled in the transport channel and the index registry. For efficiency, to avoid sending and processing the same data when different subscribers ask for the same or overlapping data, interests are handled at an atomic level. For example:
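Here is a hedged illustration of the atomic handling (hypothetical types and names, not the real interest API): a composite interest is decomposed into one atomic interest per application, so two subscribers whose interests overlap share the same atomic streams, and buffering markers are tracked per atomic interest.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AtomicInterestExample {
    public static void main(String[] args) {
        // Composite interests expressed as sets of application names (hypothetical model).
        Set<String> subscriberOne = new HashSet<>(Arrays.asList("A", "B")); // atomic interests {A}, {B}
        Set<String> subscriberTwo = new HashSet<>(Arrays.asList("B", "C")); // atomic interests {B}, {C}

        // Application B is requested by both subscribers, so its change stream (and its
        // buffering markers) is produced once and shared, rather than duplicated per subscriber.
        Set<String> shared = new HashSet<>(subscriberOne);
        shared.retainAll(subscriberTwo);
        System.out.println("Atomic interests shared by both subscribers: " + shared); // prints [B]
    }
}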

Batch hints are implemented as new kinds of ChangeNotification:

public class ChangeNotification<T> {
    // Buffer and BufferingSentinel are the new batch-hint kinds introduced by this proposal.
    public enum Kind {Add, Delete, Modify, Buffer, BufferingSentinel}
    ...
}

Internally, a derived class StreamStateNotification is used to carry the additional information; it is not, however, visible to the client.
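For illustration, a subscriber might treat the two new kinds purely as batch delimiters, roughly as follows (a sketch only: the getKind() accessor and the flush behaviour are assumptions, not part of this proposal):

import java.util.ArrayList;
import java.util.List;

// Sketch of a client-side subscriber that buffers data notifications and applies
// them as one atomic batch when the BufferingSentinel arrives.
class BatchingSubscriber<T> {
    private final List<ChangeNotification<T>> pendingBatch = new ArrayList<>();

    void onNotification(ChangeNotification<T> notification) {
        switch (notification.getKind()) {            // getKind() assumed for this example
            case Buffer:
                // A batch is in progress; keep accumulating, nothing to apply yet.
                break;
            case BufferingSentinel:
                flush();                              // the batch is complete; apply it atomically
                break;
            default:                                  // Add, Delete, Modify carry instance data
                pendingBatch.add(notification);
                break;
        }
    }

    private void flush() {
        // E.g. refresh the load balancer's server pool with the whole batch at once.
        System.out.println("Applying batch of " + pendingBatch.size() + " changes");
        pendingBatch.clear();
    }
}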

qiangdavidliu commented 9 years ago

For the proposed hint markers, it seems the Buffer hint is always sent in all of the cases described. From a behaviour point of view, what does the Buffer hint offer? It seems that consuming clients and/or operators only need to listen for the finishBuffer hint for an optimised buffering experience, regardless of whether there are prior Buffer hints. E.g. the consumer should be able to apply a collection operator to the stream that emits a new List each time it sees a finishBuffer hint, and possibly timeout otherwise.
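A rough sketch of such a collection operator, assuming RxJava and stand-in notification types rather than the actual Eureka interest API:

import java.util.List;
import java.util.concurrent.TimeUnit;

import rx.Observable;
import rx.subjects.PublishSubject;

public class BufferPerFinishHint {

    // Minimal stand-ins for the real notification types, for illustration only.
    enum Kind { Add, Delete, Modify, BufferingSentinel }

    static class Notification {
        final Kind kind;
        final String instanceId;                  // null for the sentinel marker
        Notification(Kind kind, String instanceId) { this.kind = kind; this.instanceId = instanceId; }
    }

    public static void main(String[] args) {
        // A hot stream standing in for the interest channel.
        PublishSubject<Notification> interestStream = PublishSubject.create();

        // Close a buffer on every finishBuffer/sentinel hint, or after 30s of silence.
        Observable<Notification> batchBoundary = Observable.merge(
                interestStream.filter(n -> n.kind == Kind.BufferingSentinel),
                interestStream.debounce(30, TimeUnit.SECONDS));

        // Emit a new List of data notifications each time a boundary is seen.
        Observable<List<Notification>> batches = interestStream
                .filter(n -> n.kind != Kind.BufferingSentinel)
                .buffer(batchBoundary);

        batches.subscribe(batch -> System.out.println("New snapshot: " + batch.size() + " changes"));

        interestStream.onNext(new Notification(Kind.Add, "server-1"));
        interestStream.onNext(new Notification(Kind.Add, "server-2"));
        interestStream.onNext(new Notification(Kind.BufferingSentinel, null));   // -> List of 2
        interestStream.onNext(new Notification(Kind.Add, "server-3"));
        interestStream.onNext(new Notification(Kind.BufferingSentinel, null));   // -> List of 1
    }
}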

On the client side, it seems to make more sense that the source of the hints is only the registry, as it should be the source of truth for all data. On the server side this is naturally the case, and on the client side the hints should be a merge of the local registry hints plus the server-side hints, if any are available. If we make sure the hints are only generated by the registry, we should then be able to merge the multiple hints emitted by atomic interests for composite forInterests, so that clients only receive a single finishBatching marker (the logic would be that a finishBuffer is emitted once all atomic FinishBuffers are received at the merge point).
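A minimal sketch of that merge-point logic, assuming RxJava and purely hypothetical per-atomic-interest signals:

import rx.Observable;
import rx.subjects.PublishSubject;

public class CompositeFinishMarker {
    public static void main(String[] args) {
        // Hypothetical finishBuffer signals of the atomic interests making up a composite interest.
        PublishSubject<String> finishInterestA = PublishSubject.create();
        PublishSubject<String> finishInterestB = PublishSubject.create();

        // Emit a single composite finishBuffer only after every atomic interest has reported one.
        Observable<String> compositeFinish = Observable.zip(
                finishInterestA.first(),
                finishInterestB.first(),
                (a, b) -> "finishBuffer");

        compositeFinish.subscribe(marker -> System.out.println("composite " + marker));

        finishInterestA.onNext("finishBuffer(A)");   // nothing emitted yet
        finishInterestB.onNext("finishBuffer(B)");   // composite marker emitted here
    }
}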

NiteshKant commented 9 years ago

I think it will be useful to provide code samples showing how a client will consume this API. There are a lot of complex constructs, such as the conditional batching and non-batching modes, and I would like to see how this manifests on the consumer end.

A few implementation-related questions:

I am not really convinced about the need for the "Batch" hint; however, I can see what you are trying to achieve, i.e. the ability for the same client to switch between batching and non-batching modes. Whether the complexity is worth it would be clearer to see in a code example.

@qiangdavidliu

the consumer should be able to apply a collection operator to the stream that emits a new List each time it sees a finishBuffer hint, and possibly timeout otherwise.

I think there is value in having an API where batch or non-batch is not a choice that the client has to make. In this model, the client will always batch, and non-batch interaction will be timeout-based.
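For example, a minimal sketch of that timeout-based mode, assuming RxJava and a placeholder stream that carries no buffer hints at all:

import java.util.List;
import java.util.concurrent.TimeUnit;

import rx.Observable;
import rx.subjects.PublishSubject;

public class TimeoutOnlyBatching {
    public static void main(String[] args) throws InterruptedException {
        // A stream with no buffer hints at all (e.g. talking to an older server).
        PublishSubject<String> changeNotifications = PublishSubject.create();

        // The client still consumes batches; they are simply closed on a fixed time
        // window instead of explicit markers.
        Observable<List<String>> batches = changeNotifications
                .buffer(200, TimeUnit.MILLISECONDS)
                .filter(batch -> !batch.isEmpty());

        batches.subscribe(batch -> System.out.println("Batch: " + batch));

        changeNotifications.onNext("Add server-1");
        changeNotifications.onNext("Add server-2");
        Thread.sleep(500);                        // let the time window close -> one batch of 2
    }
}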

tbak commented 9 years ago

Implemented by PR #403