Azure / azure-documentdb-changefeedprocessor-dotnet

This library provides a host for distributing change feed events in a partitioned collection across multiple observers. Instances of the host can be scaled up (by adding instances) or down (by removing them) dynamically, and the load is automatically distributed roughly equally among the active instances.

Question: Change ordering - will the final version of a document always be presented in the case of an update? #125

Open bartelink opened 5 years ago

bartelink commented 5 years ago

Can the ChangeFeed always be relied on to present the final version of a document in its correct sequence ?

i.e. (under the same partition key value) if I insert doc 1 (call that edition 1a), then insert doc 2 in a separate transaction, then insert doc 3 and update doc 1 in the next transaction (1b),

  1. am I guaranteed I'll see either: 1a,2,3,1b or 2,3,1b ?
  2. Can I rule out seeing 1b ... 1a ?
  3. Can I rule out seeing 1a but never 1b ?
  4. I am fine with not being guaranteed to see 1a - but can you confirm this is possible ?
  5. I assume if one was to start a new projection N hours/days later, it's highly unlikely to see 1a ?
  6. Can you confirm that one'd never see 1a presented after 2 or 3 ?

Same questions as above: do the answers hold even as partitions split or merge? (see also #124)

jsmithtx commented 5 years ago

@bartelink For items 1-4, you will always see 1a,2,3,1b in that order, as long as the partition key is the same. Ordering is guaranteed within a partition key. Inserts and updates are preserved in the order they occur.

bartelink commented 5 years ago

@jiffypopjr AIUI you're not guaranteed to see 1a if changes happen quickly enough, which is why I asked question 4. (There has been discussion about offering a feature where one is guaranteed to see it, either permanently or within some window.)

(Also, the secondary question of whether I can trust the order to remain the same even in the face of splits and merges is why I'm asking some pretty basic things. I want to know whether the implementation accounts for this and/or guarantees it is upheld, as well as confirming that my basic understanding of the core changefeed behavior is correct.)

gat-cs commented 4 years ago

I am also interested in getting an answer to question 4. The documentation specifies that:

Only the most recent change for a given item is included in the change log. Intermediate changes may not be available.

The first sentence reads as a guarantee that only the most recent change for a given item is included in the feed. This would imply that there cannot be more than one change for a given item in the change log, and therefore in a batch of changes (when using ChangeFeedProcessor).

The second sentence reads as an indication that it is highly likely that intermediate changes may not be included in the change feed, and therefore the availability of intermediate changes should not be relied on.

That distinction can be important for some use-cases. For example, we have a use-case where each document is independent. The order in which changes for different documents are processed is irrelevant, but the order in which changes are processed for a given document is critical. If there is a guarantee that there can be at most one change for a given item in the change feed, we could leverage that to process the changes within a batch in no particular order. However, if it is not a guarantee but only a very likely outcome, we need to handle the case where a batch contains several changes for the same item.
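One way to cope without the "at most one change per item" guarantee is to group each batch by document id while keeping the feed order inside each group. The sketch below is a minimal illustration only: it assumes a batch is a plain list of dicts with an `id` field, and `group_changes_by_id` and the `rev` field are hypothetical names, not part of ChangeFeedProcessor.

```python
def group_changes_by_id(batch):
    """Group change documents by 'id', keeping feed order within each group.

    Assumption: `batch` is a list of dicts in the order the feed delivered them.
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for doc in batch:
        groups.setdefault(doc["id"], []).append(doc)
    return groups


# Hypothetical batch in which document "1" appears twice (1a, then 1b).
batch = [
    {"id": "1", "rev": "1a"},
    {"id": "2", "rev": "2"},
    {"id": "3", "rev": "3"},
    {"id": "1", "rev": "1b"},
]

groups = group_changes_by_id(batch)
for doc_id, changes in groups.items():
    # Groups are independent of each other, so they could be handled in any
    # order; changes inside one group must be applied sequentially.
    print(doc_id, [change["rev"] for change in changes])
```

With this shape, work for different ids can be scheduled freely, while the changes for a single id are still applied in the order they were received.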

bartelink commented 4 years ago

@gat-cs I've learned a lot since posting this and can confirm that, as it's a continuously running query (there literally is no change log), you are definitely guaranteed to see the final state (plus potentially interim states). You can rely on observing them in the correct order.

It should also be noted that there's a future feature in the works to provide a different mechanism (can't recall where I saw that flagged); such a feature may well offer a change log-like thing, but for now I strongly advise against thinking of the changefeed as such a construct.

Leaving this open as I and others would like official answers too.
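To make the "continuously running query" description concrete, here is a toy model, not the Cosmos DB implementation. It assumes the store keeps only the latest version of each document, stamps every write with an increasing sequence number, and serves the feed as a query for documents stamped after a checkpoint. `ToyContainer`, `upsert` and `read_feed` are hypothetical names, and partition splits and merges are not modelled.

```python
class ToyContainer:
    """Toy store that keeps only the latest version of each document."""

    def __init__(self):
        self._docs = {}  # id -> (sequence number, body)
        self._seq = 0

    def upsert(self, doc_id, body):
        # Each write overwrites the previous version and gets a newer stamp.
        self._seq += 1
        self._docs[doc_id] = (self._seq, body)

    def read_feed(self, checkpoint=0):
        # "Feed" = current documents stamped after the checkpoint, oldest first.
        rows = sorted(
            (seq, body) for seq, body in self._docs.values() if seq > checkpoint
        )
        bodies = [body for _, body in rows]
        new_checkpoint = rows[-1][0] if rows else checkpoint
        return bodies, new_checkpoint


container = ToyContainer()
fast, checkpoint = [], 0

# A reader that polls between writes also observes the interim state 1a.
container.upsert("1", "1a")
seen, checkpoint = container.read_feed(checkpoint)
fast += seen
container.upsert("2", "2")
seen, checkpoint = container.read_feed(checkpoint)
fast += seen
container.upsert("3", "3")
container.upsert("1", "1b")
seen, checkpoint = container.read_feed(checkpoint)
fast += seen

# A reader that starts later from the beginning only sees final states.
late, _ = container.read_feed(0)

print(fast)  # ['1a', '2', '3', '1b']
print(late)  # ['2', '3', '1b']
```

Under these assumptions a reader can see 1a,2,3,1b or 2,3,1b, but never 1b followed by 1a, and never 1a without eventually seeing 1b.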