preguica commented 7 years ago

RFC 100: Public API for information on synchronization

Authors: Nuno Preguiça, Gonçalo Cabrita, Jean Araújo

Status: Under discussion (Under discussion / Rejected / Approved: assigned to X / Implemented: pull request X)

Context and Goal

Context

In AntidoteDB, updates are propagated asynchronously among DCs. A transaction accesses a snapshot of the stable updates known at the local DC (with stable meaning that atomicity of updates in a transaction and causal dependencies across transactions are satisfied).

Thus, when a transaction reads a value in a DC, some updates to that value may already have been committed in other DCs. Additionally, during the execution of a transaction, another transaction may modify the read value in the same (or another) DC -- these updates are also not visible to the running transaction.

Goal

Provide an API that gives applications information about how up to date the data they read is. This allows the application to reason about the staleness of read data, based on information available in the local DC.

We can define potential staleness as the time period for which the read data may be stale. This can be computed from the last synchronization with remote DCs, which establishes the earliest time at which an unseen update may have been issued (at some other DC).
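As a minimal sketch of this definition: the staleness window is the gap between the current local time and the reported synchronization time. The helper name `potential_staleness` and the microsecond unit are assumptions of this example; the RFC does not fix either.

```python
def potential_staleness(now_us: int, last_sync_us: int) -> int:
    """Upper bound (in microseconds) on how long read data may have been stale.

    Hypothetical application-side helper: `last_sync_us` stands for a value
    returned by the proposed last_sync call; the unit is an assumption.
    """
    # Clocks at different DCs may be skewed, so clamp at zero instead of
    # reporting a negative window.
    return max(0, now_us - last_sync_us)


# Example: the last synchronization happened 2.5 s before "now".
now_us = 1_507_100_000_000_000
last_sync_us = now_us - 2_500_000
window = potential_staleness(now_us, last_sync_us)
```

Because the API returns the synchronization time rather than a precomputed difference, the application can re-evaluate the window at any later moment without another call.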

Use Case

Synchronization status

A user-facing application may want to provide information to the user on the last time the local replica has been synchronized with the remote DCs.

Information on known changes

An application may want to prevent executing transactions on data that it knows has been modified concurrently.

API changes

External: AntidoteDB + protobuf

last_sync([bound_object()], TxId) -> {ok, [clock_time()]}. For each object, it returns the earliest of the last synchronization times with the DCs replicating the object -- let t_i be the last time the local DC synchronized with DC i; the result for object o is min(t_i), with i ranging over the DCs where o is replicated.
last_sync_detail([bound_object()], TxId) -> {ok, [vectorclock()]}. The same as before, but detailing the information of last synchronization with each DC.
known_stale([bound_object()], TxId) -> {ok, [boolean()]}. For each data item, it returns whether the local DC has received an update that has not been reflected in the transaction snapshot.

Internal: transaction manager

last_sync([bound_object()], TxId) -> {ok, [clock_time()]}.
last_sync_detail([bound_object()], TxId) -> {ok, [vectorclock()]}.
known_stale([bound_object()], TxId) -> {ok, [boolean()]}. As in the external API.

Internal: log

has_new_updates([bound_object(),vectorclock()]) -> {ok, [integer()]}. Returns the number of known updates not reflected in the given vectorclock.

Design

last_sync([bound_object()], TxId) -> {ok, [clock_time()]}.

The return value can be computed from the snapshot time of the transaction. It is only necessary to process the operation in the component that records the snapshot time for the transaction.

API.last_sync([bound_object()], TxId) -> 
    TM.last_sync([bound_object()], TxId). 

TM.last_sync( list, TxId) -> 
        snapshot_time = get_snapshot_time( TxId)
        return {ok, list.map(key -> 
                            replicas = get_replica_set( key)
                            min( snapshot_time.filter( replicas)) ) }

NOTE: this implementation supports partial replication.
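The pseudocode above can be sketched in Python to show how partial replication changes the result. This is an illustration under stated assumptions, not AntidoteDB code: objects are identified by plain strings instead of bound_object(), the snapshot time is a dict from DC id to timestamp, and `replica_sets` is a hypothetical stand-in for get_replica_set.

```python
from typing import Dict, List, Set

VectorClock = Dict[str, int]  # DC id -> time of the last update seen from that DC


def last_sync(keys: List[str],
              snapshot_time: VectorClock,
              replica_sets: Dict[str, Set[str]]) -> List[int]:
    """For each key, the minimum snapshot entry over the DCs replicating it."""
    result = []
    for key in keys:
        replicas = replica_sets[key]
        # Only DCs that replicate the key can have issued unseen updates to it,
        # so entries of other DCs are ignored.
        result.append(min(t for dc, t in snapshot_time.items() if dc in replicas))
    return result


snapshot = {"dc1": 100, "dc2": 80, "dc3": 95}
replica_sets = {"x": {"dc1", "dc2", "dc3"}, "y": {"dc1", "dc3"}}
times = last_sync(["x", "y"], snapshot, replica_sets)
```

Here `y` is not replicated at `dc2`, so the older `dc2` entry does not lower its result, while `x` is bounded by it.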

last_sync_detail([bound_object()], TxId) -> {ok, [vectorclock()]}.

The return value can be computed from the snapshot time of the transaction. It is only necessary to process the operation in the component that records the snapshot time for the transaction.

API.last_sync_detail([bound_object()], TxId) -> 
    TM.last_sync_detail([bound_object()], TxId). 

TM.last_sync_detail( list, TxId) -> 
        snapshot_time = get_snapshot_time( TxId)
        return {ok, list.map(key -> 
                            replicas = get_replica_set( key)
                            snapshot_time.filter( replicas) ) }
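A short sketch of how an application might use the detailed result, for instance to report which DC is furthest behind. Both function names and the dict representation of a vectorclock are assumptions made for illustration.

```python
from typing import Dict, Set

VectorClock = Dict[str, int]  # DC id -> time of the last update seen from that DC


def last_sync_detail(snapshot_time: VectorClock, replicas: Set[str]) -> VectorClock:
    """Restrict the snapshot vector clock to the DCs replicating one object."""
    return {dc: t for dc, t in snapshot_time.items() if dc in replicas}


def most_lagging_dc(detail: VectorClock) -> str:
    """Hypothetical application-side helper: DC with the oldest synchronization."""
    return min(detail, key=detail.get)


snapshot = {"dc1": 100, "dc2": 80, "dc3": 95}
detail = last_sync_detail(snapshot, {"dc1", "dc3"})
lagging = most_lagging_dc(detail)
```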

known_stale([bound_object()], TxId) -> {ok, [boolean()]}.

API.known_stale([bound_object()], TxId) -> 
    TM.known_stale([bound_object()], TxId). 

TM.known_stale( list, TxId) -> 
        snapshot_time = get_snapshot_time( TxId)
        return {ok, list.map(key -> 
                            LOG.has_new_updates([key,snapshot_time]) > 0 ) }

LOG.has_new_updates([bound_object(),vectorclock()]) -> {ok, [integer()]}.

Return value based on the log information.
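One possible reading of these two operations, sketched in Python. The log layout (`LogEntry`) and the coverage rule (an update is reflected in a clock if its commit time is not later than the clock entry of its origin DC) are assumptions of this example; the RFC leaves both unspecified.

```python
from typing import Dict, List, NamedTuple

VectorClock = Dict[str, int]  # DC id -> time of the last update seen from that DC


class LogEntry(NamedTuple):
    key: str
    origin_dc: str
    commit_time: int


def has_new_updates(log: List[LogEntry], key: str, clock: VectorClock) -> int:
    """Count logged updates to `key` that `clock` does not cover."""
    return sum(
        1
        for e in log
        if e.key == key and e.commit_time > clock.get(e.origin_dc, 0)
    )


def known_stale(log: List[LogEntry], keys: List[str],
                snapshot_time: VectorClock) -> List[bool]:
    """A key is known stale if the log holds at least one uncovered update."""
    return [has_new_updates(log, k, snapshot_time) > 0 for k in keys]


log = [
    LogEntry("x", "dc1", 90),
    LogEntry("x", "dc2", 85),  # received after the snapshot was taken
    LogEntry("y", "dc1", 70),
]
snapshot = {"dc1": 100, "dc2": 80}
stale = known_stale(log, ["x", "y"], snapshot)
```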

Implementation

Straightforward from the design.

Propagate calls to clocksi_readitem_server (the same way a read is done), where the snapshot time of the transaction is known.

Prior discussion

This feature has been discussed in the past in:

Syncfree deliverable D3.2 [https://syncfree.lip6.fr/attachments/article/46/d3.2.pdf]
Deepthi Devaki Akkoorath, Viktória Fördós, Annette Bieniusa: Observing the consistency of distributed systems. Erlang Workshop 2016: 54-55.

Record of changes

04-10-2017: first complete version

bieniusa commented 7 years ago

The last_sync information is very similar to the potential staleness that we are already calculating, right? One issue that I see is that this information is divergent between different nodes. This means depending on where the respective clocksi_readitem_server is located, you might get different results. Are you planning to adapt the meta-data distribution to prevent this behaviour?

known_stale should be comparatively easy to compute; though, this will depend on the intraDC replication that we will establish.

@deepthidevaki What is your take?

deepthidevaki commented 7 years ago

last_sync, if calculated from the snapshot_time, can be used to calculate the potential staleness. last_sync gives the exact time, while staleness is the difference between the current time and what is being read (i.e. the snapshot time). In this case, we don't need to propagate the call to clocksi_readitem_server, because the transaction coordinator knows the snapshot time. Another way to get last_sync is from the vector_clock of the partition, which is updated every time it receives a remote update. In that case, it does not reflect what is being read in the transaction, but what is available in the DC.
known_stale can be computed by the materializer by checking the number of updates not included in the snapshot. Remote updates are immediately cached in the materializer, so it is up-to-date with what is in the log.
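The difference between the two ways of obtaining last_sync can be illustrated with a small sketch. The clock values are made up, and `last_sync_from` is a hypothetical helper that ignores partial replication for brevity.

```python
from typing import Dict

VectorClock = Dict[str, int]  # DC id -> time of the last update seen from that DC


def last_sync_from(clock: VectorClock) -> int:
    """Earliest entry of a clock (all DCs assumed to replicate the object)."""
    return min(clock.values())


# Snapshot fixed when the transaction began.
txn_snapshot = {"dc1": 100, "dc2": 80}
# Partition clock, advanced by remote updates received since then.
partition_clock = {"dc1": 100, "dc2": 120}

read_view = last_sync_from(txn_snapshot)   # describes what the transaction reads
dc_view = last_sync_from(partition_clock)  # describes what the DC has received
```

The snapshot-based value stays fixed for the lifetime of the transaction, whereas the partition-based value can move ahead of what the transaction actually observes.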

preguica commented 7 years ago

On Wed, Oct 4, 2017 at 5:23 AM, Annette Bieniusa notifications@github.com wrote:

The last_sync information is very similar to the potential staleness that we are already calculating, right?

Yes and no. You get similar information, but instead of computing the difference from current time to the synchronization time, you return the synchronization time. I think this addresses your following issue.

One issue that I see is that this information is divergent between different nodes. This means depending on where the respective clocksi_readitem_server is located, you might get different results. Are you planning to adapt the meta-data distribution to prevent this behaviour?

known_stale should be comparatively easy to compute; though, this will depend on the intraDC replication that we will establish.

I would expect that you would have to go to the log (or the materializer) to get this information, while the previous one you could get immediately from the snapshot time of the transaction.

@deepthidevaki https://github.com/deepthidevaki What is your take?


preguica commented 7 years ago

On Wed, Oct 4, 2017 at 9:42 AM, Nuno Preguica nuno.preguica@fct.unl.pt wrote:

Yes and no. You get similar information, but instead of computing the difference from current time to the synchronization time, you return the synchronization time.

And the idea is to expose that by a public API call.

bieniusa commented 7 years ago


On 4. Oct 2017, at 10:43, preguica notifications@github.com wrote:

On Wed, Oct 4, 2017 at 9:42 AM, Nuno Preguica nuno.preguica@fct.unl.pt wrote:

On Wed, Oct 4, 2017 at 5:23 AM, Annette Bieniusa notifications@github.com wrote:

The last_sync information is very similar to the potential staleness that we are already calculating, right?

Yes and no. You get similar information, but instead of computing the difference from current time to the synchronization time, you return the synchronization time.

And the idea is to expose that by a public API call.

Sure.

I think this addresses your following issue.

One issue that I see is that this information is divergent between different nodes. This means depending on where the respective clocksi_readitem_server is located, you might get different results. Are you planning to adapt the meta-data distribution to prevent this behaviour?

I seem to be missing something… The point is that the synchronisation time might appear different (in the current design) for the vnodes. This would need a fix to get always-advancing information.

known_stale should be comparatively easy to compute; though, this will depend on the intraDC replication that we will establish.

I would expect that you would have to go to the log (or the materializer) to get this information, while the previous one you could get immediately from the snapshot time of the transaction.

Yes, but what if the materialiser is replicated?

@deepthidevaki https://github.com/deepthidevaki What is your take?


preguica commented 7 years ago

On Wed, Oct 4, 2017 at 9:36 AM, Deepthi Akkoorath notifications@github.com wrote:

last_sync if calculated from the snapshot_time, can be used to calculate the potential staleness. last_sync gives the exact time, while staleness is the difference between current time and what is being read(i.e. snapshot time).

Yes, that's the idea... we thought about having the staleness, as we have discussed in the past. The problem is that returning the time of sync allows the application to compute the information locally.

In this case, we don't need to propagate the call to clocksi_readitem_server, because transaction coordinator knows the snapshot time.

OK... I was with Gonçalo trying to figure out where we needed to go to get this info, and it seemed it was to that server...

Another way to get last_sync is to get it from the vector_clock of the partition which will be updated everytime when it receives a remote update. In this case, it does not reflect what is being read in the transaction, but what is available in the DC.

But here the idea is to get this for the transaction that is running, which is accessing a snapshot taken at begin time.

known_stale can be computed by materializer by checking the number of updates not included in the snapshot. Remote updates are immediately cached in the materializer, so it is up-to-date with what is in the log.

It seemed to me that this would be only true if the materializer receives all updates and this might not be the case for remote updates, but this is an interesting question: what is the inter-relation between the materializer and the inter-dc replication protocol?


deepthidevaki commented 7 years ago

It seemed to me that this would be only true if the materializer receives all updates and this might not be the case for remote updates, but this is an interesting question: what is the inter-relation between the materializer and the inter-dc replication protocol?

The materializer receives all updates received by the inter-dc protocol. Remote updates are cached in the materializer at the same time they are written to the log.

preguica commented 7 years ago

On Wed, Oct 4, 2017 at 12:49 PM, Deepthi Akkoorath notifications@github.com wrote:

It seemed to me that this would be only true if the materializer receives all updates and this might not be the case for remote updates, but this is an interesting question: what is the inter-relation between the materializer and the inter-dc replication protocol?

Materializer receives all updates received by inter-dc protocol. Remote updates are cached in the materializer at the same time it is written to the log.

I guess in a future implementation of the materializer, this might not be the case, right? I suggest the following: route the request through the materializer. If the materializer has the info, it replies. Otherwise, it forwards the request to the log. Does this make sense?
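A rough sketch of the suggested routing, under assumptions that are not part of the RFC: the `Materializer` and `Log` classes, their fields, and the rule that a cached key holds all of its updates are all hypothetical, chosen only to show the fallback path.

```python
from typing import Dict, List, Tuple

VectorClock = Dict[str, int]   # DC id -> time of the last update seen from that DC
Update = Tuple[str, int]       # (origin DC, commit time)


def count_uncovered(updates: List[Update], clock: VectorClock) -> int:
    """Number of updates whose commit time is later than the clock entry."""
    return sum(1 for dc, t in updates if t > clock.get(dc, 0))


class Log:
    def __init__(self, updates_by_key: Dict[str, List[Update]]):
        self.updates_by_key = updates_by_key
        self.lookups = 0  # counts how often the log had to be consulted

    def has_new_updates(self, key: str, clock: VectorClock) -> int:
        self.lookups += 1
        return count_uncovered(self.updates_by_key.get(key, []), clock)


class Materializer:
    def __init__(self, cache: Dict[str, List[Update]], log: Log):
        self.cache = cache  # only the keys currently cached
        self.log = log

    def has_new_updates(self, key: str, clock: VectorClock) -> int:
        updates = self.cache.get(key)
        if updates is not None:
            return count_uncovered(updates, clock)   # answered from the cache
        return self.log.has_new_updates(key, clock)  # forwarded to the log


log = Log({"x": [("dc2", 85)], "y": [("dc2", 90)]})
mat = Materializer({"x": [("dc2", 85)]}, log)
snapshot = {"dc1": 100, "dc2": 80}
from_cache = mat.has_new_updates("x", snapshot)
from_log = mat.has_new_updates("y", snapshot)
```

With this shape the caller always talks to the materializer, so a later change in what the materializer caches would not affect the transaction manager.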


bieniusa commented 7 years ago

On 06.10.2017 at 15:14, preguica notifications@github.com wrote:

On Wed, Oct 4, 2017 at 12:49 PM, Deepthi Akkoorath notifications@github.com wrote:

It seemed to me that this would be only true if the materializer receives all updates and this might not be the case for remote updates, but this is an interesting question: what is the inter-relation between the materializer and the inter-dc replication protocol?

Materializer receives all updates received by inter-dc protocol. Remote updates are cached in the materializer at the same time it is written to the log.

I guess in a future implementation of the materializer, this might not be the case, right? I suggest the following: route the request by the materializer. If the materializer has the info, it replies. Otherwise, it will forward the request to the log. Does this make sense?

Did we clarify whether we have the materialiser replicated?

deepthidevaki commented 7 years ago

In the proposed design, last_sync and last_sync_detail are calculated from the snapshot_time of the transaction, irrespective of the list of bound_objects being passed, so the bound_objects parameter is unnecessary. Do you think it will be needed in a future implementation, where last_sync is calculated differently for each key irrespective of the transaction?

preguica commented 7 years ago

The list of objects is there for supporting partial replication, where not all DCs replicate all objects.

On Tue, Oct 10, 2017 at 1:45 PM, Deepthi Akkoorath notifications@github.com wrote:

In the proposed design last_sync and last_sync_detail are calculated from the snapshot_time of the transaction irrespective of the list of bound_objects being passed. So bound_objects as parameters is unnecessary. Do you think, in a future implementation this will be needed, where last_sync is calculated differently for each keys irrespective of transaction?

