Replication/syncing of user data, stores, and permissions between instances

csuwildcat commented 7 years ago

There will be many implementations of Hubs written in different languages and backed by different data storage systems. An identity may be leveraging many different Hub implementations across different devices and systems - for example: a Java/Swift implementation on a phone, a Node implementation located in a cloud, and a .NET implementation on their gaming/media device. All of these Hubs need to sync the same identity data and handle inbound requests in a way that produces a single, consistent surface area for the identity(s) they manage.

How best to do that is an open question, which I would like to pose to the members of this group. I look forward to exploring solutions with you.

cvan commented 7 years ago

Maybe not exactly the most thorough addition to the conversation, but I've used https://www.npmjs.com/package/ipfs-log with success for IPFS.

ajunge commented 5 years ago

I think this is discussed here: https://hackmd.io/OInEIRLxQY2s48tze0E7IQ#

@csuwildcat let me know if you think we should close this issue.

OR13 commented 4 years ago

@csuwildcat you are responsible for moving this issue forward. I think we are going to get to replication, so we may want to close this and open a new issue to propose a specific approach.

OR13 commented 4 years ago

@csuwildcat what action should be taken based on this issue?

If we had a way to say:

move my documents matching this query from server1/vault/1 to server2/vault2

Would that cover this case?

See also couchdb replication: https://guide.couchdb.org/draft/replication.html

agropper commented 4 years ago

I'm confused by this issue. If the access control (PDP) is in a separate layer from the store (PEP) layer then is the PDP aware of how many copies there are and where they are or not?

csuwildcat commented 4 years ago

There are many ways to do this, but any way we do it should treat the permission/capability objects just like any other object in the datastore; they are only treated differently in the context of logical evaluation.
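To illustrate the point above, here is a minimal sketch, not a proposed API. It assumes a datastore modeled as a plain dict and a made-up capability shape (`type`, `invoker`, `target`, `action`); all of these names are hypothetical. Replication copies every object without looking at its type, and only the access check interprets capability objects.

```python
# Hypothetical datastore: object id -> object. Capabilities are ordinary entries.
store = {
    "doc-1": {"type": "document", "body": "hello"},
    "cap-1": {
        "type": "capability",
        "invoker": "did:example:bob",
        "target": "doc-1",
        "action": "read",
    },
}


def replicate(source, target):
    """Copy every object; capability objects get no special handling here."""
    for object_id, obj in source.items():
        target[object_id] = dict(obj)


def is_allowed(datastore, invoker, target_id, action):
    """Logical evaluation: the only place capabilities are treated differently."""
    return any(
        obj.get("type") == "capability"
        and obj["invoker"] == invoker
        and obj["target"] == target_id
        and obj["action"] == action
        for obj in datastore.values()
    )


replica = {}
replicate(store, replica)
```

Because the capability travels with the data, `is_allowed(replica, "did:example:bob", "doc-1", "read")` gives the same answer on the replica as on the original store.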

OR13 commented 4 years ago

@dmitrizagidulin ^ let us know if this covers your concern.

agropper commented 4 years ago

The word "hubs" does not appear anywhere in SDS spec outside of a non-normative Appendix A. We're having a hard enough time understanding the relationships in the Ecosystem Overview as they relate to DID Core, DID service endpoints, and privacy.

Hubs also introduces an added concept of semantics beyond what we already have in Verifiable Credentials.

The Hubs appendix is currently about 1/3 of the entire SDS spec.

I propose we move Appendix A out of the core SDS spec entirely.

csuwildcat commented 4 years ago

I would oppose removal of the Hubs content from the spec, and would instead do the opposite: add a new top-level section that describes how to create a MUST-driven Hub construct that is a composition of all the options that will be defined in the generic SDS layers.

OR13 commented 4 years ago

@csuwildcat please open a PR that opens the door for hubs to be worked on in parallel in the spec. If we can't get the working group to help develop the hub APIs / spec language, it's obviously going to get removed. As editors we need to do a better job of welcoming contribution to it, and the first step is to get it out of the appendix.

dmitrizagidulin commented 4 years ago

A discussion with @csuwildcat has identified the following starting set of basic use cases for replication. I'd like to make a PR / offer these to the group for consideration and discussion, except I'm not quite sure where to put it in the repo. Would Section 1.4 of the Use Cases doc be the appropriate place? Or should it be Section 1.4.4 - Versioning and Replication of the spec? Would appreciate advice.

Replication Concepts

(Should these go in the "Terminology" section of the spec?)

instance - here, we use the spec definition of instance, to mean the full stack of Hub + Encrypted Data Vault. Basically, it is not yet clear whether replication belongs on the EDV layer or the Hub layer, so we're using the generic 'instance' to mean either.

replication - the process of copying the contents of an instance (stored encrypted objects and indexes) to another instance.

synchronization - replication plus conflict resolution.

unidirectional vs bidirectional replication - A configuration property of the replication setup between two instances. With bidirectional replication, two instances synchronize all changes to either of their contents. With unidirectional (one-way) replication, only changes to one instance (the source) get propagated to the other (the target), but not vice versa.

realtime vs full-sync replication - With realtime replication, as soon as changes are made to an object on one instance, they are propagated "immediately" (within limitations of the connection) to the other instance. Full-sync replication is complementary, and is usually required when either of the two instances has been offline, or could not connect to the other for whatever reason. With full-sync replication, instances compare their contents to see what changes have occurred while they were disconnected, and then replicate all of those changes to each other.

subset or filtered replication - the ability to replicate only a subset of a source instance's contents to the target instance, based on some filter or criteria. For example, a rule that says "Only replicate Encrypted Documents with the index tag X" would be filtered replication.
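The concepts above can be sketched in a few lines. This is only an illustration under stated assumptions: instances are modeled as plain dicts, each object carries a hypothetical `version` counter and `tags` list, differences are detected by comparing content hashes, and conflicts are resolved by "highest version wins". None of these choices come from the spec. Realtime replication is not shown here.

```python
import hashlib
import json
from typing import Callable, Dict, List

Store = Dict[str, dict]  # hypothetical: object id -> stored object with index tags


def digest(obj: dict) -> str:
    """Stable content hash used to detect objects that differ between instances."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def full_sync(source: Store, target: Store,
              keep: Callable[[dict], bool] = lambda obj: True) -> List[str]:
    """Unidirectional full-sync: copy objects that are missing or different on target.

    `keep` is the filter for subset replication; the default replicates everything.
    """
    copied = []
    for object_id, obj in source.items():
        if not keep(obj):
            continue
        if object_id not in target or digest(target[object_id]) != digest(obj):
            target[object_id] = dict(obj)
            copied.append(object_id)
    return sorted(copied)


def synchronize(a: Store, b: Store) -> None:
    """Bidirectional sync: replication plus conflict resolution.

    Assumed rule: the higher `version` wins; on a tie the copy from `a` is kept.
    """
    for object_id in set(a) | set(b):
        versions = [s[object_id] for s in (a, b) if object_id in s]
        winner = max(versions, key=lambda obj: obj["version"])
        a[object_id] = dict(winner)
        b[object_id] = dict(winner)


phone = {
    "doc-1": {"version": 2, "tags": ["X"], "data": "edited on phone"},
    "doc-2": {"version": 1, "tags": [], "data": "untagged"},
}
cloud = {"doc-1": {"version": 1, "tags": ["X"], "data": "stale"}}
backup = {}

# Filtered, one-way replication: only objects with index tag "X" reach the backup.
copied = full_sync(phone, backup, keep=lambda obj: "X" in obj["tags"])

# Two-way synchronization between phone and cloud.
synchronize(phone, cloud)
```

After this runs, `backup` holds only `doc-1`, and `phone` and `cloud` hold identical contents. A real implementation would need a conflict rule that does not depend on comparable counters across instances; the version counter is just the simplest thing that shows where conflict resolution plugs in.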

Replication Use Cases

Backup / Disaster Recovery

Replication for purposes of data backup
Either involves periodic full-sync replications, or a combination of realtime replication and full-syncs.
Typically involves unidirectional replication (from an "active" source instance to the target backup instance).

Alice sets up replication to a backup instance so that she does not lose her data in case her primary instance is damaged/lost etc.

Continuous Sync / Sync on Reconnect

A combination of full-sync and realtime replication
Bidirectional replication among two or more instances
Updates to objects can occur at any instance, and are expected to propagate to all the other ones.

Alice has multiple instances, and would like to synchronize their contents among all of them. When her local instance is offline, Alice still wants to be able to make updates to it. When that instance reconnects, it is expected to hand off all of the changes that occurred while it was disconnected (and receive changes from the other instances). This can happen in many different situations:

Airplane / any other vehicle
Sleeping with the phone off
Cloud vendor experiences technical difficulties
Power outage at home
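A rough sketch of the "sync on reconnect" behavior, assuming a hypothetical `Instance` class that pushes changes to its peers in realtime while online and queues them while offline. It only shows the hand-off direction; receiving changes from the other instances and resolving conflicts are left out.

```python
class Instance:
    """Hypothetical instance that queues local changes while it is offline."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.peers = []
        self.online = True
        self.pending = []  # (object_id, value) changes made while offline

    def put(self, object_id, value):
        self.objects[object_id] = value
        if self.online:
            self._push(object_id, value)  # realtime replication
        else:
            self.pending.append((object_id, value))  # hand off later

    def reconnect(self):
        """Replay every change that happened while disconnected, in order."""
        self.online = True
        for object_id, value in self.pending:
            self._push(object_id, value)
        self.pending.clear()

    def _push(self, object_id, value):
        for peer in self.peers:
            peer.objects[object_id] = value


phone = Instance("phone")
cloud = Instance("cloud")
phone.peers.append(cloud)

phone.online = False          # e.g. airplane mode
phone.put("note", "written offline")
offline_copy_on_cloud = cloud.objects.get("note")

phone.reconnect()
```

While the phone is offline the cloud has no copy of `note`; after `reconnect()` the queued change has been delivered and the queue is empty.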

Geographic Availability

Primarily realtime replication (with fallback to full-sync if connection is ever severed)
Bidirectional replication among two or more active instances

An organization has two geographically distributed instances. Updates to one instance are expected to propagate as soon as possible to the other instance. If one of those goes down, read and write traffic can be redirected to the remaining one, with minimal interruption in data availability.
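The redirect behavior could look something like the following sketch. The `Site` class, the `up` flag, and the `write` helper are all hypothetical; a real deployment would detect outages through health checks and would catch the failed site up with a full-sync once it returns.

```python
class Site:
    """Hypothetical geographically distinct instance."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.objects = {}


def write(sites, object_id, value):
    """Apply a write to every reachable site; return the name of the site that served it."""
    live = [site for site in sites if site.up]
    if not live:
        raise RuntimeError("no instance available")
    for site in live:
        site.objects[object_id] = value  # stands in for realtime replication
    return live[0].name


eu = Site("eu")
us = Site("us")

served_by_first = write([eu, us], "doc-1", "v1")

eu.up = False  # one site goes down; traffic is redirected to the remaining one
served_by_second = write([eu, us], "doc-1", "v2")
```

Here the second write is served by `us` while `eu` still holds the older value, which is exactly the gap a later full-sync would close.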

dmitrizagidulin commented 3 years ago

Addressed by PRs #94 and #95.

decentralized-identity / confidential-storage