dtinit / data-transfer-project

The Data Transfer Project makes it easy for platforms to build interoperable user data portability features. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers.
https://dtinit.org/docs/dtp-what-is-it
Apache License 2.0

Decentralized self-owned identity and encrypted personal datastores #490

Open csuwildcat opened 5 years ago

csuwildcat commented 5 years ago

I am rather alarmed that this project was constructed in a way that entrenches centralizing federation models in the face of emerging decentralized tech/models that provide superior user-ownership of identifiers, much better data privacy/security, and data exchange approaches that democratize the application and service landscape. Can you provide some background into why there was no large-scale open discussion/evaluation period where you examined tech/models with the wider identity/app/dev community before coming out with these specific proposals?

jimmarino commented 5 years ago

Hi Daniel,

Could you point to some specific technologies or libraries you are thinking of? It would be easier to have a discussion of how the approaches you mention fit with the project if there are concrete examples.

One of the intentions of the open source project is to create a forum where proposals can be discussed and potentially adopted among a wide group of participants. Our hope is that the platform continues to evolve and is driven by community needs.

Jim

On Fri, Jul 20, 2018 at 9:17 PM Daniel Buchner notifications@github.com wrote:

I am rather alarmed that this project was constructed in a way that entrenches centralizing federation models in the face of emerging decentralized tech/models that provide superior user-ownership of identifiers, much better data privacy/security, and data exchange approaches that democratize the application and service landscape. Can you provide some background into why there was no large-scale open discussion/evaluation period where you examined tech/models with the wider identity/app/dev community before coming out with these specific proposals?


melvincarvalho commented 5 years ago

Hope this is not off topic. Not speaking for the OP, but I'm a developer on the Solid project:

https://github.com/solid/solid

We have done some work on standardizing data formats, namely by using the W3C RDF model.

I think we share many of the goals of the DTP. Looking forward to following the progress!

akuckartz commented 5 years ago

See also #306 regarding the Semantic Web and schema.org.

csuwildcat commented 5 years ago

Hello Jim,

On the technology front, we've been heads down for over a year working with over 60 organizations to research, draft specs, and prepare to build an open, decentralized system of self-owned identifiers that users control apart from any provider/host, encrypted personal datastores that replicate to clouds and edge devices, and an inferential data exchange mechanism that overcomes the repeated failure of systems based on rigid data discovery and shared-scope definition. You can read our post that encapsulated over a year of formal investigation and marked the beginning of our build-out with a community of like-minded developers and organizations: https://cloudblogs.microsoft.com/enterprisemobility/2018/02/12/decentralized-digital-identities-and-blockchain-the-future-as-we-see-it/

melvincarvalho commented 5 years ago

@csuwildcat thanks for sharing more detail. The approach looks very interesting.

Could you elaborate on the specific issue with respect to DTP?

From my brief understanding of DTP, it should be possible to use did: URIs in a JSON-LD serialization (I am guessing that is what will be used). In fact, it should be possible to use any type of identity URI. Please correct me if I'm mistaken there.

From what I can see the data language should be inclusive of almost every major type of identity system.

csuwildcat commented 5 years ago

@melvincarvalho I see a few different issues with DTP; here are the main ones:

  1. The system entrenches a model where identifiers, which are core to true personal self-ownership of legally reliable identity, are still owned by corporations and organizations, not the user. This is a massive security, privacy, and safety concern as we progress to a more connected digital world where your digital identity becomes your 'passport' to doing critical things in the world; to have a corporation own that is, frankly, unacceptable.

  2. The data exchange model is based on creating global representations of data types to form a sort of One True Model. As I mentioned in another issue someone created about use of schema.org (https://github.com/google/data-transfer-project/issues/306): "Pursuit of a single shared model that produces One True Model is a general anti-pattern that has failed countless times over the years, because it's a fundamentally flawed approach. With Identity Hubs, we believe we've found a better way to overcome this issue by supporting all schemas by default (including schema.org), via the concept of deterministic, inferential knowability: https://github.com/decentralized-identity/hubs/blob/master/explainer.md#collections"

  3. The data storage/sharing model I see here does not include mechanisms for encryption, syncing, and replication across cloud hosts and user edge devices that would eliminate, or dramatically reduce, reliance on providers and hosts, which is concerning. Your identity datastore should exist as a provider/host-agnostic storage mesh that treats the hosting location as an adversary, not a conduit to expand the "Eye in the Sky" even further.

jimmarino commented 5 years ago

Echoing @melvincarvalho, I don't see any real issues with the approach you outlined. The data model used by the platform is designed to be as extensible as possible exactly because we did not want to mandate one particular approach to data structure. In the system, we use JSON-based polymorphic de/serialization to achieve this. This allows extensions to flow data through the system without requiring the core infrastructure or other extensions to be aware of its content (full streaming is also supported).
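To make the idea of JSON-based polymorphic de/serialization concrete, here is a minimal Python sketch. It is not DTP's actual code (DTP itself is written in Java); the `@type` discriminator field, the `register` decorator, and the `PhotoModel`/`OpaqueModel` classes are hypothetical names chosen for illustration. The core only reads the discriminator and hands the payload to whichever extension registered that type; payloads of unknown types pass through untouched.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical registry mapping a type discriminator to a decoder function.
_DECODERS: Dict[str, Callable[[dict], Any]] = {}


def register(type_name: str):
    """Class decorator: lets an extension claim a discriminator value."""
    def wrap(cls):
        _DECODERS[type_name] = cls.from_json
        return cls
    return wrap


@dataclass
class OpaqueModel:
    """Fallback for types the core does not know: keeps the raw payload."""
    type_name: str
    payload: dict

    def to_json(self) -> dict:
        return self.payload


@register("photo")
@dataclass
class PhotoModel:
    """Example extension-owned model (hypothetical)."""
    title: str
    url: str

    @classmethod
    def from_json(cls, obj: dict) -> "PhotoModel":
        return cls(title=obj["title"], url=obj["url"])

    def to_json(self) -> dict:
        return {"@type": "photo", "title": self.title, "url": self.url}


def decode(text: str) -> Any:
    """Dispatch on the discriminator; unknown types flow through unchanged."""
    obj = json.loads(text)
    type_name = obj.get("@type", "")
    decoder = _DECODERS.get(type_name)
    return decoder(obj) if decoder else OpaqueModel(type_name, obj)


def encode(model: Any) -> str:
    """Serialize any model that exposes to_json()."""
    return json.dumps(model.to_json(), sort_keys=True)
```

With this design, adding a new data type means registering one more class; `decode` and `encode` never change, and a payload the core has no model for still round-trips as an `OpaqueModel`.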

To address your specific points:

  1. I'm not sure why DTP would entrench a model where identity is owned by corporations or organizations. An instance could be set up to enable users to move their data out of a corporate realm to some decentralized infrastructure.

  2. Per my previous comments, the data model used by DTP is extensible and eschews the idea of a single global model.

  3. The data that flows through the platform is always encrypted and is only stored (in encrypted form) for the duration of a transfer job. The storage subsystem is extensible and an implementation could be created that is backed by some type of distributed mesh. Other than storage to enable user-initiated data transfer, the platform does not track identity, perform data syncing, or replicate data to edge devices; those tasks are outside what exists in the platform today.
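A rough Python sketch of what a pluggable, job-scoped storage extension point could look like, assuming the caller hands the store data that is already encrypted. The `JobStore` interface, `InMemoryJobStore`, and `run_transfer` are hypothetical and are not DTP's storage API; encryption itself is deliberately left out. A backend for a distributed mesh would implement the same three methods.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Dict, Optional


class JobStore(ABC):
    """Hypothetical storage extension point holding encrypted blobs per job."""

    @abstractmethod
    def put(self, job_id: str, key: str, ciphertext: bytes) -> None: ...

    @abstractmethod
    def get(self, job_id: str, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def delete_job(self, job_id: str) -> None: ...


class InMemoryJobStore(JobStore):
    """Toy backend; a distributed-mesh backend would implement the same methods."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, bytes]] = {}

    def put(self, job_id: str, key: str, ciphertext: bytes) -> None:
        self._jobs.setdefault(job_id, {})[key] = ciphertext

    def get(self, job_id: str, key: str) -> Optional[bytes]:
        return self._jobs.get(job_id, {}).get(key)

    def delete_job(self, job_id: str) -> None:
        # Everything stored for the job is discarded once the transfer ends.
        self._jobs.pop(job_id, None)

    def job_count(self) -> int:
        return len(self._jobs)


def run_transfer(store: JobStore, items: Dict[str, bytes]) -> Dict[str, bytes]:
    """Stage already-encrypted items, read them back, then drop all job state."""
    job_id = str(uuid.uuid4())
    try:
        for key, blob in items.items():
            store.put(job_id, key, blob)
        # In a real pipeline an importer would consume these; here we just collect them.
        return {key: store.get(job_id, key) for key in items}
    finally:
        store.delete_job(job_id)
```

The `try`/`finally` ensures the staged data is removed even if the transfer fails partway, which matches the idea that data is held only for the duration of a job.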