dat-ecosystem-archive / projects

Dat Project Projects [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ ]
http://dat-data.com

[product] Dat Cloud (Public Peer Service) #20

Closed joehand closed 7 years ago

joehand commented 7 years ago

The Dat "cloud" runs persistent and highly available dat instances as a service to users (public peers as a service). A cloud will bridge the gap between traditional p2p services, where all data is controlled by users and stored in local instances, and centralized services, where all data is controlled by the service and stored in centralized locations. The Dat cloud will enable user-controlled data while also ensuring general accessibility.

This project piggybacks on and overlaps with the Dat server, but it has different users (the cloud targets end users, the Dat server targets institutions) and different management (the cloud is centrally managed by us, the server is user-managed).

Who are the users?

Dat users who want to publish files or data in a long-term and persistent way, similar to how common cloud services work now. Any Dat user can sign up to use the Dat cloud. In discussion, we talked about starting with unlimited free storage for academic users (.edu email address) and limited free storage for the general public.
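The storage tiers discussed above could be expressed as a small policy function. This is only a sketch: the function name and the 10 GiB cap are hypothetical (the proposal does not name a limit), and a real service would presumably verify the email address before trusting it.

```python
from typing import Optional

# Hypothetical cap for general-public accounts; the proposal gives no number.
PUBLIC_QUOTA_BYTES = 10 * 1024**3


def storage_quota(email: str) -> Optional[int]:
    """Return the free storage quota in bytes, or None for unlimited.

    Academic users are recognized by a .edu email domain, as in the
    discussion above. Everyone else gets the limited public quota.
    """
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    domain = email.rsplit("@", 1)[1].strip().lower()
    if domain.endswith(".edu"):
        return None  # unlimited free storage for academic users
    return PUBLIC_QUOTA_BYTES
```

Returning `None` for "unlimited" keeps the academic case explicit rather than encoding it as a very large number.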

Additionally, any app that wants user-controlled data with persistent accessibility could use the cloud services through user authentication.

Scientists

Scientists can use the cloud to share, back up, or publish research data and code. Initially this will be a manual publishing process, but eventually we will automatically back up any Dat registered to dat.land that fits certain criteria (e.g. has a published DOI).

General Public

To encourage usage of the Dat ecosystem, it would be nice if people could create and share a dat without having to understand how to persist it when they go offline. This could enable cool userland things like dat websites and custom applications (see below).

App Developers

Eventually, application developers may also use the Dat cloud. In this situation, an app user would authenticate to the cloud via the app. Then the app could store the user's data on the user's cloud account.

How would the user interact with it?

The initial version will be a website where users can add Dat links they want to back up to their cloud.
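As a rough sketch of what "adding a Dat link" might involve on the server side, the snippet below validates a submitted link and records it in a per-user backup list. It assumes links are 64 hexadecimal characters with an optional `dat://` prefix; the `BackupList` class and its method names are hypothetical, not part of any Dat module.

```python
import re

# Assumes a Dat link is a 64-character hex key, optionally prefixed by dat://
DAT_LINK = re.compile(r"^(?:dat://)?([0-9a-fA-F]{64})/?$")


def normalize_dat_link(link: str) -> str:
    """Return the lowercase hex key for a Dat link, or raise ValueError."""
    match = DAT_LINK.match(link.strip())
    if match is None:
        raise ValueError(f"not a valid Dat link: {link!r}")
    return match.group(1).lower()


class BackupList:
    """Hypothetical per-user set of Dat keys the cloud should keep online."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def add(self, link: str) -> bool:
        """Add a link; return False if it was already in the list."""
        key = normalize_dat_link(link)
        if key in self._keys:
            return False
        self._keys.add(key)
        return True

    def keys(self) -> list[str]:
        """Return the stored keys in sorted order."""
        return sorted(self._keys)
```

Normalizing to the bare lowercase key means the same dat submitted with and without the `dat://` prefix is stored only once.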

Why is the project important to the ecosystem?

Automatic deployment of Dats is essential for the ecosystem. Without high availability of data, it will be difficult to encourage adoption and achieve persistent publishing of scientific data. Creating an easy-to-use cloud will encourage users to try out the Dat ecosystem without requiring them to set up their own servers to persist their Dats.

Scientific Data Publishing & Archiving

Scientists should be able to share and publish data without worrying about whether it is being served online via their p2p service. The cloud will give them an easy way to publish, with minimal hurdles to putting data online so others can access it.

Additionally, we can use the cloud to automatically back up science data (that meets certain criteria) to the Internet Archive.

Encourage easy adoption for general users

p2p technology is hard for many users to understand. By creating a cloud, we can leverage the strengths of a p2p system without giving up the benefits of a centralized cloud system. Now that cloud-based centralized services have taken off, users understand their benefits. We can easily provide these benefits (high uptime, availability, ease of publishing and sharing) without locking the user into a specific service.

Build Apps with User-Controlled Data

A Dat cloud would allow developers to easily create applications where the user controls their data (it lives on the user's cloud) but the app can rely on that data being highly available. This enables social-media-like applications without having to assume users will be online to share their data.

I've been thinking about this in terms of different types of data silos. Right now we have "service silos", where data is siloed by service provider (e.g. Twitter data in the Twitter silo, Gmail data in the Google silo). A Dat cloud would encourage "user silos", where data is siloed by user ownership (e.g. Joe's Dat cloud would contain a dat with his scientific data, a dat with his blog posts, and a dat with his Twitter data). Applications get read/write access to the dat(s) within a user's silo.
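One way to picture a "user silo" is as a per-user collection of dats plus a record of which applications may use which dat. The sketch below is purely illustrative: `UserSilo`, its fields, and the app names are hypothetical, and it models only the access bookkeeping, not authentication or the Dat protocol itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class UserSilo:
    """Hypothetical model of one user's cloud account."""

    owner: str
    dats: Dict[str, str] = field(default_factory=dict)         # label -> dat key
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # app -> labels

    def add_dat(self, label: str, key: str) -> None:
        """Register a dat in the silo under a human-readable label."""
        self.dats[label] = key

    def grant(self, app: str, label: str) -> None:
        """Let an app read/write one dat that lives in this silo."""
        if label not in self.dats:
            raise KeyError(label)
        self.grants.setdefault(app, set()).add(label)

    def dats_for(self, app: str) -> Dict[str, str]:
        """Return only the dats this app has been granted."""
        allowed = self.grants.get(app, set())
        return {label: self.dats[label] for label in sorted(allowed)}


# Example: the data stays in the user's silo; apps see only what is granted.
silo = UserSilo(owner="joe")
silo.add_dat("science-data", "a" * 64)
silo.add_dat("blog-posts", "b" * 64)
silo.grant("blog-app", "blog-posts")
```

In contrast to a service silo, the grant table lives with the user's account, so revoking an app's access does not move or delete the data.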

What are the minimum requirements for a sufficient release (version 1)?

joehand commented 7 years ago

TODO

Mathias and I talked yesterday about the few things we need to work on right away for this.

Critical

Non-Urgent Optimizations

joehand commented 7 years ago

Closing this in favor of the Dat Server (#8). Migrated info there.