The Dat "cloud" runs persistent and highly available dat instances as a service to users (public peers as a service). A cloud will bridge the gap between traditional p2p services, where all data is controlled by users and stored in local instances, and centralized services, where all data is controlled by the service and stored in centralized locations. The Dat cloud will enable user-controlled data while also ensuring general accessibility.
This project piggy-backs on and overlaps with the Dat server, but has different users (the cloud targets end users, while the Dat server targets institutions) and management (the cloud is centrally managed by us, while the server is user-managed).
Who are the users?
A Dat user who wants to publish files or data in a long-term and persistent way, similar to how other common cloud services work now. Any Dat user can sign up to use the Dat cloud. In discussion, we talked about starting with unlimited free storage for academic users (.edu email address) and limited free storage for the general public.
Additionally, any app that wants user-controlled data with persistent availability could use the cloud services through user authentication.
Scientists
Scientists can use the cloud to share, backup, or publish research data and code. Initially this will be a manual publishing process, but eventually we will automatically back up any Dat registered on dat.land that fits certain criteria (e.g. has a published DOI).
General Public
To encourage usage of the Dat ecosystem, it would be nice if people could create and share a dat without having to understand how to persist it when they go offline. This could enable cool userland things like dat websites and custom applications (see below).
App Developers
Eventually, application developers may also use the Dat cloud. In this situation, an app user would authenticate to the cloud via the app. Then the app could store the user's data on the user's cloud account.
How would the user interact with it?
The initial version will be a website where users can add Dat links they want to back up to their cloud.
Why is the project important to the ecosystem?
Automatic deployment of Dats is essential for the ecosystem. Without high availability of data, it will be difficult to encourage adoption and achieve persistent publishing of scientific data. Creating an easy-to-use cloud will encourage users to try out the Dat ecosystem without requiring them to set up their own servers to persist their Dats.
Scientific Data Publishing & Archiving
Scientists should be able to share & publish data without worrying about whether it stays available online via their p2p client. The cloud will give them an easy way to publish, with minimal hurdles to putting data online where others can access it.
Additionally, we can use the cloud to automatically back up science data (that meets certain criteria) to the Internet Archive.
Encourage easy adoption for general users
p2p technology is hard for many users to understand. By creating a cloud, we can leverage the strengths of a p2p system without giving up the benefits of a centralized cloud system. As cloud-based centralized services have taken off, users have come to understand their benefits. We can provide those same benefits (high uptime, availability, ease of publishing/sharing) without locking the user into a specific service.
Build Apps with User-Controlled Data
A Dat cloud would allow developers to easily create applications where the user controls their data (it lives in the user's cloud) but the app can rely on that data being highly available. This enables social-media-like applications without having to assume users will be online to share their data.
I've been thinking about this as different types of data silos. Right now we have "service silos", where data is siloed by each service provider (e.g. Twitter data in the Twitter silo, Gmail data in the Google silo, etc.). A Dat cloud would encourage "user silos", where data is siloed by user ownership (e.g. my Dat cloud would contain a dat with my scientific data, a dat with my blog posts, and a dat with my Twitter data). Applications get access to specific Dats within my user silo, which they can read/write.
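A minimal sketch of what the per-user grant bookkeeping behind a "user silo" could look like. Everything here (the function name, the data shape, the 'read'/'readwrite' modes) is hypothetical, not an existing Dat API:

```javascript
// Illustrative data model only: a user's silo holds many dats, and each
// app is granted read or read/write access to specific dat keys.
function grantAccess (silo, appId, datKey, mode) {
  // mode is 'read' or 'readwrite' in this sketch
  const grants = silo.grants[appId] || (silo.grants[appId] = {})
  grants[datKey] = mode
  return silo
}

function canWrite (silo, appId, datKey) {
  const grants = silo.grants[appId]
  return Boolean(grants && grants[datKey] === 'readwrite')
}
```

The point of the sketch is the shape of the relationship: apps never own the data, they only hold revocable grants into the user's silo.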
What defines the minimum requirements to sufficiently release (version 1)
User can sign up (this could piggy-back off existing dat.land user accounts) and receive unlimited (scientists) or limited free storage.
User can visit the website, sign in, and input a dat link to push to the cloud.
User can view progress of Dat backup to cloud.
User can see that Dat is persisted and available as a public peer.
(Open question: will the cloud copy of a Dat automatically update if the user is sharing a live copy?)
User can go offline with their local dat and other users will still be able to access their data via the cloud.
User can view Dats they have published and remove/stop sharing them.
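As one concrete piece of the v1 input flow, the website needs to validate whatever the user pastes before pushing it to the cloud. A minimal sketch, assuming we accept a dat:// URL, a bare 64-character hex key, or a key followed by a path (the helper name is made up):

```javascript
// Normalize user input into a canonical 64-char hex dat key,
// or return null if the input isn't a valid Dat link.
function parseDatLink (input) {
  const trimmed = input.trim().replace(/^dat:\/\//, '')
  const key = trimmed.split('/')[0].toLowerCase()
  return /^[0-9a-f]{64}$/.test(key) ? key : null
}
```

Rejecting bad input up front keeps the backup queue free of links that could never resolve on the network.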
What are some stretch goals or interesting features for further releases (version 2, 3)
Option to serve data over an HTTP endpoint
CLI backup tools
External authentication: allow apps, Beaker, etc. to authenticate to the cloud and back up app-specific data.
Automatically back up Dats published to the registry for certain users.
Mathias and I talked yesterday about the things we need to work on right away for this:
Critical
Secure peer network - the peer network should be secured in some way so only specific peers can connect to a given network.
Optimize dat-archiver - Storing Dats as files will not scale to a large number of Dats. We should store Dats as the two hypercore feeds instead (e.g. in the SLEEP format), using hypercore-archiver to do this.
Dat-pull module - Pull a given dat to the server and show completion information.
Non-Urgent Optimizations
Update discovery-swarm to cut down on DHT announcements when sharing many dat swarms.