Dat Project Projects [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ ]
http://dat-data.com

[product] Dat Cloud & Dat Server (self-hosted cloud) #8

Open okdistribute opened 8 years ago

okdistribute commented 8 years ago

Who are the users?

Folks who want to publish dats in a more long-term way, rather than for short-term file sharing, and who aren't familiar with the command line, dev ops, or system administration.

How would the user interact with it?

The user would be able to easily deploy an automatic backup server to AWS, Heroku, or some other hosting provider.

Why is the project important to the ecosystem?

The current CLI tool makes it easy for developers and sysadmins to deploy and automatically back up dats, but we don't really have a good deployable solution for managing dats through a UI or automatically.

What defines the minimum requirements to sufficiently release (version 1)?

- Deploy a dat server with one click on Amazon/Heroku/Digital Ocean (maybe a Docker image?)
- A GUI to copy-paste dat links and have them replicated and served
- Authentication?
- See a list of dat links
- See how much space the server is using
- See how many peers are connected to each dat
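A minimal sketch of what the replication side of that GUI could look like. The dat-node module and its Dat(dir, opts, cb) / joinNetwork() API are assumptions here; this issue doesn't name a specific module:

```js
// Minimal sketch, not from this issue: replicate a pasted dat link
// and seed it, using the dat-node module (an assumption here).
var Dat = require('dat-node')
var path = require('path')

function addDat (storageDir, link, cb) {
  var dir = path.join(storageDir, link) // one folder per dat
  Dat(dir, { key: link }, function (err, dat) {
    if (err) return cb(err)
    var network = dat.joinNetwork() // find peers and start replicating
    network.on('connection', function () {
      // feeds the "peers connected to each dat" display
      console.log('peers on %s: %d', link, network.connected)
    })
    cb(null, dat)
  })
}
```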

What are some stretch goals or interesting features for further releases (version 2, 3)?

- Automatic backup by integrating with the CLI/Desktop application
- More statistics about uploads/downloads across all dats, uptime, and peers
- More advanced configuration
- An HTTP REST API
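To make the REST idea concrete, one possible shape it could take. Express, the endpoint names, and the in-memory store are all illustrative assumptions, not anything specified in this issue:

```js
// Illustrative only: one possible shape for the v2 REST API.
var express = require('express')
var app = express()

var dats = {} // dat link -> stats (in-memory for the sketch)

// Register a dat link for replication.
app.post('/dats/:link', function (req, res) {
  dats[req.params.link] = { peers: 0, bytes: 0 }
  res.status(201).json({ link: req.params.link })
})

// List all replicated dats.
app.get('/dats', function (req, res) {
  res.json(Object.keys(dats))
})

// Per-dat stats: connected peers, bytes stored, etc.
app.get('/dats/:link/stats', function (req, res) {
  var stats = dats[req.params.link]
  if (!stats) return res.status(404).end()
  res.json(stats)
})

app.listen(8080)
```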

Dat Cloud

The Dat "cloud" runs persistent and highly available dat instances as a service to users (public peers as a service). A cloud will bridge the gap between traditional p2p services, where all data is controlled by users and stored in local instances, and centralized services, where all data is controlled by the service and stored in centralized locations. The Dat cloud will enable user controlled data while also ensuring general accessibility.

This project piggybacks on and overlaps with the Dat server, but has different users (the cloud targets end users vs. the Dat server targeting institutions) and management (the cloud is centrally managed by us; the server is user-managed).

Who are the users?

A Dat user who wants to publish files or data in a long-term and persistent way, similar to how other common cloud services work now. Any Dat user can sign up to use the Dat cloud. In discussion, we talked about starting with unlimited free storage for academic users (.edu email address) and limited free storage for the general public.

Additionally, any apps that want user controlled data with persistent data accessibility could use the cloud services through user authentication.

Scientists

Scientists can use the cloud to share, back up, or publish research data and code. Initially this will be a manual publishing process, but eventually we will automatically back up any Dat registered on dat.land that fits certain criteria (e.g. has a published DOI).
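As a sketch of how that criteria check might look. The metadata shape (a doi field) is an assumption; dat.land's actual format isn't specified in this issue:

```js
// Illustrative only: a predicate for the "certain criteria" rule.
function shouldAutoBackup (datInfo) {
  // DOIs are strings starting with the "10." directory prefix
  return typeof datInfo.doi === 'string' && /^10\./.test(datInfo.doi)
}

console.log(shouldAutoBackup({ doi: '10.5281/zenodo.12345' })) // true
console.log(shouldAutoBackup({ title: 'no DOI yet' }))         // false
```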

General Public

To encourage usage of the Dat ecosystem, it would be nice if people could create and share a dat without having to understand how to persist it when they go offline. This could enable cool userland things like dat websites and custom applications (see below).

App Developers

Eventually, application developers may also use the Dat cloud. In this situation, an app user would authenticate to the cloud via the app. Then the app could store the user's data on the user's cloud account.
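A hypothetical sketch of that flow. Every name here (dat-cloud-client, login, getArchive) is made up to illustrate the idea; no such client exists:

```js
// Hypothetical sketch of the app flow; all names below are invented.
var DatCloud = require('dat-cloud-client') // hypothetical module

var cloud = DatCloud('https://cloud.example.com')
var userToken = process.env.USER_TOKEN // obtained via the app's auth flow

// 1. The app authenticates as the user.
cloud.login(userToken, function (err, session) {
  if (err) throw err
  // 2. It gets (or creates) a dat inside the user's silo...
  session.getArchive('my-app-data', function (err, archive) {
    if (err) throw err
    // 3. ...and reads/writes data that stays under the user's control.
    archive.writeFile('/settings.json', JSON.stringify({ theme: 'dark' }))
  })
})
```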

How would the user interact with it?

The initial version will be a website where users can add Dat links they want to backup to their cloud.

Why is the project important to the ecosystem?

Automatic deployment of Dats is essential for the ecosystem. Without high availability of data, it will be difficult to encourage adoption and achieve persistent publishing of scientific data. Creating an easy-to-use cloud will encourage users to try the Dat ecosystem without requiring them to set up their own servers to persist their Dats.

Scientific Data Publishing & Archiving

Scientists should be able to share and publish data without worrying about whether it stays available online via their p2p service. The cloud will give them an easy way to publish, with minimal hurdles to putting data online so others can access it.

Additionally, we can use the cloud to automatically back up science data (that meets certain criteria) to the Internet Archive.

Encourage easy adoption for general users

p2p technology is hard for many users to understand. By creating a cloud we can leverage the strengths of a p2p system without giving up the benefits of a centralized cloud system. As cloud-based centralized services have taken off, users have come to understand their benefits. We can provide those same benefits (high uptime, availability, ease of publishing/sharing) without locking the user into a specific service.

Build Apps with User-Controlled Data

A Dat cloud would allow developers to easily create applications where the user controls their data (it lives on the user's cloud) while the app can still count on that data being highly available. This enables social-media-like applications without having to assume users will be online to share their data.

I've been thinking about this as different types of data silos. Right now we have "service silos", where data is siloed by each service provider (i.e. Twitter data in the Twitter silo, Gmail data in the Google silo, etc.). A Dat cloud would encourage "user silos", where data is siloed by user ownership (i.e. my Dat cloud would contain a dat with my scientific data, a dat with my blog posts, and a dat with my Twitter data). Applications get access to Dat(s) within my user silo, which they can read/write.

What defines the minimum requirements to sufficiently release (version 1)?

max-mapper commented 7 years ago

@joehand and I were thinking of making a 'dat push' workflow that uses a simple CLI-only dat server. We're planning on using https://github.com/mafintosh/peer-network and calling it https://github.com/maxogden/dat-archiver; the idea is that you can do dat push <remote> to upload your data to a specific remote address.

The goal is to have something super simple that scientists can run on linux machines in their labs that they can push and pull from. I think this will be a good first step towards the full GUI version described above in this issue.
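A rough sketch of the server side of that idea: listen on a named peer-network address and replicate whatever clients push into a hypercore-archiver store. This assumes peer-network's createServer()/listen(name) API and an archiver.replicate() stream; check both READMEs for the exact calls:

```js
// Rough sketch of a dat-archiver-style server; API details assumed.
var peernet = require('peer-network')
var archiver = require('hypercore-archiver')

var ar = archiver('./archived-dats') // on-disk storage for pushed dats
var network = peernet()
var server = network.createServer()

server.on('connection', function (stream) {
  // Each pushing client speaks the hypercore replication protocol.
  stream.pipe(ar.replicate()).pipe(stream)
})

// The name clients would target, i.e. the <remote> in dat push <remote>.
server.listen('my-lab-archiver')
```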

joehand commented 7 years ago

TODO

Mathias and I talked yesterday about a few things we need to work on right away for this:

Critical

Non-Urgent Optimizations

joehand commented 7 years ago

Hypercore-archiver is going to provide high-performance storage for this project.

We've also started adding HTTP APIs to hypercore-archiver so we can use HTTP instead of peer-network.
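For illustration, a tiny endpoint in that direction. It assumes hypercore-archiver's add(key) API; the route shape is made up:

```js
// Sketch of the HTTP direction: ask the archiver to mirror a key.
var http = require('http')
var archiver = require('hypercore-archiver')

var ar = archiver('./archived-dats')

http.createServer(function (req, res) {
  // e.g. POST /add/<64-char-hex-key>
  var match = /^\/add\/([0-9a-f]{64})$/.exec(req.url)
  if (req.method !== 'POST' || !match) {
    res.statusCode = 404
    return res.end()
  }
  ar.add(Buffer.from(match[1], 'hex'), function (err) {
    res.statusCode = err ? 500 : 200
    res.end()
  })
}).listen(8080)
```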