application-research / delta-dm

Delta Large Dataset Manager

Δ Dataset Manager (DDM)

What is this?

DDM is a tool for managing and tracking deal replication when onboarding datasets to the Filecoin network via import storage deals. It makes it quick to create deals for massive amounts of data, where the transfer itself is better handled out-of-band.

Core Concepts

Dataset

The top-level logical grouping of data in DDM is the dataset. A dataset is identified by a name (aka "slug") and carries a replication quota, a deal length, and a wallet from which the deals are made. Datasets are created independently of the content that makes them up.

Content

Once a dataset has been created, content may be added to it. Each content entry represents a .CAR file: an archive of data that will be shipped to the Storage Provider (SP) and loaded into their operation. Content is identified by its PieceCID (CommP), has two sizes (raw file size and Padded Piece Size), and also carries the CID of the actual data (Payload CID).
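The relationship between the two sizes can be sketched in shell. This is an illustration based on general Filecoin piece rules (Fr32 padding stores 127 bytes of data in every 128 bytes of a piece, and padded piece sizes are powers of two starting at 128 bytes), not DDM's own code; the function name min_padded_piece_size is hypothetical. In practice the padded size is reported by whichever tool computed the CommP.

```shell
# Hypothetical helper, not part of delta-dm: find the smallest padded piece
# size (a power of two, at least 128 bytes) whose usable capacity fits a
# CAR file of the given size in bytes.
min_padded_piece_size() {
  car_size=$1
  padded=128
  # Usable (unpadded) capacity of a piece is padded / 128 * 127.
  while [ $(( padded / 128 * 127 )) -lt "$car_size" ]; do
    padded=$(( padded * 2 ))
  done
  echo "$padded"
}

min_padded_piece_size 1000000000   # a 1 GB CAR file; prints 1073741824
```

Note that a CAR file of exactly 1 GiB does not fit in a 1 GiB piece, because part of the piece is consumed by padding.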

Providers

DDM tracks deals made with Storage Providers (SPs) in the network. Add the storage providers to DDM before making deals so that it can begin tracking them.

Replication Profiles

A Replication Profile is what ties a Dataset together with a Provider. It defines the parameters for any deals made to that provider for that dataset. Currently, it allows specifying whether to keep an unsealed copy and whether to announce to the IPNI indexer. This allows for flexibility in how deals are made to different providers, such as defining a single SP to host the unsealed copies for retrieval while the others maintain a cold copy for backup.

Replication

Once a Dataset, Content, Providers, and a Replication Profile have been specified, DDM can make replications for the content to the providers. A Replication is a single deal made to a single provider for a single piece of content. Replications are tracked by DDM and can be queried for status and deal information.

Instructions

Usage

DDM runs as a daemon that exposes a web server. Start it with the daemon command.

./delta-dm daemon

By default, delta-dm daemon runs on port 1415. It can be changed with the --port flag or DELTA_DM_PORT environment variable.
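As a sketch of the two override mechanisms, the commented commands below start the daemon on a different port (8080 is an arbitrary example, and placing --port after the daemon subcommand is an assumption). The ddm_base_url helper is hypothetical and not part of delta-dm: it only shows a client script following the same DELTA_DM_PORT convention, and it assumes the daemon is reachable over plain HTTP on localhost.

```shell
# Two ways to start the daemon on port 8080 instead of the default 1415
# (run one of them in a separate terminal):
#   ./delta-dm daemon --port 8080
#   DELTA_DM_PORT=8080 ./delta-dm daemon

# Hypothetical client-side helper (not part of delta-dm): derive the base URL
# from the same environment variable, falling back to the default port.
# Assumes plain HTTP on localhost.
ddm_base_url() {
  echo "http://localhost:${DELTA_DM_PORT:-1415}"
}

ddm_base_url
```

The routes to append to this base URL are described in the API docs rather than guessed here.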

Once running, you can interact with DDM through the API, the CLI, or the Delta Web frontend.

API

See the API docs in /docs/api.md.

Command-Line Interface

See the CLI docs in /docs/cmd.md.

Provider Self-service

See docs in /docs/self-service.md.

Importing CIDs from Singularity

See docs in /docs/singularity-import.md.

Developer Tips

By default, DDM will run using a SQLite database. This is fine for development, but for production use, it is recommended to use a Postgres database. To test this, you can run a Postgres instance in Docker and connect to it with DDM.

docker run --name ddm-postgres -p 5432:5432 -e POSTGRES_PASSWORD=password -d postgres:14.7
psql postgres://postgres:password@localhost:5432/ # to connect to the database

Update the env file (or pass the --db flag) to point DDM at the dev Postgres database.

DB_DSN="postgres://postgres:password@localhost:5432/"
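The pieces above can be combined into a small setup script. This is a sketch rather than part of delta-dm: the build_dsn and wait_for_postgres helpers are hypothetical names, and the readiness loop relies on pg_isready, which ships in the official postgres image.

```shell
# Hypothetical helper (not part of delta-dm): assemble a Postgres DSN
# in the same form as the DB_DSN value above.
build_dsn() {
  user=$1; password=$2; host=$3; port=$4
  echo "postgres://${user}:${password}@${host}:${port}/"
}

# Hypothetical helper: block until the dev container started by the
# docker run command above accepts connections.
wait_for_postgres() {
  until docker exec ddm-postgres pg_isready -U postgres >/dev/null 2>&1; do
    sleep 1
  done
}

DB_DSN=$(build_dsn postgres password localhost 5432)
export DB_DSN
echo "$DB_DSN"
```

Call wait_for_postgres after docker run and before starting the daemon. Whether the daemon picks up DB_DSN from the process environment, and not only from the env file, is an assumption; passing the same value with the --db flag is the documented alternative.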