datonic / datadex

📦 Serverless and local-first Open Data Platform
http://datadex.datonic.io
MIT License

Decentralized Data Lake Ideas #27

Open davidgasquez opened 11 months ago

davidgasquez commented 11 months ago

Random thoughts around decentralized and permissionless data lakes.

Also from https://github.com/davidgasquez/datadex/issues/22#issuecomment-1558988508.

Reading "The Database I Wish I Had" and thinking about something like that for OLAP workloads. Feels like OLAP use cases might be the "killer database" for IPFS/Hypercore/Dat. For analysis, you want data to be immutable, don't care that much about latency, and have to store large amounts of data.

davidgasquez commented 8 months ago

Chatted with some folks working on Subsquid. They're doing interesting things in the decentralized data lake space.

This is more or less what I understood about how the Subsquid Archive works.

  1. Right now, data is indexed by Subsquid itself (running substrate-ingest). In the future, anyone will be able to publish their arbitrary datasets.
  2. Indexed data is packaged into height-partitioned Parquet files and sent to an orchestrator/router that distributes them across nodes in the Subsquid network. This orchestrator takes into account dataset durability, response times, geographic distribution, ...
  3. Users send (and pay for) queries to the Subsquid network (via a gateway or contract?), and the gateway selects the nodes to run them. Nodes run the query (DuckDB on the nodes) and send back the results.
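
To make the flow above concrete, here is a rough toy sketch of the partition / distribute / route idea. It is not Subsquid code: `CHUNK_SIZE`, `REPLICAS`, the node names, and all three functions are made up for illustration, and the router only does round-robin placement instead of weighing durability, response times, or geography.

```python
from collections import defaultdict

CHUNK_SIZE = 1000  # hypothetical: blocks per height partition
REPLICAS = 2       # hypothetical: copies kept of each partition


def partition_id(height: int) -> int:
    """Map a block height to the partition (Parquet file) that would hold it."""
    return height // CHUNK_SIZE


def assign_partitions(partitions: list[int], nodes: list[str]) -> dict[int, list[str]]:
    """Place each partition on REPLICAS nodes, round-robin.

    Stand-in for the orchestrator/router. Assumes REPLICAS <= len(nodes)
    so the replicas of a partition land on distinct nodes.
    """
    return {
        p: [nodes[(p + r) % len(nodes)] for r in range(REPLICAS)]
        for p in partitions
    }


def route_query(first: int, last: int, placement: dict[int, list[str]]) -> dict[str, list[int]]:
    """Pick one node per partition touched by the height range [first, last]."""
    plan = defaultdict(list)
    for p in range(partition_id(first), partition_id(last) + 1):
        node = placement[p][0]  # naive gateway choice: always the first replica
        plan[node].append(p)
    return dict(plan)


nodes = ["node-a", "node-b", "node-c"]
placement = assign_partitions(list(range(5)), nodes)
plan = route_query(1500, 3200, placement)
```

Here `plan` tells the gateway which node should scan which partitions. Each node would then run the query locally (for example with DuckDB over its Parquet files) and return partial results.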

Subsquid Labs maintains public Archive endpoints and offers batch access via the Squid SDK free of charge.

Questions

davidgasquez commented 6 months ago

Adding a small note that Dagster is already relying on "hashes" to check when runs are needed! A step closer to fully content-addressed workflows.
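
A minimal sketch of the general idea, to show what "hashes decide when a run is needed" could look like. This is not Dagster's API: `fingerprint`, `needs_run`, and the `seen` dict are hypothetical helpers, and the scheme (hash the code version together with the input hashes) is just one plausible way to do it.

```python
import hashlib
import json


def fingerprint(code_version: str, input_hashes: list[str]) -> str:
    """Hash the step's code version together with the hashes of its inputs."""
    payload = json.dumps({"code": code_version, "inputs": input_hashes}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_run(step: str, new_fp: str, seen: dict[str, str]) -> bool:
    """Return True (and record the fingerprint) only when it changed."""
    if seen.get(step) == new_fp:
        return False
    seen[step] = new_fp
    return True


seen: dict[str, str] = {}
raw_hash = hashlib.sha256(b"raw data v1").hexdigest()

first = needs_run("clean", fingerprint("v1", [raw_hash]), seen)    # nothing recorded yet
second = needs_run("clean", fingerprint("v1", [raw_hash]), seen)   # same code, same inputs
changed = needs_run("clean", fingerprint("v2", [raw_hash]), seen)  # code version bumped
```

If a step's output is itself stored under its hash, downstream steps can feed that hash into their own fingerprint, which is roughly what a content-addressed workflow would chain together.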

davidgasquez commented 5 months ago

You can ATTACH to a remote DuckDB database! There might be a world where a bunch of people publish their small/medium databases and people just attach to them.
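
Roughly, attaching could look like the sketch below. It assumes the `duckdb` Python package is installed and that someone hosts a `.duckdb` file over HTTPS; the URL, the `remote` alias, the table name, and both helper functions are placeholders, not real datasets or an official API. The alias is interpolated as-is, so it should only come from trusted input.

```python
def attach_statement(url: str, alias: str) -> str:
    """Build a read-only ATTACH statement for a database file served over HTTP(S)."""
    escaped = url.replace("'", "''")  # escape single quotes inside the SQL string literal
    return f"ATTACH '{escaped}' AS {alias} (READ_ONLY)"


def query_remote(url: str, alias: str, sql: str) -> list:
    """Attach a remote DuckDB file and run one query against it."""
    import duckdb  # third-party package, imported here so the helper above has no dependencies

    con = duckdb.connect()        # in-memory local database
    con.execute("INSTALL httpfs")  # extension that lets DuckDB read files over HTTP(S)
    con.execute("LOAD httpfs")
    con.execute(attach_statement(url, alias))
    return con.execute(sql).fetchall()


# Placeholder URL: swap in a real published database before calling query_remote.
stmt = attach_statement("https://example.org/data.duckdb", "remote")
# rows = query_remote("https://example.org/data.duckdb", "remote",
#                     "SELECT count(*) FROM remote.some_table")
```

Since the attachment is read-only, publishers would only need to host a static file, and consumers could join it against their own local tables.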