Integrate Autonomys DSN

jfrank-summit commented 1 month ago

Once object mapping and retrieval are implemented in https://github.com/autonomys/subspace, we need to integrate that functionality into Auto Drive. See DSN Data Retrieval.

[x] Track status of upload (submitted, in block, cached, archived)
[x] Store DSN location (piece index, offset) of objects in db
[ ] On cache miss retrieve object from DSN
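
A minimal sketch of how the first two items might be modeled, assuming the statuses are reached in the order listed and that the DSN location becomes known at archival. All type and function names here are hypothetical and not taken from the Auto Drive codebase.

```typescript
// Hypothetical types; illustrative only, not Auto Drive's actual schema.
type UploadStatus = "submitted" | "in_block" | "cached" | "archived";

interface DsnLocation {
  pieceIndex: number;
  offset: number;
}

interface ObjectRecord {
  cid: string;
  status: UploadStatus;
  location: DsnLocation | null; // assumed to be known only once archived
}

// Assumption: statuses progress in the order given in the task list.
const STATUS_ORDER: UploadStatus[] = ["submitted", "in_block", "cached", "archived"];

// Returns an updated copy of the record; only forward transitions are allowed.
function advance(
  record: ObjectRecord,
  next: UploadStatus,
  location: DsnLocation | null = null,
): ObjectRecord {
  if (STATUS_ORDER.indexOf(next) <= STATUS_ORDER.indexOf(record.status)) {
    throw new Error(`cannot move from ${record.status} to ${next}`);
  }
  if (next === "archived" && location === null) {
    throw new Error("archived objects need a DSN location");
  }
  return { ...record, status: next, location: location ?? record.location };
}
```

In a real deployment the record would be a DB row and `advance` an update query; the in-memory version only illustrates the state transitions.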

Reconstructing files from IPLD nodes, along with querying those nodes, will happen on the Auto Drive server. If the developer prefers, this can be modularized into its own gateway, the Auto Files Gateway, which would still sit on the same server.
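
As a rough illustration of the reconstruction step, the sketch below walks a tree of nodes depth-first and concatenates the leaf data. The `IpldNode` shape is a simplified, hypothetical stand-in (real IPLD encoding is not modeled), and the in-memory `Map` stands in for whatever DB query the server would actually use.

```typescript
// Hypothetical, simplified node: a leaf carries bytes, a branch links to children by CID.
interface IpldNode {
  cid: string;
  data?: Uint8Array;
  links: string[];
}

// Rebuilds a file by visiting nodes depth-first, left to right, so chunk order is preserved.
function reconstructFile(rootCid: string, nodes: Map<string, IpldNode>): Uint8Array {
  const chunks: Uint8Array[] = [];

  function visit(cid: string): void {
    const node = nodes.get(cid);
    if (node === undefined) {
      throw new Error(`missing node ${cid}`);
    }
    if (node.links.length === 0) {
      if (node.data !== undefined) chunks.push(node.data);
      return;
    }
    for (const child of node.links) visit(child);
  }

  visit(rootCid);

  // Concatenate all leaf chunks into one buffer.
  let total = 0;
  for (const chunk of chunks) total += chunk.length;
  const out = new Uint8Array(total);
  let pos = 0;
  for (const chunk of chunks) {
    out.set(chunk, pos);
    pos += chunk.length;
  }
  return out;
}
```

A production version would fetch nodes asynchronously and likely stream the output rather than buffer it; the synchronous form is used here only to keep the example short.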

The component that listens to blockchain-emitted events and stores the mapping from IPLD nodes to DSN pieces will be known as the Object Indexer. It will sit on its own server. For now, it will authenticate requests by whitelisting Auto Drive instances/servers. In the future, authentication may rely on on-chain smart contract data (detailing users' subscription plans) or on validating blockchain-based pay-per-use transactions. The Object Indexer will be separate from the Auto Drive server because developers who use the SDK will very likely want to use it rather than keep track of these mappings themselves.
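
One way the interim whitelist check could look, as a sketch only. The request shape, the idea of identifying an Auto Drive instance by an opaque ID, and the status codes are all assumptions made for illustration.

```typescript
// Hypothetical request/response shapes for an Object Indexer lookup.
interface MappingRequest {
  instanceId?: string; // assumed identifier of the calling Auto Drive instance
  cid: string;
}

interface PieceMapping {
  pieceIndex: number;
  offset: number;
}

interface MappingResponse {
  status: number;
  body?: PieceMapping;
}

class InstanceAllowlist {
  private readonly allowed: Set<string>;

  constructor(ids: string[]) {
    this.allowed = new Set(ids);
  }

  isAllowed(id: string | undefined): boolean {
    return id !== undefined && this.allowed.has(id);
  }
}

// Rejects callers that are not on the allowlist before touching the mapping store.
function handleMappingRequest(
  req: MappingRequest,
  allowlist: InstanceAllowlist,
  mappings: Map<string, PieceMapping>,
): MappingResponse {
  if (!allowlist.isAllowed(req.instanceId)) {
    return { status: 403 };
  }
  const mapping = mappings.get(req.cid);
  if (mapping === undefined) {
    return { status: 404 };
  }
  return { status: 200, body: mapping };
}
```

Keeping the check behind a small `isAllowed` interface would make it easier to swap in on-chain subscription or pay-per-use validation later without changing the request handler.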

clostao commented 1 month ago

Currently, the cache lives at the DB level and persists over time. Should we use a cache service (e.g. Redis) instead and remove the data from the DB level?

its-colby commented 1 month ago

@clostao I'm not sure I understand this recent comment. The backend server should itself be a database, and that database acts as a cache for the blockchain. This doesn't mean the backend literally needs to be a commonly used caching service like Redis. Nonetheless, the backend database should have a hot and a cold component, with the hot component held in RAM and ready to serve instantly. In other words, in our current architecture: the blockchain has 100% of the data; the server's cold storage has 20% of the data; the server's hot storage has 5% of that 20%; and the client's machine has 0.25% in HTML local storage. At any point, if one layer doesn't have the data, it requests it from the next layer up.

Does this clarify things?

Eventually, we will add multiple servers as well. That layer is where we would horizontally scale.
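
The layered lookup described in this comment can be sketched roughly as a read-through over an ordered list of layers, fastest first. The `Layer` interface and `MapLayer` class are hypothetical; in practice the last layer would be a DSN retrieval rather than an in-memory map, and the calls would be asynchronous.

```typescript
// Hypothetical storage layer, e.g. hot (RAM), cold (DB), DSN.
interface Layer {
  name: string;
  get(cid: string): Uint8Array | undefined;
  put(cid: string, data: Uint8Array): void;
}

// In-memory stand-in for any layer, used here for illustration.
class MapLayer implements Layer {
  name: string;
  private readonly items: Map<string, Uint8Array>;

  constructor(name: string) {
    this.name = name;
    this.items = new Map<string, Uint8Array>();
  }

  get(cid: string): Uint8Array | undefined {
    return this.items.get(cid);
  }

  put(cid: string, data: Uint8Array): void {
    this.items.set(cid, data);
  }
}

// Tries each layer in order; on a hit, copies the data into every faster layer that missed.
function readThrough(
  cid: string,
  layers: Layer[],
): { data: Uint8Array; servedBy: string } | undefined {
  const missed: Layer[] = [];
  for (const layer of layers) {
    const data = layer.get(cid);
    if (data !== undefined) {
      for (const faster of missed) faster.put(cid, data);
      return { data, servedBy: layer.name };
    }
    missed.push(layer);
  }
  return undefined;
}
```

With layers ordered as `[hot, cold, dsn]`, the first read of an object held only in the DSN is served by the last layer, and a repeat read is then served from hot storage. Eviction policy (what keeps the hot and cold layers at their target sizes) is deliberately left out.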

autonomys / auto-drive

Integrate Autonomys DSN #29