Frando commented 5 years ago

abstract-dat

A common interface for hypercore-based data structures

There are a lot of hypercore-based data structures. When working on higher-level tools, it often does not really matter whether you're dealing with a hyperdrive, a hypercore, a hypertrie, a hyperdb, a multifeed instance or even a kappa-core or a cabal.

Most of these need, in their constructor:

A random-access-storage instance
A key, or a key pair, when wanting to sync an existing archive
An opts object

They also all have a sufficiently similar public API that usually, apart from the structure-specific methods, includes:

key: The public key (Buffer)
discoveryKey: The discovery key (Buffer)
ready (cb): Async constructor
replicate (opts): Create a hypercore-protocol stream for replication (or use one passed as opts.stream)
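
To make the shared surface concrete, here is a minimal sketch of a structure that exposes only the constructor arguments and members listed above. MockStructure is a hypothetical name, no real hypercore is involved, the keys are random placeholder bytes, and replicate() returns a dummy stream rather than a hypercore-protocol stream.

```javascript
const crypto = require('crypto')
const { PassThrough } = require('stream')

// Hypothetical stand-in for a hypercore-based structure. No real hypercore
// is involved: the keys are placeholders and replicate() returns a dummy stream.
class MockStructure {
  constructor (storage, key, opts) {
    // Allow (storage, opts) as well as (storage, key, opts).
    if (key && !Buffer.isBuffer(key) && typeof key === 'object') {
      opts = key
      key = null
    }
    this.storage = storage
    this.opts = opts || {}
    this.key = key || null
    this.discoveryKey = null
    this._opened = false
  }

  // Async constructor: the keys are only guaranteed to be set once cb fires.
  ready (cb) {
    if (!this._opened) {
      if (!this.key) this.key = crypto.randomBytes(32)
      // Placeholder derivation; a real discovery key is computed differently.
      this.discoveryKey = crypto.createHash('sha256').update(this.key).digest()
      this._opened = true
    }
    process.nextTick(cb, null)
  }

  // Return a replication stream, or reuse the one passed as opts.stream.
  replicate (opts) {
    opts = opts || {}
    return opts.stream || new PassThrough()
  }
}
```

A tool written against this surface would call ready(cb) first and only then read key and discoveryKey.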

Structures that are composed of more than one hypercore often also expose a feeds(cb) method that invokes cb with a list of hypercores. There should maybe be a second argument to the callback function that contains type-specific metadata for the feeds (e.g. content vs metadata feeds in hyperdrive).
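
A rough sketch of what that second callback argument could look like. MockDrive and the shape of the info array are assumptions made up for illustration, and the feeds are plain objects standing in for hypercores.

```javascript
// Hypothetical multi-feed structure. The feeds are plain objects standing in
// for hypercores, and the shape of the metadata argument is an assumption.
class MockDrive {
  constructor () {
    this._feeds = [
      { key: Buffer.from('aa', 'hex'), type: 'metadata' },
      { key: Buffer.from('bb', 'hex'), type: 'content' }
    ]
  }

  // cb(err, feeds, info): info[i] describes feeds[i].
  feeds (cb) {
    const feeds = this._feeds.map(f => ({ key: f.key }))
    const info = this._feeds.map(f => ({ type: f.type }))
    process.nextTick(cb, null, feeds, info)
  }
}

// Usage: a generic tool can list every feed together with its role.
new MockDrive().feeds((err, feeds, info) => {
  if (err) throw err
  feeds.forEach((feed, i) => {
    console.log(feed.key.toString('hex'), info[i].type)
  })
})
```

Keeping the metadata in a parallel array leaves the first argument a plain list of feeds, so callers that ignore the second argument would keep working.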

Some of the more recent data structures can accept a hypercore factory/constructor, either as argument or option. If that is passed, a storage instance is not needed anymore.

There are also a lot of common options, mostly derived from hypercore: sparse, live, valueEncoding.

If we turn this abstract-dat interface into a standard (maybe like in the random-access-storage ecosystem), higher-level tools can easily work with different data structures. Examples of higher-level tools are libraries/managers of multiple dats, debug tools like dat-doctor, and hopefully soon something like a dat-sdk.
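
As an illustration of what such a manager could rely on, here is a small sketch that duck-types the proposed interface. looksLikeAbstractDat and Library are hypothetical names, not existing modules, and the check covers only the members listed earlier in this issue.

```javascript
// Hypothetical helper: returns true when obj exposes the members listed in
// the proposal. Call it after ready(), since keys may be unset before that.
function looksLikeAbstractDat (obj) {
  return Boolean(obj) &&
    Buffer.isBuffer(obj.key) &&
    Buffer.isBuffer(obj.discoveryKey) &&
    typeof obj.ready === 'function' &&
    typeof obj.replicate === 'function'
}

// A manager that only relies on the shared surface, so it does not care
// whether it is handed a hyperdrive, a hypertrie or something else.
class Library {
  constructor () {
    this.structures = new Map()
  }

  add (structure, cb) {
    structure.ready(err => {
      if (err) return cb(err)
      if (!looksLikeAbstractDat(structure)) {
        return cb(new Error('structure does not implement the common interface'))
      }
      this.structures.set(structure.key.toString('hex'), structure)
      cb(null)
    })
  }
}
```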

Additionally, higher-level tools like cabal could easily also adhere to such an interface, and thus be managed with the same tools as hyperdrives etc.

Very little is not already common. One open question is hypercore factory vs. storage instance for structures composed of hypercores. I'd propose to stay with the storage instance as the default, but always also support a hypercore opt that takes a hypercore factory (but then the storage arg would be null?). This is pretty much the only difference in signature that I could find (multifeed has a hypercore constructor as first arg, while all others have a storage instance or path).
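
A sketch of how a constructor could resolve the two variants. createFeedFactory and defaultFeed are hypothetical helpers, and defaultFeed returns a placeholder object rather than a real hypercore.

```javascript
// Hypothetical helper illustrating the proposed signature: storage is the
// default, and opts.hypercore is an optional factory that replaces it.
function createFeedFactory (storage, opts) {
  opts = opts || {}
  if (typeof opts.hypercore === 'function') {
    // Factory supplied: storage is ignored and may be null.
    return opts.hypercore
  }
  if (!storage) {
    throw new Error('either storage or opts.hypercore is required')
  }
  // Default: build feeds on top of the given storage.
  return (key, feedOpts) => defaultFeed(storage, key, feedOpts)
}

// Placeholder for "create a hypercore on this storage"; not a real hypercore.
function defaultFeed (storage, key, feedOpts) {
  return { storage, key: key || null, opts: feedOpts || {} }
}
```

With this shape, passing null as storage is only valid when opts.hypercore is set, which makes the open question about the null storage arg explicit.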

I'm not completely sure what the best process for such a standardization is, but it would likely involve three parts:

settle on a common interface: Would need maybe a little more research, and then a DEP with the documentation, I guess.
settle on naming: I quite like abstract-dat as a label for hypercore-based data structures, but please give other suggestions if you have some
adopt it across the ecosystem: Might need new major versions for some tools

Anyway, I'm creating this issue first to gather some feedback before writing up a DEP :-)

bnewbold commented 5 years ago

Thanks for writing this up! In particular it's helpful to have the reference/comparison of existing API signatures.

I should prefix my feedback by admitting I'm not familiar with JavaScript development, and have not written code using the APIs you are describing, so I may be misunderstanding your motivation.

My general feedback is that the ecosystem around hypercore and dat already has a lot of moving parts, protocols, and interfaces. I think it's a struggle for newcomers to understand what all the pieces are, how they interact, and what the development/stability status of each one is. Instead of adding another interface or abstraction layer, I think it would be better for "ecosystem design" to have fewer components and interfaces but have them interact more clearly. It feels to me like the hypercore abstraction is one of the most stable and consistent, and already makes possible some of the higher-level tools you mention, like synchronization, archiving, pinning, etc. of hypercores regardless of the content or higher-level structures. Is there a way we could improve that existing abstraction to support any additional needs, instead of adding another interface on top of it? Or, would it be sufficient to have conventions and documented best practices (a DEP?) instead of an explicit, named new interface?

As specific feedback, I think abstract-dat is a confusing name. I think there is already a lot of "is dat a command, protocol, framework, project, community, API, brand, etc" conflation. Maybe hypercore-service or hypercore-structure? Or "Norms and Conventions for Services Built On hypercore Feeds"?

NCSBOhF, just kidding.

RangerMauve commented 5 years ago

I think another thing we should have is a feed event for when a new feed gets identified dynamically.
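
Roughly what that could look like, sketched with Node's EventEmitter. MockMultifeed and addFeed are hypothetical, and the feeds are plain objects standing in for hypercores.

```javascript
const { EventEmitter } = require('events')

// Hypothetical multi-feed structure that announces feeds as they appear.
class MockMultifeed extends EventEmitter {
  constructor () {
    super()
    this._feeds = []
  }

  // Called whenever a new feed is identified, e.g. during replication.
  addFeed (feed) {
    this._feeds.push(feed)
    this.emit('feed', feed)
  }

  // Snapshot of the feeds known so far.
  feeds (cb) {
    process.nextTick(cb, null, this._feeds.slice())
  }
}

// Usage: feeds(cb) covers the feeds that already exist, the event covers
// the ones that show up later.
const multi = new MockMultifeed()
multi.on('feed', feed => console.log('new feed', feed.name))
multi.addFeed({ name: 'example' })
```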

RangerMauve commented 5 years ago

An update on this abstract-dat stuff.

With the new daemon that's being built, we'll be basing Hyperdrive and the like on CoreStore, which will be used to manage a list of hypercores for a given data structure, as well as to manage storage / replication for them.

This accomplishes much of the stuff abstract-dat would be trying to do, but it's a little more minimal and cleaner IMO. Basically, the lower level of the stack will be based on corestores, and data structures like multifeed will be given corestores in the constructor.
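
A very rough sketch of the dependency shape this implies. MockStore and its get/list methods are made up for illustration and are NOT the corestore API, and StoreBackedDrive is a hypothetical structure, not hyperdrive.

```javascript
const crypto = require('crypto')

// Mock store: hands out one placeholder feed per name and remembers it.
// This is NOT the corestore API, only an illustration of the idea.
class MockStore {
  constructor () {
    this._feeds = new Map()
  }

  get (name) {
    if (!this._feeds.has(name)) {
      this._feeds.set(name, { name, key: crypto.randomBytes(32) })
    }
    return this._feeds.get(name)
  }

  // The store, not the data structure, knows about every feed.
  list () {
    return Array.from(this._feeds.values())
  }
}

// Hypothetical data structure that receives the store in its constructor
// instead of a storage instance or a hypercore factory.
class StoreBackedDrive {
  constructor (store) {
    this.metadata = store.get('metadata')
    this.content = store.get('content')
    this.key = this.metadata.key
  }
}
```

Since every feed is created through the store, tools that manage storage or replication only need a handle on the store and not on each data structure.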

Check out the corestore docs in the dat SDK dream API

Frando commented 5 years ago

I fully agree with this. Having corestores as the only dep for higher-level data structures is the way to go for now.

dat-ecosystem-archive / DEPs

Discussion: Let's settle on a common abstract-dat interface #60
