
DDEP-0001:hypercore-header-for-manifestfeed #17

Open serapath opened 4 years ago

serapath commented 4 years ago

@todo


motivation

There are many reasons why feeds need to be linked: parent to dependant, dependant to dependencies, dependencies to dependant, domain to content, feed to author, related feeds amongst each other. I think it would be bad to have every app/protocol/datastructure make those things up instead of following a general standard.

ongoing discussions

  1. (see comments below)
  2. dat comm-comm discussion

messy, incomplete list of involved community/ecosystem members, in no particular order (feel free to add/correct the list below by mentioning it in a comment)

  1. [ ] https://twitter.com/andrewosh
  2. [ ] https://twitter.com/mafintosh
  3. [ ] https://twitter.com/carax
  4. [ ] https://twitter.com/liminalpunk
  5. [ ] https://twitter.com/pvh
  6. [ ] https://twitter.com/substack
  7. [ ] https://twitter.com/hackergrrl
  8. [ ] https://twitter.com/elmasse && https://twitter.com/tinchoz49
  9. [ ] https://twitter.com/dan_mi_sun
  10. [ ] https://twitter.com/pfrazee
  11. [ ] https://twitter.com/heapwolf
  12. [ ] https://datpy.decentral1.se/
  13. [ ] https://datrs.yoshuawuyts.com/
  14. [ ] https://twitter.com/zootella
serapath commented 4 years ago

first draft (work in progress)

context

hypercore creates feeds (a.k.a. logs). Many feeds are not published via hyperswarm using their own discoveryKey, but are instead published indirectly via hyperswarm using the discoveryKey of a related feed.
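
To make that concrete, here is a minimal sketch (not part of the proposal) of two feeds sharing one swarm topic, assuming hypercore 8+ and hyperswarm 2 style APIs; storage paths and feed names are placeholders:

```js
const hypercore = require('hypercore')
const hyperswarm = require('hyperswarm')

const metadata = hypercore('./metadata')
const content = hypercore('./content')

metadata.ready(() => {
  const swarm = hyperswarm()
  // the swarm topic is derived from the metadata feed only
  swarm.join(metadata.discoveryKey, { announce: true, lookup: true })
  swarm.on('connection', (socket, info) => {
    // both feeds replicate over the same connection; a peer that only knows
    // the "dat address" still has to learn the content feed's key somehow
    const stream = metadata.replicate(info.client, { live: true })
    content.replicate(info.client, { stream, live: true })
    socket.pipe(stream).pipe(socket)
  })
})
```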

  1. currently every protocol/structure has a custom way to communicate its related feeds
    • e.g. hyperdrive has a metadata feed whose first message links the contentfeed and whose further messages form a hypertrie to link mounts, all signed by the author
      • => only the author can spam, needs to be trusted not to do that, and to protect against it one can simply reject feed updates
    • e.g. cabal clients just send their feedkeys to new clients for collection via a MANIFEST extension message
      • => anyone can spam with new feeds, and to protect against spam, subjective whitelisting or blacklisting of feeds can be used
  2. It requires a user to use the right client when joining a swarmkey to request feeds
  3. If the user copy/pasted the "dat address" into a "dat cloud service", it would now be up to the cloud to request related feeds

data structures on top of hypercore

data structure types: https://github.com/datprotocol/DEPs/blob/master/proposals/0007-hypercore-header.md

Most dataStructureTypes are either using corestore or multifeed:

  1. multifeed: https://github.com/kappa-db/multifeed/blob/master/mux.js#L71
  2. corestore: https://github.com/andrewosh/corestore/blob/master/index.js
    • e.g. hyperdrive

data structure types built on top of hypercore include:

  1. hypercore
  2. hypertrie
  3. hyperdrive / corestore
  4. corestore
  5. multifeed & multiplex / kappa-core
  6. ...
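
As a hedged illustration of how a seeder could use the DEP-0007 header: read chunk 0 of the feed behind the address and inspect its dataStructureType. `decodeHeader` below is a placeholder for the protobuf decoder the DEP defines, and the key is a dummy value:

```js
const hypercore = require('hypercore')

// dummy key standing in for a copied "dat address"
const key = Buffer.from('aa'.repeat(32), 'hex')
const feed = hypercore('./seeded-feed', key)

feed.get(0, (err, chunk0) => {
  if (err) throw err
  const header = decodeHeader(chunk0)
  console.log(header.dataStructureType) // e.g. 'hyperdrive', 'hypertrie', ...
})

function decodeHeader (buf) {
  // placeholder: DEP-0007 defines the actual protobuf header encoding
  return { dataStructureType: 'unknown' }
}
```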

problem

a generic seeding service receives a "dat address" but doesn't know which kind of data structure is behind that address. It needs to know all related hypercores, how to retrieve them, and a standard way of making them available to anyone who wants to use the data and data structure behind that "dat address"

requirements

a generic seeding service should not need any data-structure-specific code to know how to seed the data. This means the seeding service should neither need to know all currently existing data structures built on top of hypercore, nor need to be updated for every such data structure created in the future.

community suggestions

nettle:

this seeding service could also see what the first message they get from a peer is, and from there, figure out whether it's {multifeed,hyperdrive,etc} and do sync from there and just speak the major protocols, not as nice as a unified protocol though :)

  1. [connect to an address via hyperswarm]
  2. receive first [protocol] message from a peer to derive the protocol type
    • e.g. multifeed, hyperdrive, ...
  3. support the "major protocols"
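
A rough sketch of that flow, assuming hyperswarm 2 and hypercore-crypto; `detectStructure` and the per-protocol handlers are hypothetical placeholders, the point is only to show where the detection would hook in:

```js
const hyperswarm = require('hyperswarm')
const { discoveryKey } = require('hypercore-crypto')

// hypothetical per-protocol sync handlers ("speak the major protocols")
const handlers = {
  hyperdrive: { sync (socket, key) { /* replicate metadata + content feeds */ } },
  multifeed: { sync (socket, key) { /* collect feedkeys via the MANIFEST extension */ } }
}

// placeholder heuristic: in practice this would inspect the replication handshake
function detectStructure (firstMessage) {
  return 'hyperdrive'
}

function seed (publicKey) {
  const swarm = hyperswarm()
  swarm.join(discoveryKey(publicKey), { announce: true, lookup: true }) // 1.
  swarm.on('connection', (socket, info) => {
    socket.once('data', (firstMessage) => {                             // 2.
      const type = detectStructure(firstMessage)
      if (handlers[type]) handlers[type].sync(socket, publicKey)        // 3.
    })
  })
}
```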

# solution proposal (for a DEP about RELATED FEEDS)

After reading through `mux.js`, it seems to me that the `"manifest"` might as well be a `manifestfeed`, and the `manifest handshake` could share the `manifest feedkey`, so any peer online could share it, and eventually the related feeds, while the author is offline.
  1. we might want to use the DEP 0007 dataStructureType to identify whether a given structure is one of a few supported types or a generic type
  2. we might want to make up a convention using a DEP 0007 MyCustomHeader for what generic types are, where the main point would be to expect under AdditionalData an entry for manifest which points to the feedkey of the manifestfeed, which is probably a hypertrie that somehow stores all the related feeds (see the sketch after this list)
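
A sketch of what that convention could look like when writing the first chunk of a feed; the field names (`dataStructureType`, `additionalData`, `manifest`) follow the wording above, and the canonical encoding would be the DEP-0007 protobuf header, JSON is only used to keep the sketch short:

```js
const hypercore = require('hypercore')

const feed = hypercore('./main')
const manifestfeed = hypercore('./manifest') // in practice probably a hypertrie

feed.ready(() => manifestfeed.ready(() => {
  const header = {
    dataStructureType: 'my-generic-structure',      // a supported type or a generic type
    additionalData: {
      manifest: manifestfeed.key.toString('hex')    // points at the manifestfeed
    }
  }
  feed.append(Buffer.from(JSON.stringify(header)))  // chunk 0, as in DEP-0007
}))
```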

or alternatively

  1. a special EXTENSION MESSAGE with the main feedkey
  2. receiving peers respond with their manifestkey
    • if there is no single author (e.g. multifeed), many manifestkeys exist
  3. the manifestfeed(s) contain a list of feedkeys related to the main feedkey (a rough sketch follows after this list)
    • each entry specifies the swarmkey for that related feed and whether its relation is
      • a part (e.g. contentfeed)
      • or a link (e.g. mounts)
      • and maybe an origin to cite a reason/proof why a feed is related
        • (e.g. for a contentfeedkey, that could be the hyperdrive metakey + chunk 0 of the real author)
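
To make the alternative concrete, a rough sketch assuming the hypercore extension API (`registerExtension`, available in more recent hypercore releases); the message shapes and entry fields are illustrative only, not a spec:

```js
const hypercore = require('hypercore')

const main = hypercore('./main')
const manifestfeed = hypercore('./manifest')

main.ready(() => manifestfeed.ready(() => {
  // 1./2. peers announce the main feedkey, holders of a manifest respond with their manifestkey
  const ext = main.registerExtension('manifest', {
    encoding: 'json',
    onmessage (message, peer) {
      if (message.feedkey === main.key.toString('hex')) {
        ext.send({ manifestkey: manifestfeed.key.toString('hex') }, peer)
      }
    }
  })

  // 3. each manifestfeed entry describes one related feed
  manifestfeed.append(JSON.stringify({
    feedkey: 'feedkey of the related feed (hex)',
    swarmkey: 'swarmkey under which that feed can be found',
    relation: 'part', // 'part' (e.g. contentfeed) or 'link' (e.g. mounts)
    origin: 'e.g. hyperdrive metakey + chunk 0 of the real author'
  }))
}))
```
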
martinheidegger commented 4 years ago

To my understanding, the reason why the hyperdrive structure is the way it is relates to performance and memory use. Particularly important is the number of round-trips necessary until a data structure can be explored. If we have to read the manifest data before we can connect to the feeds, it is one entire initial roundtrip added, which can easily mean 200ms of added delay. Having the url be the identifier for how the feeds are handled might make for a better best-practice. Maybe we could have "patterns" of feed identification, i.e. the hypertrie pattern of hyperdrive 10 or the cabal pattern, as part of the url specification.

serapath commented 4 years ago

@martinheidegger I answered here: https://github.com/datproject/comm-comm/issues/134#issuecomment-604806258

RangerMauve commented 4 years ago

I think it'd be cool if the related feeds mechanism was used just for cases when a peer doesn't know your data structure and wants to get related feeds for pinning and stuff, while keeping existing mechanisms that data structures use for performance.

Maybe related feeds is something you opt into loading after you get the initial data or through an extension that only certain peers will bother invoking?

serapath commented 4 years ago

@RangerMauve I hope this is fulfilled by the latest proposal update, which I perceive to be: https://github.com/datproject/comm-comm/issues/134#issuecomment-604808606

serapath commented 4 years ago

additional considerations:

  1. list not only identifiers of related feeds, but also include the latest known tree hash signatures (e.g. web packages might need this feature to link to snapshots)
  2. There are two general approaches without involving blockchains that I can see:
    1. point to a feed from a certificate feed so you can update it (e.g. key rotation to point to new certified replacement feeds)
    2. point from a feed to a certificate feed, e.g. in chunk0, so a key compromise can't undo that

At first sight they might both be able to solve the same problem, but they have some nuances, especially if that general kind of pointer/link is used for different use cases.

  1. you might just want an option to update which feed you certify, but it doesn't prove that a writekey for the certified subfeed is controlled. Multiple entities can both certify a given feed
  2. on the other hand, you could start any feed with a chunk0 that contains, among other things, information about:
    1. an authorized certificate feed
    2. together with the publickey of the current feed
    3. signed by the writerkey of the certificate feed

That way, nobody can copy such a message to any other feed to associate ownership over e.g. "child porn" with somebody, because the publickey wouldn't match.
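
A small sketch of approach 2 using hypercore-crypto; the chunk0 layout is purely illustrative, the point is only that the signature binds the certificate feed to one specific feed key:

```js
const crypto = require('hypercore-crypto')

const certKeys = crypto.keyPair() // certificate feed (author-run or a trusted CA)
const feedKeys = crypto.keyPair() // the feed being certified

// chunk0 names the certificate feed and the current feed's publickey,
// signed by the writerkey of the certificate feed
const payload = Buffer.concat([certKeys.publicKey, feedKeys.publicKey])
const chunk0 = {
  cert: certKeys.publicKey,
  feed: feedKeys.publicKey,
  signature: crypto.sign(payload, certKeys.secretKey)
}

// copying chunk0 into another feed fails verification, because that feed's
// publickey is not the one that was signed
function verifyChunk0 (chunk, hostFeedPublicKey) {
  if (!chunk.feed.equals(hostFeedPublicKey)) return false
  const signed = Buffer.concat([chunk.cert, chunk.feed])
  return crypto.verify(signed, chunk.signature, chunk.cert)
}

console.log(verifyChunk0(chunk0, feedKeys.publicKey))         // true
console.log(verifyChunk0(chunk0, crypto.keyPair().publicKey)) // false
```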


Whether people outsource maintenance of the certificate feed to third parties or maintain it themselves, in both cases a certificate feed could be HOT (especially if it is operated by some kind of trusted third-party CA), so it would be cool for that feed to have its own parent certificate feed.

furthermore, it would be cool to be able to specify a ring of certificate feeds (controlled by the same entity or by different ones), where a majority of keys can vote out or vote in new keys to do key rotation, to further improve security in case of lost or stolen private keys for such feeds.


All these issues are technically linking together feeds.

  1. related feeds (like e.g. web packages and dependencies, or complicated data structures and protocols that require multiple feeds)
  2. and feed revocation mechanisms
  3. also, transfer of feeds to new owners could be conceivable in this way

...it's just generally something that is much tougher to bolt onto things later on, I believe.