Support for message abstraction layer on top of hypercore

aschrijver commented 7 years ago

I just discussed with @mafintosh my recent proposal to add a message abstraction on top of hypercore as a supported part of the Dat ecosystem, effectively turning it into a full message-based application platform.

You can read the proposal in this particular comment on the vision/future thread: https://github.com/datproject/dat/issues/824#issuecomment-315601713

After some back and forth, @mafintosh agreed on the benefits of providing a tiny messaging library and defining some standard message formats.

Here are the details of that freenode chat:

[11:24] <barnie> mafintosh: are you online?
[11:34] <@mafintosh> barnie: hi!
[11:34] <barnie> hi! I will create a gh issue on using dat + hypercore in a message-based design along the lines of this: https://github.com/datproject/dat/issues/824#issuecomment-315601713
[11:35] <barnie> i am very curious to get feedback on the possibilities
[11:40] <@mafintosh> barnie: hypercore is already message based
[11:40] <barnie> are you refering to hypercore-protocol?
[11:41] <@mafintosh> ya
[11:41] <barnie> i consider that a wire format
[11:41] <barnie> i am talking about a messaging layer on top
[11:41] <@mafintosh> and hypercore itself is basically persistent pubsub
[11:41] <barnie> one in which you have application and domain concepts
[11:43] <barnie> i am thinking of Event collaboration (between peers) and Event sourcing, CQRS, DDD (in rich client apps)
[11:44] <barnie> thus ideally suited for a social network that can grow organically in feature set
[11:44] <@mafintosh> barnie: whats stopping you from doing that now?
[11:45] <@mafintosh> its perfect for event sourcing and friends atm
[11:45] <@mafintosh> and the random access aspect makes it v powerful for real time apps
[11:45] <barnie> ya i noticed that. its the reason I chose dat above ssb
[11:46] <barnie> but what makes me unhappy is that my message-based route would fork from your own direction
[11:46] <barnie> while you could easily incorporate it with a slight repositioning of technology
[11:46] <barnie> as i described
[11:49] <@mafintosh> I'm pretty happy with the stack as it is. You can write custom messages over the protocol stream if you prefer. I do that in a couple of applications
[11:50] <@mafintosh> If you can explain in a bit fewer words what lacking for your application I might understand it better
[11:50] <@mafintosh> The thread is too long for me to dig in.
[11:51] <barnie> Thats why I pointed you to that single comment. Its not that long. The benefits described should be of interest to you
[11:52] <barnie> i also created an executive summary: https://github.com/datproject/dat/issues/824#issuecomment-316083350
[11:55] <@mafintosh> barnie: i dont see the benefit of your approach compared to random access logs.
[11:55] <@mafintosh> you can easily write a secure messaging system of top of logs
[11:56] <barnie> wouldn't having a dat defined messaging layer make it much easier to develop decentralized apps?
[11:56] <barnie> now I'll redo the work you've already done
[11:57] <yoshuawuyts> barnie: different apps need different schemas; hyperlog just provides the primitives for you to build upon
[11:57] <@mafintosh> hypercore
[11:57] <yoshuawuyts> *hypercore
[11:57] <barnie> yes, the - too - primitives
[11:58] <barnie> we'll create forks
[11:58] <barnie> not of hypercore
[11:58] <barnie> but things on top
[11:59] <@mafintosh> barnie: hypercore is just a distributed message log
[11:59] <@mafintosh> with guaranteed ordering
[11:59] <barnie> i know. i'm not talking only of hypercore, though, but dat ecosystem as a whole
[12:00] <barnie> hypercore is fine as it is right now in the whole event-based discussion
[12:00] <@mafintosh> then i'm not completely sure what you are missing :)
[12:00] <@mafintosh> dat is a tiny layer on top of hypercore
[12:00] <barnie> when you go from hypercore to hyperdrive there is a missed opportunity
[12:01] <@mafintosh> All replication is still hypercore based
[12:01] <barnie> you go from raw streams directly to file chunks (if i am correct)
[12:01] <barnie> while you could have a layer in between having file-chunk messages
[12:02] <barnie> or any file-related data type you define
[12:02] <@mafintosh> yea that sounds cool
[12:02] <@mafintosh> you can just write that module :)
[12:03] <barnie> yes, maybe I will, but I am yet an outsider coming from Java background, and not the person to get it there speedily. And in the meantime you may evolve to make it more difficult to use all modules
[12:04] <barnie> how much would it cost you to add this in the core design?
[12:04] <barnie> not much overhead I presume
[12:05] <@mafintosh> barnie: i'm still a bit unsure what you need. the core abstraction *is* hypercore :)
[12:06] <@mafintosh> and thats not gonna change
[12:06] <@mafintosh> and i think your application sounds like a good fit
[12:06] <barnie> i say that's the wire protocol abstraction, you have no data communication protocol abstraction (if that's the right word)
[12:07] <barnie> that is already good to hear, thx!!
[12:07] <@mafintosh> barnie: hypercore is the data one
[12:07] <barnie> it's the raw stream one, AFAICS
[12:07] <barnie> and some protocol messages
[12:08] <@mafintosh> you can send custom messages over the hypercore-protocol stream
[12:08] <barnie> do you advise me to extend the protocol-buffer schemas with my message types?
[12:08] <@mafintosh> i'd advise you to model it on top of hypercore directly
[12:08] <barnie> ya, that's the do-it-myself-and-you-do-it-differently abstraction on top
[12:09] <@mafintosh> then you wont have to worry about integrity/auth
[12:09] <barnie> the forks being created as you grow
[12:09] <@mafintosh> just think of hypercore as your pubsub layer
[12:10] <barnie> i agree fully with that, just think dat needs an additional message layer
[12:10] <barnie> it only defines formats, not message types
[12:10] <barnie> thats for the application designers to do
[12:10] <barnie> like a spec
[12:10] <@mafintosh> Yea there is room for that on top of hypercore
[12:11] <@mafintosh> I agree
[12:11] <barnie> then in hyperdrive you just have thin File + Chunk msg wrapper (couple of bytes overhead)
[12:11] <barnie> and it'll have become an application of the message bus
[12:12] <@mafintosh> I wouldnt at that point call it a message bus
[12:12] <barnie> maybe not the right word, agree
[12:12] <@mafintosh> Cause we do lot of random access on the messages
[12:13] <@mafintosh> Log is basically the core of it all
[12:13] <barnie> ya, but with random access you mean once they have been stored in the log / feed
[12:13] <barnie> i am talking on the state where they still roam the network
[12:14] <barnie> the random accessing is fine, if I don't want that, because i have event sourcing, then thats application-specific, no problem.
[12:16] <@mafintosh> ya network roaming is important
[12:16] <@mafintosh> Thats why hypercore is a minimal dep
[12:16] <@mafintosh> Cause its the network/state related one
[12:17] <barnie> agree. should stay that way
[12:17] <barnie> WDYT about the rest?
[12:17] <@mafintosh> So a server that can replicate a log you deploy today will forever work with super applications we build
[12:17] <barnie> yes
[12:18] <@mafintosh> barnie: i think there is merit for a common set of messages of top above for sure
[12:18] <@mafintosh> and someone should experiment with building that
[12:18] <@mafintosh> as a tiny module
[12:18] <barnie> the more i hear you talking the more i think you need some kind of small message abstraction on which other devs can build
[12:18] <barnie> exactly
[12:19] <@mafintosh> barnie: you might be able to do it simply as a set of protocol-buffer messages
[12:19] <@mafintosh> that is then appended to a hypercore
[12:20] <barnie> yes, but that binds you to a specific tech (protbuf)
[12:20] <@mafintosh> any schema would do
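
To make the closing suggestion in the chat concrete (typed messages appended to a log, with "any schema"), here is a minimal sketch in Python. Everything in it is an assumption for illustration: `AppendOnlyLog` is an in-memory stand-in and not the hypercore API, the `type`/`payload` envelope is a hypothetical schema using JSON instead of protocol buffers, and the `follow`/`unfollow` message types in the usage note are invented.

```python
import json


class AppendOnlyLog:
    """In-memory stand-in for an append-only feed (NOT the hypercore API)."""

    def __init__(self):
        self._entries = []

    def append(self, data: bytes) -> int:
        self._entries.append(data)
        return len(self._entries) - 1  # sequence number of the new entry

    def get(self, index: int) -> bytes:
        return self._entries[index]  # random access by sequence number

    def __len__(self):
        return len(self._entries)


def append_message(log, msg_type, payload):
    """Wrap a payload in a typed envelope (hypothetical schema) and append it."""
    envelope = {"type": msg_type, "payload": payload}
    return log.append(json.dumps(envelope).encode("utf-8"))


def replay(log, reducers, state):
    """Event sourcing: rebuild state by folding every logged message in order."""
    for index in range(len(log)):
        message = json.loads(log.get(index))
        reducer = reducers.get(message["type"])
        if reducer is not None:  # unknown message types are skipped
            state = reducer(state, message["payload"])
    return state
```

As a usage example, appending `follow` and `unfollow` messages and calling `replay` with one reducer per message type would rebuild the current set of followed peers, while `log.get(n)` still gives random access to any single message.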

Any additional feedback on this topic is most welcome!

aschrijver commented 7 years ago

@mafintosh some first input..

I presented Vert.x before as a good case study, and I will do so again here for message-based system design, adding some thoughts along the way.

Vert.x is message-based and has the concepts of a distributed event bus and event bus bridges.

Officially supported toolkit modules include a TCP Eventbus Bridge, a Camel Bridge and a SockJS Proxy Service. In addition there are various community-offered bridge implementations (another SockJS impl, ZeroMQ, AMQP - Kafka bridge, Saltstack bridge, server-sent events bridge).

They've kept their message design as simple as possible.

The event bus supports 2 communication modes:

publish / subscribe
send / reply

On the wire protocol level everything is a Frame:

<Length: uInt32><{
   type: String,
   address: String,
   (replyAddress: String)?,
   headers: JsonObject,
   body: JsonObject
}: JsonObject>
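
As an illustration of the frame layout above, here is a minimal Python sketch that writes and reads length-prefixed JSON frames. The function names `encode_frame` and `decode_frames` are hypothetical, and big-endian byte order for the length prefix is an assumption of this sketch.

```python
import json
import struct


def encode_frame(frame_type, address, body, headers=None, reply_address=None):
    """Serialize one frame as <uint32 length><UTF-8 JSON object>."""
    frame = {"type": frame_type, "address": address,
             "headers": headers or {}, "body": body}
    if reply_address is not None:
        frame["replyAddress"] = reply_address  # optional field
    payload = json.dumps(frame).encode("utf-8")
    # ">I" = big-endian unsigned 32-bit length (byte order is an assumption)
    return struct.pack(">I", len(payload)) + payload


def decode_frames(buffer):
    """Split a byte buffer into complete frames; return (frames, leftover bytes)."""
    frames = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # frame not fully received yet
        start = offset + 4
        frames.append(json.loads(buffer[start:start + length].decode("utf-8")))
        offset = start + length
    return frames, buffer[offset:]
```

The decoder returns any trailing partial frame as leftover bytes, so a caller reading from a socket can prepend them to the next chunk it receives.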

Only the following 4 frame types exist:

send: send a message to an address
publish: publish a message to an address
register: subscribe to the messages sent or published to an address
unregister: unsubscribe from the messages sent or published to an address
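
The four frame types can be sketched as a toy in-memory dispatcher. This is not Vert.x code: the `EventBus` class is hypothetical, it uses `send` as the name of the point-to-point frame type, and picking the receiver of a `send` round-robin is an assumption of this sketch.

```python
from collections import defaultdict


class EventBus:
    """Toy in-memory bus handling the four frame types (illustration only)."""

    def __init__(self):
        self.handlers = defaultdict(list)  # address -> registered callables
        self._next = defaultdict(int)      # address -> round-robin cursor for "send"

    def handle(self, frame, handler=None):
        kind, address = frame["type"], frame["address"]
        if kind == "register":
            self.handlers[address].append(handler)
        elif kind == "unregister":
            if handler in self.handlers[address]:
                self.handlers[address].remove(handler)
        elif kind == "publish":
            # publish: every registered handler receives the message
            for h in list(self.handlers[address]):
                h(frame)
        elif kind == "send":
            # send: exactly one handler receives it (point-to-point)
            targets = self.handlers[address]
            if targets:
                index = self._next[address] % len(targets)
                self._next[address] += 1
                targets[index](frame)
        else:
            raise ValueError(f"unknown frame type: {kind}")
```

With two handlers registered on the same address, a `publish` frame reaches both of them while a `send` frame reaches only one, which is the difference between the two communication modes listed earlier.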

The headers can be any JSON, ranging from a simple map of metadata key/value pairs to a JSON-Schema or even JSON-LD document semantically describing the body payload.

The body payload can be any data type, to be determined from the header metadata; it need not be JSON. It could be a Buffer, a hex-encoded image, etc.
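
A small Python sketch of how a receiver might pick the body type from header metadata. The `encoding` header name and its values (`json`, `hex`, `base64`) are hypothetical; nothing in the frame format prescribes them.

```python
import base64


def decode_body(frame):
    """Interpret the body according to a hypothetical 'encoding' header."""
    headers = frame.get("headers", {})
    body = frame["body"]
    encoding = headers.get("encoding", "json")  # default: body is plain JSON
    if encoding == "json":
        return body
    if encoding == "hex":
        return bytes.fromhex(body)      # binary payload carried as a hex string
    if encoding == "base64":
        return base64.b64decode(body)   # binary payload carried as base64 text
    raise ValueError(f"unsupported encoding: {encoding}")
```

A frame carrying an image would then set something like `{"encoding": "hex"}` in its headers and put the hex string in the body, and the receiver gets raw bytes back.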

Some thoughts:

The address and replyAddress could be dat://-urls (useful for handshakes)
May want to include a signature field
This could easily be implemented in the hypercore-protocol .proto definition
Maybe what's currently there should become Frame types
It may be wise to standardize, or advise on, some preferred header formats
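
To illustrate the signature field idea, here is a rough Python sketch. It uses HMAC-SHA256 with a shared key from the standard library as a stand-in; a real Dat-based design would more likely use public-key signatures, which this sketch does not implement. The `signature` field name, the canonical JSON form and the `dat://example-feed-key/inbox` address are all hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(frame):
    """Stable byte form of a frame, excluding any existing signature field."""
    unsigned = {k: v for k, v in frame.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_frame(frame, key):
    """Return a copy of the frame with a hex 'signature' field added."""
    signed = dict(frame)
    signed["signature"] = hmac.new(key, _canonical(frame), hashlib.sha256).hexdigest()
    return signed


def verify_frame(frame, key):
    """Check the signature in constant time; an unsigned frame fails."""
    expected = hmac.new(key, _canonical(frame), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, frame.get("signature", ""))


frame = {
    "type": "send",
    "address": "dat://example-feed-key/inbox",  # hypothetical dat:// address
    "headers": {},
    "body": {"text": "hello"},
}
```

Signing over a canonical serialization (sorted keys, fixed separators) matters because two JSON encodings of the same frame would otherwise produce different signatures.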

Finally, if the Dat message format is even slightly compatible with what Vert.x has now, you could easily write an event bus bridge implementation for it, and Dat would then have added a whole ecosystem and dev community to its base!

joehand commented 7 years ago

This sounds great. I'd agree with mafintosh that it'd be nice to see this implemented on top of hypercore.

I'm closing this as it is not related directly to the Dat CLI. Feel free to open an issue in our discussions repo for further discussion on this: https://github.com/datproject/discussions

aschrijver commented 7 years ago

Thanks. I put this in this project intentionally, because it has the highest visibility (watchers) and thus the best chance of getting good feedback.

(In the discussions project the last issue is 1 year old, and it has 33 watchers vs. 300 watchers here.)

dat-ecosystem / dat

Support for message abstraction layer on top of hypercore #826