Still doubt this is the best solution. Brain dump below:
Propagating pin requests won't help much if the client isn't directly connected to the pinning node that wants to pin the data. Since each orbitdb database has a separate pubsub topic in which updates are shared, even if a pinning node that is not directly connected to the client gets the pin request, it won't be able to pin any data, since it's not directly connected to the peer that has the data. So propagating the messages in this case would just be a waste of resources.
Propagating pin requests when needed. We could start using ipfs-pubsub-room; with it we can better detect when other peers come online, and it also allows sending direct messages without broadcasting to the entire room. With it we can detect when a peer comes online and always share the pin request with the new peer directly. This requires some reengineering on both the 3box-js and 3box-pinning-node side, and potentially speccing out exactly how this would function. I realize that this might be out of scope, but it seems like something closer to the correct answer.
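A rough sketch of that flow, assuming the older ipfs-pubsub-room API (room created from an ipfs instance, 'peer joined' / 'message' events, sendTo for direct messages); the topic name, message shape and handlePinRequest are illustrative, not the actual 3box message format:

```js
const Room = require('ipfs-pubsub-room')

// one coordination room for pin requests
const room = Room(ipfs, '3box-pinning')

// when a new peer shows up, share the pin request with it directly
// instead of broadcasting to the whole room
room.on('peer joined', (peerId) => {
  const pinRequest = JSON.stringify({ type: 'PIN_DB', odbAddress: db.address.toString() })
  room.sendTo(peerId, pinRequest)
})

room.on('message', (msg) => {
  // msg.from is the sender's peer id, msg.data the payload
  handlePinRequest(JSON.parse(msg.data.toString())) // hypothetical handler
})
```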
Yeah, but this iteration assumes that everyone still connects to our pinning node and replicates their data to it, so the data is always available there. But yeah, it is definitely not a general solution. In other cases, where this is not true, I assume that for some time clients will be able to directly connect to the nodes that they want to pin data with.
Need to think through the second part, but yeah, I think we will want to change how we deal with pin messages overall.
> Yeah, but this iteration assumes that everyone still connects to our pinning node and replicates their data to it, so the data is always available there.
I don't think that's true, consider the following:
Yeah! Thanks for the clear steps. It's simple to relay orbit updates through nodes on which the dbs are not open, but it would still not handle the initial handshake with a new peer, where existing heads are sent in a 1-on-1 channel. Don't have a follow-up yet.
It would still fix a very specific short-term problem, since we could assume that C connects to the client eventually (the current implementation, although not a reliable assumption), but C at least doesn't miss the pin db message on client open, since it was relayed earlier, and then in a brief time it will get the heads. But yeah, not holistic, and it can be weighed against longer-term solutions vs the time for a quick fix.
Yeah, eventually they might connect. I guess the whole thing relies on this anyhow.
closing, pursuing another solution
I am suggesting we reopen this ticket and apply it next.
After the discussion this week, it seems that we won't be immediately pursuing ipfs nodes with unique peer ids; in fact, whether we want to do it at all is still an open question, and a consensus is unlikely to be reached soon given the current information.
For the below reasons, I suggest we prioritise this ticket over https://github.com/3box/3box/issues/943 and https://github.com/3box/3box/issues/944, which depended on this decision to reach production. Without a concrete plan on how to make ipfs HA and scalable, they would be of low value at this time, as a single ipfs node would not be deployed in production.
Reasons for pursuing this now:
The ipfs deployment question delaying the planned work above is difficult to resolve because both options rely heavily on assumptions, and I believe testing those assumptions is critical for guiding our decision-making, as well as for deepening our understanding of ipfs, libp2p and orbitdb. One assumption we are making is that it is relatively simple to replace ipfs/libp2p functionality with distributed equivalents. This ticket will let us test that assumption if we first clearly update the scope of work and acceptance criteria and estimate it, then, once the work is done, explore any unexpected issues that arose and determine whether more are likely (@zachferland @oed: suggest we do the first part on Monday).
If we decide to proceed with a single ipfs id, this will have set up the first version of the pubsub forwarding layer, and would allow a relatively short path for the two tickets being deprioritised (split ipfs and pinning, with pinning workers)
If successful, this would allow us to show immediate value to our partners who want to run their own pinning nodes as part of the 3box network.
Yes, this sounds good, that's a good summary of the drivers here 👍
I can help create some well-defined stories here; just want to sync on one last part first that I didn't have a chance to today. Basically whether to do iteration 1, or iteration 2 (+1) in one go, as referenced in notion. Can start here, since it's not well defined in notion yet:
One option for Iteration 1 is to do it entirely at the application level rather than the libp2p layer. It would relay messages as new messages from our node. The alternative is forwarding messages from other peers: not a new message, but the same original payload and sender (this would be more native to libp2p, and would allow peers to handle them a lot more intelligently).
But the second is not necessary yet (and current messages are not done this way yet). It does affect the seen-message cache implementation though. If it's solely at the application level, then yeah, as you said in discord, you could simplify even further by just adding a bool to the relayed message as a form of memory. With that we would just have to rely on the assumption that we are the only node relaying pin db messages like this. Other providers/nodes couldn't do it, otherwise the same message would change many times over and be handled redundantly.
At the more libp2p-native layer afterwards, where we forward messages instead, we would have to have a shared seen-message cache (since we can't change the messages, just forward them). libp2p pubsub does this already, but pin db messages are the only part of the message space where this cache has to be shared.
After writing this out, it seems like doing the application layer is good for a first iteration, since it's very simple and similar to current message handling, and it doesn't add anything that would need backwards compatibility later. Yes, it has some overhead, but close to the same as current orbit messages, and it's a known thing to improve afterwards in iteration 1.5/2.
iter1: relay messages like now (as new messages at the app layer), but with a flag. Only handle/relay pin db messages that are not flagged. Assume/configure other pinning nodes to not relay pin db messages. (Sketched below.)
iter1.5: move to forwarding pin db messages through libp2p, with a seen-message cache based on unique message ids.
iter2: forward all messages, including orbit messages, through libp2p hooks.
Overall, I think either iter1 or going straight to iter1.5 is a valid path for the first iteration.
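A minimal sketch of iter1 on the relaying node, assuming pin db requests are JSON payloads on a shared '3box-pinning' topic and a Buffer-era js-ipfs pubsub API; the topic name, payload shape and handlePinRequest are illustrative:

```js
const TOPIC = '3box-pinning'

// the relaying pinning node: handle and re-publish only messages it hasn't
// already relayed itself (the flag is the only "memory", which assumes we are
// the only node relaying pin db messages this way)
await ipfs.pubsub.subscribe(TOPIC, async (msg) => {
  const payload = JSON.parse(msg.data.toString())
  if (payload.relayed) return      // already relayed, skip handling and relaying
  handlePinRequest(payload)        // hypothetical local handler
  const copy = Buffer.from(JSON.stringify({ ...payload, relayed: true }))
  await ipfs.pubsub.publish(TOPIC, copy)
})
```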
Thanks for condensing the above; I'd aim for the shorter amount of work for this. Because I still disagree with following the "supernode" path at the stage 3box is at (because it goes against recommended IPFS usage, which involves risk and unknowns that are being underestimated), I think we should aim for a tighter scope than 3-6 weeks of work.
I'm recommending we do the work primarily because we're blocked by this decision, so I'm hoping it will be useful in surfacing examples of the types of unexpected issues I'm going to encounter, and how those affect the amount of work involved. Do you think your iter1 version, or an even smaller chunk of work, would be useful for this kind of test, and do you know what kind of results you would find convincing one way or the other?
Yup, I think with that framing iter1.5 is the most minimal "interesting" work to answer some of those questions, and it's also easily less than 3 weeks. iter1 is only days, and would not give any insight into the actual pubsub layer.
I don't think anything here will invalidate either direction, but I think it will give an idea of the magnitude of work for each direction at this layer.
As said above, application level only would not lead to much insight, so I'm suggesting the second option (1.5) above, where we hook in at the libp2p pubsub layer to consume/relay messages. I'll give a rough outline of an implementation here; there could still be some nuances, or maybe it would be hooked in at other levels.
1) Extend and wrap https://github.com/ChainSafe/gossipsub-js, primarily the _processRpcMessage function: https://github.com/ChainSafe/gossipsub-js/blob/master/src/index.js#L147

```js
const GossipSub = require('libp2p-gossipsub')
// msgId helper used by libp2p pubsub internally (import path may vary by version)
const { msgId } = require('libp2p-pubsub/src/utils')

class GossipSubWrap extends GossipSub {
  // constructor would take the redis pubsub client and the redis-backed seen
  // cache, or some other creator function

  _processRpcMessage (msg) {
    // probably needs a parse util for msg first
    if (msg.topicIDs.includes('3box-pinning')) {
      const id = msgId(msg.from, msg.seqno)
      if (this.pinningSeenCache.has(id)) {
        // another internal node already consumed this from the external
        // network; ignore it and don't relay it internally
        return
      }
      // cache as seen and relay to all other internal nodes, which relay it
      // on to their own clients
      this.pinningSeenCache.put(id)
      this.redis.publish('3box-pinning', msg)
    }
    // now emit to this node so orbit/3box can handle the pin db request, and
    // relay to peers with the normal gossipsub logic
    super._processRpcMessage(msg)
  }

  _processInternalMessage (msg) {
    // not sure of the shortest implementation here yet:
    // relay through the logic in super._processRpcMessage(msg),
    // just don't emit with this line
    // https://github.com/ChainSafe/gossipsub-js/blob/master/src/index.js#L148
  }
}
```
2) Use the existing redis pubsub layer: subscribe to '3box-pinning' (or some other topic shared between all instances) on start. Note that a redis client in subscriber mode can't also publish, so this would be a separate client from the one GossipSubWrap publishes on.

```js
redis.subscribe('3box-pinning')
redis.on('message', (channel, msg) => gossipsubWrap._processInternalMessage(msg))
```

3) Pass the configured GossipSubWrap to the libp2p and ipfs instance on init (rough sketch after the notes below).
Notes: the pinning-message seen cache could also just live in redis, with values given a TTL of a few minutes.
Seems like a 5pt story; more if there are still unknowns or it does not behave as expected, maybe 8 depending on how much testing or which test cases we use.
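For the note above, a minimal sketch of a redis-backed seen cache with a TTL, using ioredis (key prefix and TTL are illustrative):

```js
const Redis = require('ioredis')
const redis = new Redis()
const TTL_SECONDS = 5 * 60 // "a few minutes"

const pinningSeenCache = {
  // both operations are async, which matters for the synchronous
  // _processRpcMessage discussion further down the thread
  has: async (id) => (await redis.exists(`seen:${id}`)) === 1,
  put: (id) => redis.set(`seen:${id}`, '1', 'EX', TTL_SECONDS)
}
```

And for step 3, a hedged guess at the wiring, assuming a js-ipfs/libp2p version where the pubsub implementation can be swapped via the libp2p modules option (the exact option shape varies between versions):

```js
const IPFS = require('ipfs')

const ipfs = await IPFS.create({
  libp2p: {
    modules: {
      // GossipSubWrap from step 1; its constructor would need access to the
      // shared redis clients and seen cache (e.g. via module scope or a factory)
      pubsub: GossipSubWrap
    }
  }
})
```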
@zachferland what benefit would there be in doing all this rather than adding a flag to the payload before forwarding it, such as `seenBy3box: true`?
I'm assuming that we would also send out the pinning room messages received over redis externally over pubsub so that external nodes receive them, as I understood from earlier discussion?
```
* CLIENT *                 * COMM. NODE *
     |                           ^
   pubsub                        |
     |                         pubsub
     v                           |
* PINNING1 *  --redis-->  * PINNING2 *
```
This solution will end up sending out X-1 copies of each pubsub message to the swarm (where X is the number of pinning nodes we have up), and these copies will look different because they will have unique `seqno` and `from` entries (how ipfs pubsub tracks sender and message identity).
This also won't work with the pubsub workers built this sprint, because they use these properties of pubsub (sender and message identifier)
It can, but as said above, doing that alone won't lend much insight into future work, and some pitfalls are already known. But it can be done as a step toward the last comment, to test some parts.
1) they won't be "forwarded" but will be new messages, so you will get some amplification of redundant messages across the network, rather than letting pubsub handle the same messages naturally
2) no one else can forward messages like this; same as 1, except now what started as one message will be consumed as many new messages
3) it doesn't give insight into relaying all messages in a similar way
oh, I see the comment changed! reading now
ok, I think we are saying the same thing; yeah, that's mainly why you would want to forward them like pubsub does (relay/forward with the same message id) and not create new messages
current orbit messages are created as new messages to relay, and this was from before peers were directly connected. So the assumption was okay then, but ideally they would be changed to use the same improvement here.
don't have context on the pubsub workers from this sprint
@zachferland The above sounds like it might work, so I've been trying it out. The first big hurdle I've encountered is that libp2p's pubsub processes messages synchronously, whereas any shared structure like redis involves async in some way. This would at least involve rewriting more of gossipsub to be async.
https://github.com/ChainSafe/gossipsub-js/blob/1852ec82f1/src/index.js#L134
hmm right, I see that, but it looks like nothing checks the return value from _processRpcMessage, so it would be fine if we had a promise in there that was not awaited before returning, or I guess made it async?
Also it may be possible to hook in at another function layer, like the message `validate` function, which is async, gets called on every message, and will drop the message before _processRpcMessage is called if it doesn't resolve to true. That may be a good spot to hook in for extensions.
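A sketch of that alternative hook point, assuming a gossipsub/libp2p-pubsub version where validate(message) is async, resolves to a boolean, and is awaited before _processRpcMessage (details may differ between versions); msgId and the async pinningSeenCache are the ones sketched earlier:

```js
class GossipSubWrap extends GossipSub {
  async validate (msg) {
    if (msg.topicIDs.includes('3box-pinning')) {
      const id = msgId(msg.from, msg.seqno)
      if (await this.pinningSeenCache.has(id)) {
        // another internal node already consumed this message; returning false
        // drops it here, so it is neither emitted nor relayed again
        return false
      }
      await this.pinningSeenCache.put(id)
      // serialization of msg for redis elided, as in the sketch above
      this.redis.publish('3box-pinning', msg)
    }
    return super.validate(msg)
  }
}
```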
Yeah, I think it's worth keeping in the backlog, as it's still an option worth considering later: maybe testing it with the test infra we'll have later, and maybe still a short-term option depending on the timelines of the other paths/work.
Also, an implementation note for later: if we made a small PR to gossipsub to pull the forwarding logic out into a _forwardRpcMessage function (a really minor change there), then the code example above would be almost all the code needed to implement this.
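In other words, with that suggested (not yet existing) _forwardRpcMessage extracted upstream, the internal handler above could reduce to something like:

```js
_processInternalMessage (msg) {
  // relay a message received from redis on to our own swarm peers
  // without re-emitting it to this node's local subscribers
  this._forwardRpcMessage(msg) // hypothetical upstream helper per the note above
}
```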
Because the pinning nodes are not peered with each other, pubsub messages sent to one don't propagate to the others. An ipfs peer bootstrapping against the exposed hostname/ip is directed by the load balancer to one of the pinning nodes, and because of this it will only receive a subset of pin request messages. For the purposes of an external pinning node, we'd like to receive all pinning messages. To do this, pin room messages received by one pinning node are relayed to the others over redis (the messageBroker implementation).
The eventual goal is to use libp2p messaging instead of redis, but this will provide similar functionality in the meantime while minimizing changes to the architecture.