Open mortelli opened 5 years ago
Does it make sense to just drop a peer because the balance cannot be loaded / persisted? Such situation should put the entire failing node offline, as far as I understand it failing to load / persist balances may represent an I/O error and thus a situation that needs to be handled by the owner of the node.
Please review @zelig @homotopycolimit @nagydani
@diegomasini, @mortelli just was depicting the current state.
But as a matter of fact, if we have a corrupt database, or something, then how do we deal with it in a clean way? Probably we should not deal with the remote peer until the issue is fixed, or balances will diverge, resulting in some (possibly more painful) drop later. The simplest approach is to drop the peer right away.
Having said that, I am in favor of a better handling, but then we need to specify what it could be.
I agree on dropping the peer, but it should also trigger a mechanism to put the failing node offline. To avoid other inconsistency errors.
[...] Probably we should not deal with the remote peer until the issue is fixed, or balances will diverge, resulting in some (possibly more painful) drop later. The simplest approach is to drop the peer right away.
Keep in mind that even though the remote peer is the one that is dropped in this case, the failure to load or save a state indicates a problem in the local node, not the remote one.
[...] Probably we should not deal with the remote peer until the issue is fixed, or balances will diverge, resulting in some (possibly more painful) drop later. The simplest approach is to drop the peer right away.
Keep in mind that even though the remote peer is the one that is dropped in this case, the failure to load or save a state indicates a problem in the local node, not the remote one.
Exactly, that's my point.
discrepancies in perceived debt balance between two peers is not fatal, as long as the disagreement amount is small compared to the payment thresholds (or more specifically smaller than the difference between payment and disconnect thresholds).
but of course it is not a desirable state of affairs.
my gut tells me: don't be too eager to drop peers. avoid it whenever possible.
The question if you don't drop peers - what do you do? If we would just ignore peers, isn't that equivalent to just drop peers, because we would essentially not be dealing with them? If not dropped, we waste resources in maintaining the connection, the peer remains in the kademlia which marks it as a candidate for routing/syncing, but we block communication with it - I can't see the value of not dropping the peer so far.
discrepancies in perceived debt balance between two peers is not fatal, as long as the disagreement amount is small compared to the payment thresholds (or more specifically smaller than the difference between payment and disconnect thresholds).
but of course it is not a desirable state of affairs.
my gut tells me: don't be too eager to drop peers. avoid it whenever possible.
@homotopycolimit are you suggesting that when a threshold is reached, the settlement is not for the full amount hence allowing some discrepancies in perceived debt balance? Because if the settlement is for the full amount any discrepancy would likely cause disconnect.
I think so, yes.
Let's say that 80 is the threshold where you require payment. Assume also that you think your peer owes you 80 and they think they owe you 76. Now assume you ask for payment and they send you a cheque over 76, (or even 26 for that matter); why would you disconnect?
Ah so you are suggesting the nodes just ask "pay me what you think you owe me" rather than "pay me x amount which you owe me". Brilliant! @holisticode @mortelli @diegomasini.
But this is at a different level.
If you reach the payment threshold, the debtor issues a cheque. No disconnection takes place. If the creditor receives the cheque, the balance is restored to 0 (zero). If the cheque would not the full payment threshold value, the balance is restored to the difference (say 4). That is why the cheque has an Amount
field.
So when a node sends a ChequeRequestMsg
, it claims that a debtor reached payment threshold. Thus the debtor needs to check that:
Disconnection is something completely different and happens at the disconnect threshold, which is some allowance above the payment threshold, and happens if no cheque at all has been sent and thus the debtor continues consuming from the creditor until it reaches that threshold.
And most of the above discussion was about a corrupt database, which as of decision in meeting https://github.com/ethersphere/user-stories/blob/master/doc/incentives/sync-2019-06-25.md is a fatal error, and thus should result in the local node going offline (and a remote peer drop first).
If we would just ignore peers, isn't that equivalent to just drop peers, because we would essentially not be dealing with them? If not dropped, we waste resources in maintaining the connection, the peer remains in the kademlia which marks it as a candidate for routing/syncing, but we block communication with it
I never said 'ignore them'. I suggest refusing to serve retrieval requests coming from the offending node, other traffic is still allowable. In particular if you want to get data from them and they serve it to you (thus reducing the debt) that's fine. You only disallow traffic that would further increase the debt.
Disconnection is fine in lower Kademlia bins but gets problematic in the most proximate bin. If we can maintain connectivity there at the cost of blocking certain types of request, then that's probably preferable to a disconnect. The problem is that too many disconnects and blacklistings in the most proximate nbhd breaks our routing assumptions.
I never said 'ignore them'. I suggest refusing to serve retrieval requests coming from the offending node, other traffic is still allowable. In particular if you want to get data from them and they serve it to you (thus reducing the debt) that's fine. You only disallow traffic that would further increase the debt.
This suggestion makes much sense. You say though "I suggest" - can you formalize this into a requirement? I assume this implies getting consensus and approval from the research track, in possibly written form. How do you guys handle that?
@mortelli I suggest that in order to implement this, we would have to create a custom error, which is raised if the "disconnect threshold" would be crossed, then when both Send
and Receive
return (p2p/protocols/protocol.go
), we check against that error and just return nil
if that error occurred (the message would not be sent). We would have to log this with high priority (I suggest WARN, or does it need to be even ERROR?) so that it becomes visible that this is happening.
Ah so you are suggesting the nodes just ask "pay me what you think you owe me" rather than "pay me x amount which you owe me". Brilliant! @holisticode @mortelli @diegomasini.
we cannot be so liberal, but indeed we should find a way to correct minor discrepancies resulting from sent but not received messages
we cannot be so liberal,
sure we can. after all, the request is really just "pay me some amount to get us back closer to zero balance (under the threshold)".
If nodes always pay the full amount they think they owe, and if the discrepancy came about due to random errors, then we can assume they will even out over time. It is only systematic errors that will accrue over time to the point where they become real obstacles.
I never said 'ignore them'. I suggest refusing to serve retrieval requests coming from the offending node, other traffic is still allowable. In particular if you want to get data from them and they serve it to you (thus reducing the debt) that's fine. You only disallow traffic that would further increase the debt.
I think we're not properly dealing with this scenario based on the the current flow. We could have this situation:
In summary, while A owes B, B will not be sending or receiving any messages unless a payment is made, even if this message would reduce the incurred debt.
What if we ignored the balance threshold when the message price is negative?
@mortelli I suggest that in order to implement this, we would have to create a custom error, which is raised if the "disconnect threshold" would be crossed, then when both
Send
andReceive
return (p2p/protocols/protocol.go
), we check against that error and justreturn nil
if that error occurred (the message would not be sent). We would have to log this with high priority (I suggest WARN, or does it need to be even ERROR?) so that it becomes visible that this is happening.
I like this. We could use it to make a distinction between errors raised during the accounting process and other errors–maybe this could be a way of dropping chunk messages but allowing the rest.
We could drop the messages for these custom errors, but:
Here are some potentially problematic situations. Please review if possible.
B
owes node A
Suppose nodes A
and B
have just gone over the disconnect threshold, due to a chunk being sent from A
to B
, resulting in the following balances:
nodeA.balances[nodeB] > disconnectThreshold
nodeB.balances[nodeA] = nodeA.balances[nodeB] * -1
Assume no payments were or will be made.
B
wants to request another chunk from node A
B
will send a Retrieve Request Message to A
for the chunk.B
to A
is negative at this point, it cannot be higher than the disconnect threshold.B
will increase its debt to A
. The message will then go through B
's Send
method.A
will receive the Retrieve Request Message from B
for the chunk.A
to B
is positive and over the disconnect threshold, A
will drop B
as a peer and skip the message accounting and handling. It will not go through A
's Receive
method.B
accounted for a message that A
did not.A
wants to request a chunk from node B
A
has not dropped B
yet, or has added it as a peer again after doing so.A
will send a Retrieve Request Message to B
for the chunk.A
to B
is positive and over the disconnect threshold, A
will drop B
as a peer and skip the message accounting and sending. It will not go through A
's Send
method.A
will not be able to request a chunk from B
, even though doing so would reduce B
's debt to A
.Thoughts?
Following up on this one:
Here are some potentially problematic situations.
Starting point: Node
B
owes nodeA
Suppose nodes
A
andB
have just gone over the disconnect threshold, due to a chunk being sent fromA
toB
, resulting in the following balances:nodeA.balances[nodeB] > disconnectThreshold nodeB.balances[nodeA] = nodeA.balances[nodeB] * -1
Assume no payments were or will be made.
We now have a PR that will cover this scenario:
Situation 2: node
A
wants to request a chunk from nodeB
Suppose
A
has not droppedB
yet, or has added it as a peer again after doing so.
A
will send a Retrieve Request Message toB
for the chunk.This type of message is payed by the sender.
Since the balance from
A
toB
is positive and over the disconnect threshold,A
will dropB
as a peer and skip the message accounting and sending. It will not go throughA
'sSend
method.As a result,
A
will not be able to request a chunk fromB
, even though doing so would reduceB
's debt toA
.Proposal: when accounting, ignore the disconnect threshold if the message price is negative. This would mean that when the balance is positive, we'd be letting through messages which would reduce the debt.
This one should be solved through #1922 (which closes #1921).
Hello all,
After discussion with @holisticode, @vojtechsimetka, @diegomasini and others I thought it would be a good idea to document the issue of SWAP thresholds (see #1440) and open up the discussion to everyone.
I'm attaching a flowchart which lays out the flow for message processing considering both the payment and disconnect thresholds, as we understand and are building them today.
Diagram
Notes
peer.Send
,peer.handleIncoming
is called.accounting.Send
,accounting.Receive
is called.Discussion points
Based on this flow, it follows that: