Closed tsjk closed 8 months ago
During some discussion on Discord we concluded that the fault occurred at the responder. This is also supported by the fact that my node (the initiator) has the following settings in peerswap.conf:
[Liquid]
rpchost="http://host.containers.internal/"
rpcport=17041
Thus, the cancelling reason should come from the other side (as the port in the reason is 7041).
This means it ought to have been the remote side that somehow failed to communicate with its elementsd, so it would be interesting to hear what happened there, and perhaps worth exploring whether retries on the responder's side would make a difference.
It also means that the initiator gets penalized for the responder's malfunction. As I'm not very familiar with how the state machines involved actually work, I don't know whether this is unfixable and a known risk, or a bug.
The responder peer is running on OpenBSD, which is not as well tested as Linux. It appears CLN Peerswap couldn't connect to elementsd some time after starting. Note that the one-time loss is as designed and ~the responder node should be added to your suspicious nodes list now~. Here are the logs on the responder side:
2024-03-03T08:32:48.754Z DEBUG plugin-peerswap: [Messenger] From: REDACTED got msgtype: a457 for swap: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56
2024-03-03T08:32:48.983Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_OnSwapOutRequestReceived on
2024-03-03T08:32:49.005Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_ActionSucceeded on State_SwapOutReceiver_CreateSwap
2024-03-03T08:32:49.005Z INFO plugin-peerswap: [Swap:b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56] swap-out request received: peer: REDACTED chanId: REDACTED initiator: REDACTED amount 2000000
2024-03-03T08:32:49.006Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_ActionSucceeded on State_SwapOutReceiver_SendFeeInvoice
2024-03-03T08:32:49.006Z DEBUG lightningd: Plugin peerswap returned from custommsg hook call
2024-03-03T08:32:55.521Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_OnFeeInvoicePaid on State_SwapOutReceiver_AwaitFeeInvoicePayment
2024-03-03T08:33:03.791Z INFO plugin-peerswap: block watcher: lbtc, Post \"http://127.0.0.1:7041/wallet/peerswap\": read tcp 127.0.0.1:24380->127.0.0.1:7041: read: connection reset by peer
2024-03-03T08:33:03.792Z INFO plugin-peerswap: [FSM] Action failure Post \"http://127.0.0.1:7041/wallet/peerswap\": EOF
2024-03-03T08:33:03.792Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_ActionFailed on State_SwapOutReceiver_BroadcastOpeningTx
2024-03-03T08:33:03.792Z DEBUG plugin-peerswap: [FSM] Canceling because of Post \"http://127.0.0.1:7041/wallet/peerswap\": EOF
2024-03-03T08:33:03.792Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_ActionSucceeded on State_SendCancel
2024-03-03T08:33:03.792Z INFO plugin-peerswap: [Swap:b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56] Swap canceled. Reason: Post \"http://127.0.0.1:7041/wallet/peerswap\": EOF
$ lightning-cli peerswap-lbtc-getbalance
{
"code": -1,
"message": "Post \"http://127.0.0.1:7041/wallet/peerswap\": dial tcp 127.0.0.1:7041: connect: connection refused"
}
Actually, it appears OpenBSD killed elementsd altogether:
$ elements-cli -getinfo
error: timeout on transient error: Could not connect to the server 127.0.0.1:7041
Make sure the daemon server is running and that you are connecting to the correct RPC port.
Closing this as peerswap is operating as intended.
I wanted to swap-out, and stumbled upon a fault that resulted in a minor loss of sats.
My side paid the swap-out prepayment, but then the process died, seemingly because of a communication failure. Elementsd says nothing, and there is no indication of the connection to it having been down. Do we need some kind of retry here? At least once or twice?