Closed tsjk closed 8 months ago
During some discussion on Discord we concluded that the fault occurred at the responder. This is also supported by the fact that my node (the initiator) has the following settings in peerswap.conf:
[Liquid]
rpchost="http://host.containers.internal/"
rpcport=17041
Thus, the cancelling reason should come from the other side (as the port in the reason is 7041).
This means it ought to have been the remote side that somehow failed to communicate with its elementsd, so it would be interesting to hear what happened there, and perhaps worth exploring whether retries on the responder's side would make a difference.
It also means that the initiator gets penalized for the responder's malfunction. As I'm not very familiar with how the state machines involved actually work, I don't know whether this is unfixable and a known risk, or a bug.
The responder peer is running on OpenBSD, which is not as well tested as Linux. It appears CLN Peerswap couldn't connect to elementsd some time after starting. Note that the one-time loss is as designed and ~the responder node should be added to your suspicious nodes list now~. Here are the logs on the responder side:
2024-03-03T08:32:48.754Z DEBUG plugin-peerswap: [Messenger] From: REDACTED got msgtype: a457 for swap: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56
2024-03-03T08:32:48.983Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_OnSwapOutRequestReceived on
2024-03-03T08:32:49.005Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_ActionSucceeded on State_SwapOutReceiver_CreateSwap
2024-03-03T08:32:49.005Z INFO plugin-peerswap: [Swap:b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56] swap-out request received: peer: REDACTED chanId: REDACTED initiator: REDACTED amount 2000000
2024-03-03T08:32:49.006Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_ActionSucceeded on State_SwapOutReceiver_SendFeeInvoice
2024-03-03T08:32:49.006Z DEBUG lightningd: Plugin peerswap returned from custommsg hook call
2024-03-03T08:32:55.521Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_OnFeeInvoicePaid on State_SwapOutReceiver_AwaitFeeInvoicePayment
2024-03-03T08:33:03.791Z INFO plugin-peerswap: block watcher: lbtc, Post \"http://127.0.0.1:7041/wallet/peerswap\": read tcp 127.0.0.1:24380->127.0.0.1:7041: read: connection reset by peer
2024-03-03T08:33:03.792Z INFO plugin-peerswap: [FSM] Action failure Post \"http://127.0.0.1:7041/wallet/peerswap\": EOF
2024-03-03T08:33:03.792Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_ActionFailed on State_SwapOutReceiver_BroadcastOpeningTx
2024-03-03T08:33:03.792Z DEBUG plugin-peerswap: [FSM] Canceling because of Post \"http://127.0.0.1:7041/wallet/peerswap\": EOF
2024-03-03T08:33:03.792Z DEBUG plugin-peerswap: [FSM] event:id: b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56, Event_ActionSucceeded on State_SendCancel
2024-03-03T08:33:03.792Z INFO plugin-peerswap: [Swap:b86bc580ba8efb984dd6d1c1b90c880079d1dfd72e4729d0a7741412b2cb7a56] Swap canceled. Reason: Post \"http://127.0.0.1:7041/wallet/peerswap\": EOF
$ lightning-cli peerswap-lbtc-getbalance
{
"code": -1,
"message": "Post \"http://127.0.0.1:7041/wallet/peerswap\": dial tcp 127.0.0.1:7041: connect: connection refused"
}
Actually, it appears OpenBSD killed elementsd altogether:
$ elements-cli -getinfo
error: timeout on transient error: Could not connect to the server 127.0.0.1:7041
Make sure the daemon server is running and that you are connecting to the correct RPC port.
Closing this as peerswap is operating as intended.
I wanted to swap-out, and stumbled upon a fault that resulted in a minor loss of sats.
My side paid the swap-out prepayment, but then the process died, seemingly because of a communication failure. Elementsd says nothing, and there is no indication of the connection to it having been down. Do we need some kind of retry here? At least once or twice?