ElementsProject / peerswap

MIT License
109 stars 40 forks source link

gettxout augmentation PR #192 breaks state machine? fsm config invalid #200

Closed wtogami closed 1 year ago

wtogami commented 1 year ago

fsm config invalid

These logs below are from a node requesting a lbtc swap of 1 million sats from remote node. Everything but the Liquid txid is redacted.

Both nodes are running CLN v23.05.1 and PeerSwap #192.

2023-06-18T01:23:13.681Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_OnSwapOutStarted on 
2023-06-18T01:23:13.681Z INFO    plugin-peerswap: [Swap:SWAPID] Start new swap-out: peer: PEERID chanId: CHANID initiator: MYNODEID amount 1000000
2023-06-18T01:23:13.722Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_ActionSucceeded on State_SwapOutSender_CreateSwap
2023-06-18T01:23:13.764Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_ActionSucceeded on State_SwapOutSender_SendRequest
2023-06-18T01:23:14.024Z DEBUG   plugin-peerswap: [Messenger] From: PEERID got msgtype: a45b payload: {\"protocol_version\":3,\"swap_id\":\"SWAPID\",\"pubkey\":\"PEERID\",\"Payreq\":\"BOLT11PREPAYMENTINVOICE\"}
2023-06-18T01:23:14.064Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_OnFeeInvoiceReceived on State_SwapOutSender_AwaitAgreement
2023-06-18T01:23:14.067Z INFO    lightningd: Sending 339000msat over 1 hops to deliver 339000msat
2023-06-18T01:23:16.901Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_ActionSucceeded on State_SwapOutSender_PayFeeInvoice
2023-06-18T01:23:16.902Z INFO    plugin-peerswap: [Swap:SWAPID] Paid Feeinvoice of 339 sats
2023-06-18T01:23:16.902Z DEBUG   plugin-peerswap: [Messenger] From: PEERID got msgtype: a45d payload: {\"swap_id\":\"SWAPID\",\"payreq\":\"BOLT11BALANCE\",\"tx_id\":\"24f9ac8ee136ef2263818e0d30e353dbdeec2d39af7c31ed70e1c28a7ad2a215\",\"script_out\":0,\"blinding_key\":\"BLINDINGKEY\"}
2023-06-18T01:23:16.980Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_OnTxOpenedMessage on State_SwapOutSender_AwaitTxBroadcastedMessage
2023-06-18T01:23:16.982Z DEBUG   plugin-peerswap: Await confirmation for tx with id: 24f9ac8ee136ef2263818e0d30e353dbdeec2d39af7c31ed70e1c28a7ad2a215 on swap SWAPID
2023-06-18T01:23:16.982Z DEBUG   plugin-peerswap: [TxWatcher] SWAPID wait for confirmation, current state: confirmed_spent
2023-06-18T01:23:17.022Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_AlreadyClaimed on State_SwapOutSender_AwaitTxConfirmation
2023-06-18T01:23:17.022Z DEBUG   plugin-peerswap: 
2023-06-18T01:23:17.022Z DEBUG   plugin-peerswap:  msghandler err: fsm config invalid
2023-06-18T01:23:26.866Z DEBUG   plugin-peerswap: [Messenger] From: PEERID got msgtype: a45d payload: {\"swap_id\":\"SWAPID\",\"payreq\":\"BOLT11BALANCE\",\"tx_id\":\"24f9ac8ee136ef2263818e0d30e353dbdeec2d39af7c31ed70e1c28a7ad2a215\",\"script_out\":0,\"blinding_key\":\"BLINDINGKEY\"}
2023-06-18T01:23:26.866Z DEBUG   plugin-peerswap: 
2023-06-18T01:23:26.866Z DEBUG   plugin-peerswap:  msghandler err: Message already exists
### The following repeats forever despite the transaction confirmation
2023-06-18T01:23:36.650Z DEBUG   plugin-peerswap: [Messenger] From: PEERID got msgtype: a45d payload: {\"swap_id\":\"SWAPID\",\"payreq\":\"BOLT11BALANCE\",\"tx_id\":\"24f9ac8ee136ef2263818e0d30e353dbdeec2d39af7c31ed70e1c28a7ad2a215\",\"script_out\":0,\"blinding_key\":\"BLINDINGKEY\"}
2023-06-18T01:23:36.650Z DEBUG   plugin-peerswap: 
2023-06-18T01:23:36.650Z DEBUG   plugin-peerswap:  msghandler err: Message already exists

Not sure what happened. The state machine is confused.

It looks like the requesting local node paid the swap-out BOLT11 invoice then the onchain transaction confirmed so fast that PeerSwap was unable to see it was unpaid prior to confirmation. This might be a rare race condition that confused the state machine?

      {
         "id": "SWAPID",
         "created_at": 1687051393,
         "asset": "lbtc",
         "type": "swap-out",
         "role": "sender",
         "state": "State_SwapOutSender_AwaitTxConfirmation",
         "initiator_node_id": "MYNODEID",
         "peer_node_id": "PEERID",
         "amount": 1000000,
         "channel_id": "CHANID",
         "opening_tx_id": "24f9ac8ee136ef2263818e0d30e353dbdeec2d39af7c31ed70e1c28a7ad2a215",
         "lnd_chan_id": LNDCHANID
      }

Local state 40 minutes later. It's stuck forever like this even beyond the expiration time.

The other node is stuck in state State_SwapOutReceiver_AwaitClaimInvoicePayment.

Later I thought maybe if i restart the local node's peerswap it might recover and recognize the tx was confirmed and complete the swap.

2023-06-18T02:27:00.650Z INFO    plugin-peerswap: [Swap:SWAPID]: Recovering from State_SwapOutSender_AwaitTxConfirmation
2023-06-18T02:27:00.786Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_ActionFailed on State_SwapOutSender_AwaitTxConfirmation
2023-06-18T02:27:00.882Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_ActionSucceeded on State_SwapOutSender_SendPrivkey
2023-06-18T02:27:00.944Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_ActionSucceeded on State_SwapOutSender_SendCoopClose
2023-06-18T02:27:00.944Z INFO    plugin-peerswap: [Swap:SWAPID] Swap claimed cooperatively
2023-06-18T02:27:00.977Z INFO    plugin-peerswap: peerswap initialized

I was too late. At least it successfully recovered.

Questions

wtogami commented 1 year ago

NOPE my interpretation was wrong. It happened again immediately in the next swap attempt. Maybe we never tested it with 192 on both sides before? Reverting 192 for now. Needs further investigation.

wtogami commented 1 year ago

Interesting. This time I waited for 2+ confirmations then restarted peerswap on the local node prior to the 50% invoice expiration time. It recognized the onchain confirmed transaction and completed the swap immediately.

2023-06-18T02:44:35.942Z INFO    plugin-peerswap: [Swap:SWAPID]: Recovering from State_SwapOutSender_AwaitTxConfirmation
2023-06-18T02:44:35.968Z DEBUG   plugin-peerswap: Await confirmation for tx with id: ecb3b0b9cf97bfe591015ffc237dc842761242cd5c8b33428d38f72a54bf6c7f on swap SWAPID
2023-06-18T02:44:35.969Z DEBUG   plugin-peerswap: [TxWatcher] SWAPID wait for confirmation, current state: confirmed_below
2023-06-18T02:44:36.040Z INFO    plugin-peerswap: peerswap initialized
2023-06-18T02:44:36.099Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_OnTxConfirmed on State_SwapOutSender_AwaitTxConfirmation
2023-06-18T02:44:42.370Z DEBUG   plugin-peerswap: [Messenger] From: PEERID got msgtype: a45d payload: {\"swap_id\":\"SWAPID\",\"payreq\":\"BOLT11BALANCE\",\"tx_id\":\"ecb3b0b9cf97bfe591015ffc237dc842761242cd5c8b33428d38f72a54bf6c7f\",\"script_out\":1,\"blinding_key\":\"BLINDING_KEY\"}
2023-06-18T02:44:46.167Z INFO    lightningd: Sending 1500000000msat over 1 hops to deliver 1500000000msat
2023-06-18T02:44:49.592Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_ActionSucceeded on State_SwapOutSender_ValidateTxAndPayClaimInvoice
2023-06-18T02:44:50.122Z DEBUG   plugin-peerswap: [FSM] event:id: SWAPID, Event_ActionSucceeded on State_SwapOutSender_ClaimSwap
nepet commented 1 year ago

I guess I got what is happening here. We use gettxout to check if the tx is already confirmed or if the tx passed the csv limit. We handled gettxout as if the tx output was already spent which is true but the return is also empty if we don't have the tx in our mempool nor on the blockchain. I have a patch ready that is less strict on empty gettxout returns and only returns an ErrIsSpent if we try to recover from a swap. On "normal" operations we ignore the empty return, just log it and keep watching for the tx .

nepet commented 1 year ago

Maybe we want to have a timestamp in the db when the swap started and be forgiving if the time t is t < timestamp + xand be strict if the time is t >= timestamp + x This way we would not even have to tell the tx watcher if we are recovering and could also recover from a restart shortly after we broadcasted the opening tx.

wtogami commented 1 year ago

We figured out what happened here. #192 intended to add more safety checks but we ended up adding a bug. This isn't a high priority so we will return to this a bit later.

TODO: testmempoolaccept augmented gettxout would make the txwatcher more precise.

wtogami commented 1 year ago

tl;dr: This is a low priority.

We thought we had a problem in the way we currently check for spent UTXO. #192 attempted to make our check stricter but in the process broke it because the tester uses the same mempool for sender and receiver (does not match reality). Potential improvement for later would be to add a watch-only wallet to bitcoind so we can more precisely track the quantity of confirmations after the onchain tx disappears from the mempool. But ultimately we found the current method we use actually works well enough.

wtogami commented 1 year ago

Closing for now