get10101 / 10101

10101 (Ten-Ten-One): Self-custodial derivative trading at your fingertips.
https://10101.finance
MIT License
128 stars 23 forks source link

App backend unresponsive after open-closing LN channel multiple times #1187

Closed holzeis closed 1 year ago

holzeis commented 1 year ago

We are seeing quite a lot of ping timeouts to the app on the latest version on the coordinator 1.2.3 (1937). It appears there is some blocking task running on the phone that is responsible for those timeouts.

Testing the latest version on regtest 1.2.3 (1937) funding the wallet, opening or closing a position were randomly failing or taking a very long time. e.g. we observed that processing a SubChannelCloseFinalize took about 2 minutes on the phone for no apparent reason. (No logs in between processing SubChannelCloseConfirm message and sending Finalize message)

We presume the ping timeout is just one symptom of that issue.

https://itchysats.grafana.net/explore?orgId=1&panes=%7B%22U8i%22:%7B%22datasource%22:%22grafanacloud-logs%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22grafanacloud-logs%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22%7Bcontainer%3D%5C%22coordinator%5C%22,%20setup%3D%5C%22get10101-regtest%5C%22%7D%20%7C%3D%20%60ping%20timeout%60%20%7C%20json%22,%22queryType%22:%22range%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1

holzeis commented 1 year ago

Testing the app further it appears my app is constantly breaking eventually with the following error (Channel is being force-closed)

Closing channel 37c4107e553badff96c8ef74dd63f553bc968332e2318f0a72e73019aef1bfea due to close-required error: Invalid commitment tx signature from peer

Below are my app logs of independent runs from a clean state on my phone. Version 1.2.3 Build (1937)

https://itchysats.grafana.net/explore?orgId=1&panes=%7B%22U8i%22:%7B%22datasource%22:%22grafanacloud-logs%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22grafanacloud-logs%22%7D,%22editorMode%22:%22builder%22,%22expr%22:%22%7Bcontainer%3D%5C%22coordinator%5C%22,%20setup%3D%5C%22get10101-regtest%5C%22%7D%20%7C%3D%20%600367d7e25797efd14ef341e786e5fcfd3bcf094bf57e0938abf3e5da65d9db0f21%60%20%7C%20json%22,%22queryType%22:%22range%22%7D%5D,%22range%22:%7B%22from%22:%22now-7d%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1

klochowicz commented 1 year ago

I'll see whether I can reproduce it on a desktop build (it would do wonders for debuggability to have it on local machine).

bonomat commented 1 year ago

Some logs from my phone:

run 1:

| 2023-08-29 16:55:34.894 | LogLevel.DEBUG | New message from orderbook msg: {"Match":{"order_id":"8d91af60-5f28-41be-a017-4f79338031f6","expiry_timestamp":[2023,248,0,0,0,0,0,0,0],"oracle_pk":"16f88cf7d21e6c0f46bcbc983a4e3b19726c6c98858cc31c83551a88fde171c0","matches":[{"order_id":"c30db659-6907-4a23-bea0-2e8be570b5d4","quantity":10.0,"pubkey":"0222eccb3303dc5e8a13be107594716a11bfa9307f79e09ea769f06ead28e5de8d","execution_price":27273.0}]}} 
| 2023-08-29 16:55:34.895 | LogLevel.INFO | Received match from orderbook order_id: 8d91af60-5f28-41be-a017-4f79338031f6 
| 2023-08-29 16:55:34.896 | LogLevel.DEBUG | Filling order with id: 8d91af60-5f28-41be-a017-4f79338031f6 filled: FilledWith { order_id: 8d91af60-5f28-41be-a017-4f79338031f6, expiry_timestamp: 2023-09-05 0:00:00.0 +00:00:00, oracle_pk: XOnlyPublicKey(c071e1fd881a55831cc38c85986c6c72193b4e3a98bcbc460f6c1ed2f78cf8167458a18010faf718756c69de33ee9e38a0d3f0ad48befa58e23a4b1f843e08be), matches: [Match { order_id: c30db659-6907-4a23-bea0-2e8be570b5d4, quantity: 10, pubkey: PublicKey(8ddee528ad6ef069a79ee0797f30a9bf116a71947510be138a5edc0333cbec22ccd8aac6aaec56536866de75029639822a1dc33a41bb8598120b95ea018108e4), execution_price: 27273 }] },order: Order { id: 8d91af60-5f28-41be-a017-4f79338031f6, leverage: 1.0, quantity: 10.0, contract_symbol: BtcUsd, direction: Long, order_type: Market, state: Initial, creation_timestamp: 2023-08-29 14:55:33.0 +00:00:00, order_expiry_timestamp: 2023-08-29 14:56:33.0 +00:00:00 } 
| 2023-08-29 16:55:34.897 | LogLevel.INFO | Updated order state new_state: Filling,order_id: 8d91af60-5f28-41be-a017-4f79338031f6 
| 2023-08-29 16:55:34.897 | LogLevel.DEBUG | Publishing an internal event event: OrderUpdateNotification 
| 2023-08-29 16:55:34.898 | LogLevel.DEBUG | Forwarding event to flutter event: OrderUpdateNotification 
| 2023-08-29 16:55:35.113 | LogLevel.DEBUG | Ignoring latest state update candidate: Open,current_state: Filling 
| 2023-08-29 16:55:35.114 | LogLevel.DEBUG | Publishing an internal event event: OrderUpdateNotification 
| 2023-08-29 16:55:35.115 | LogLevel.DEBUG | Forwarding event to flutter event: OrderUpdateNotification 
| 2023-08-29 16:55:35.117 | LogLevel.DEBUG | Publishing an internal event event: PositionUpdateNotification 
| 2023-08-29 16:55:35.118 | LogLevel.DEBUG | Forwarding event to flutter event: PositionUpdateNotification 
| 2023-08-29 16:55:35.438 | LogLevel.INFO | Tracking channel status  
| 2023-08-29 16:55:35.439 | LogLevel.DEBUG | Getting all non-pending channels  
| 2023-08-29 16:55:35.440 | LogLevel.DEBUG | Got all non-pending channels channels: [Channel { user_channel_id: UserChannelId(14d35fb0-8186-4066-b5cf-a8934a9815bf), channel_id: Some([208, 97, 128, 123, 205, 106, 240, 56, 73, 96, 44, 83, 150, 62, 80, 176, 9, 34, 246, 234, 102, 137, 15, 198, 57, 25, 82, 40, 0, 220, 21, 147]), fake_scid: None, inbound: 7899000, outbound: 7218000, funding_txid: Some(9215dc0028521939c60f8966eaf62209b0503e96532c604938f06acd7b8061d0), channel_state: Open, counterparty: PublicKey(025837416dcd1520de64858477381e5f7c127899489284b7cf9565ae4d927b503380c813bff475d8fff4ba07f2502e5c14db46571898340e2a37d6f2bb1f4d1a), created_at: 2023-08-22 10:41:15.0 +00:00:00, updated_at: 2023-08-29 14:55:09.0 +00:00:00 }] 
| 2023-08-29 16:55:35.441 | LogLevel.INFO | Channel status udpate status: LnDlcOpen 
| 2023-08-29 16:55:35.442 | LogLevel.DEBUG | Publishing an internal event event: ChannelStatusUpdate 
| 2023-08-29 16:55:35.443 | LogLevel.DEBUG | Forwarding event to flutter event: ChannelStatusUpdate 
| 2023-08-29 16:55:35.443 | LogLevel.DEBUG | Received channel status update: Event.channelStatusUpdate(field0: ChannelStatus.LnDlcOpen) 
| 2023-08-29 16:55:35.601 | LogLevel.INFO | Sent trade request to coordinator successfully  
| 2023-08-29 16:55:35.604 | LogLevel.INFO | Registered order-matching fee invoice to be paid later payment_hash: e9b171a939b67e71aa481f15bcc46d060375a6647b50a054171036cc68ffe5fa 
| 2023-08-29 16:55:35.902 | LogLevel.INFO | Processing message from: 03507b924dae6595cfb78492489978127c5f1e3877848564de2015cd6d41375802,kind: SubChannelCloseOffer 
| 2023-08-29 16:55:35.918 | LogLevel.INFO | Accepting DLC channel collaborative settlement channel_id: d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 
| 2023-08-29 16:55:35.919 | LogLevel.DEBUG | lightning::chain::channelmonitor: Getting signed latest holder commitment transaction!  
| 2023-08-29 16:55:35.920 | LogLevel.TRACE | lightning::ln::channel: Updating HTLC state for a newly-sent commitment_signed...  
| 2023-08-29 16:55:35.921 | LogLevel.TRACE | lightning::ln::channel: Building commitment transaction number 281474976710635 (really 20 xor 222251592485799) for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 for remote, generated by us with fee 253...  
| 2023-08-29 16:55:35.922 | LogLevel.TRACE | lightning::ln::channel:    ...including to_remote output with value 52018  
| 2023-08-29 16:55:35.922 | LogLevel.TRACE | lightning::ln::channel:    ...including to_local output with value 47799  
| 2023-08-29 16:55:35.923 | LogLevel.TRACE | lightning::chain::chainmonitor: Updating ChannelMonitor for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593  
| 2023-08-29 16:55:35.923 | LogLevel.INFO | lightning::chain::channelmonitor: Applying update to monitor d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593, bringing update_id from 50 to 51 with 1 changes.  
| 2023-08-29 16:55:35.924 | LogLevel.TRACE | lightning::chain::channelmonitor: Updating ChannelMonitor with latest counterparty commitment transaction info  
| 2023-08-29 16:55:35.924 | LogLevel.TRACE | lightning::chain::channelmonitor: Tracking new counterparty commitment transaction with txid fe31f40bd8752b590c3361e09a0bf9ef8ad940beed577cc335a504ef9455aa9b at commitment number 281474976710635 with 0 HTLC outputs  
| 2023-08-29 16:55:35.926 | LogLevel.DEBUG | lightning::chain::chainmonitor: Persistence of ChannelMonitorUpdate for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 completed  
| 2023-08-29 16:55:35.926 | LogLevel.TRACE | lightning::ln::channel: Creating an announcement_signatures message for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593  
| 2023-08-29 16:55:35.927 | LogLevel.TRACE | lightning::ln::channel: Ignore : Channel is not available for public announcements  
| 2023-08-29 16:55:35.927 | LogLevel.TRACE | lightning::ln::channel: Regenerated latest commitment update in channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 with 0 update_adds, 0 update_fulfills, 0 update_fails, and 0 update_fail_malformeds  
| 2023-08-29 16:55:35.928 | LogLevel.TRACE | lightning::ln::channel: Building commitment transaction number 281474976710635 (really 20 xor 222251592485799) for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 for remote, generated by us with fee 253...  
| 2023-08-29 16:55:35.928 | LogLevel.TRACE | lightning::ln::channel:    ...including to_remote output with value 52018  
| 2023-08-29 16:55:35.928 | LogLevel.TRACE | lightning::ln::channel:    ...including to_local output with value 47799  
| 2023-08-29 16:55:35.929 | LogLevel.TRACE | lightning::ln::channel: Signed remote commitment tx 0200000001d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc15920100000000fb22ca8002b7ba0000000000001600143313c9db551d73c0cb7dc0806dc36da885643e0632cb000000000000220020b383afb34cf1d44c60685d6bdfa64d92383f8eb78c9ec5c6e5a8cdd205bb2864b38b3820 (txid fe31f40bd8752b590c3361e09a0bf9ef8ad940beed577cc335a504ef9455aa9b) with redeemscript 475221030760df4c7503689df798b9f9f5df6f1c7f216a79fd5b186830a7122415ac413b210378e4e35acf7733613ade318d70266a278142bbaf069c4388bb394e3eaaf519bb52ae with value 100000 -> 3c145ea4952f431c2cbffdb15f30140850f8797edb9f7c302d856b72e3f801f7069711bd226ce3c6a3061fb04e028cae5647662abdadd561172c77fe0a73aa76 in channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593  
| 2023-08-29 16:55:35.929 | LogLevel.DEBUG | lightning::ln::channel: Restored monitor updating in channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 resulting in a commitment update and no RAA, with RAA first  
| 2023-08-29 16:55:35.929 | LogLevel.TRACE | lightning_background_processor: Persisting ChannelManager...  
| 2023-08-29 16:55:35.935 | LogLevel.TRACE | lightning_background_processor: Done persisting ChannelManager.  
| 2023-08-29 16:55:36.943 | LogLevel.INFO | Processing message from: 03507b924dae6595cfb78492489978127c5f1e3877848564de2015cd6d41375802,kind: SubChannelCloseConfirm 
| 2023-08-29 16:55:36.954 | LogLevel.TRACE | lightning::ln::channel: Updating HTLCs on receipt of RAA in channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593...  
| 2023-08-29 16:55:36.954 | LogLevel.DEBUG | lightning::ln::channel: Received a valid revoke_and_ack for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 with no reply necessary.  
| 2023-08-29 16:55:36.955 | LogLevel.TRACE | lightning::chain::chainmonitor: Updating ChannelMonitor for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593  
| 2023-08-29 16:55:36.955 | LogLevel.INFO | lightning::chain::channelmonitor: Applying update to monitor d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593, bringing update_id from 51 to 52 with 1 changes.  
| 2023-08-29 16:55:36.956 | LogLevel.TRACE | lightning::chain::channelmonitor: Updating ChannelMonitor with commitment secret  
| 2023-08-29 16:55:36.963 | LogLevel.DEBUG | lightning::chain::chainmonitor: Persistence of ChannelMonitorUpdate for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 completed  
| 2023-08-29 16:55:36.963 | LogLevel.TRACE | lightning::ln::channel: Creating an announcement_signatures message for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593  
| 2023-08-29 16:55:36.964 | LogLevel.TRACE | lightning::ln::channel: Ignore : Channel is not available for public announcements  
| 2023-08-29 16:55:36.964 | LogLevel.DEBUG | lightning::ln::channel: Restored monitor updating in channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 resulting in no commitment update and no RAA, with RAA first  
| 2023-08-29 16:55:36.965 | LogLevel.TRACE | lightning::ln::channel: Building commitment transaction number 281474976710635 (really 20 xor 222251592485799) for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 for us, generated by remote with fee 253...  
| 2023-08-29 16:55:36.965 | LogLevel.TRACE | lightning::ln::channel:    ...including to_local output with value 47799  
| 2023-08-29 16:55:36.966 | LogLevel.TRACE | lightning::ln::channel:    ...including to_remote output with value 52018  
| 2023-08-29 16:55:36.966 | LogLevel.TRACE | lightning::ln::channel: Checking commitment tx signature 7d205d392dfa73f3e3e24610ff60c52cfc8180dbf84f4734f74e59223bd25a92198e894ba8d0d88b700eb5a19f050fe11c59d13ec3d44e302a2fcbf5ca381fa5 by key 0378e4e35acf7733613ade318d70266a278142bbaf069c4388bb394e3eaaf519bb against tx 0200000001d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc15920100000000fb22ca8002b7ba000000000000220020f7b95c1a644599f56a649c7120b9cdba9cd842713aeb5ffc935cd4e2f03dbbc232cb0000000000001600149ff4e7e91768303f8d73101d55ac408d24bfe877b38b3820 (sighash a49f3daf7a7fbb0a0744ca90bf37edd8565e8ef8fd83f8a75f7c71de17753b02) with redeemscript 475221030760df4c7503689df798b9f9f5df6f1c7f216a79fd5b186830a7122415ac413b210378e4e35acf7733613ade318d70266a278142bbaf069c4388bb394e3eaaf519bb52ae in channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593  
| 2023-08-29 16:55:36.966 | LogLevel.DEBUG | lightning::ln::channel: Received valid commitment_signed from peer in channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593, updating HTLC state and responding with a revoke_and_ack.  
| 2023-08-29 16:55:36.967 | LogLevel.TRACE | lightning::chain::chainmonitor: Updating ChannelMonitor for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593  
| 2023-08-29 16:55:36.967 | LogLevel.INFO | lightning::chain::channelmonitor: Applying update to monitor d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593, bringing update_id from 52 to 53 with 1 changes.  
| 2023-08-29 16:55:36.968 | LogLevel.TRACE | lightning::chain::channelmonitor: Updating ChannelMonitor with latest holder commitment transaction info  
| 2023-08-29 16:55:36.969 | LogLevel.DEBUG | lightning::chain::chainmonitor: Persistence of ChannelMonitorUpdate for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 completed  
| 2023-08-29 16:55:36.969 | LogLevel.TRACE | lightning::ln::channel: Creating an announcement_signatures message for channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593  
| 2023-08-29 16:55:36.970 | LogLevel.TRACE | lightning::ln::channel: Ignore : Channel is not available for public announcements  
| 2023-08-29 16:55:36.970 | LogLevel.DEBUG | lightning::ln::channel: Restored monitor updating in channel d061807bcd6af03849602c53963e50b00922f6ea66890fc63919522800dc1593 resulting in no commitment update and an RAA, with commitment first  
| 2023-08-29 16:56:04.816 | LogLevel.INFO | Finished on-chain sync duration: 12533129 
| 2023-08-29 16:57:15.384 | LogLevel.INFO | Sending message kind: SubChannelCloseFinalize,to: 03507b924dae6595cfb78492489978127c5f1e3877848564de2015cd6d41375802 
| 2023-08-29 16:57:15.386 | LogLevel.TRACE | lightning_background_processor: Persisting ChannelManager...  
| 2023-08-29 16:57:15.387 | LogLevel.INFO | Updated order state new_state: Filled,order_id: 8d91af60-5f28-41be-a017-4f79338031f6 
| 2023-08-29 16:57:15.388 | LogLevel.DEBUG | Publishing an internal event event: OrderUpdateNotification 
| 2023-08-29 16:57:15.389 | LogLevel.DEBUG | Forwarding event to flutter event: OrderUpdateNotification 
| 2023-08-29 16:57:15.390 | LogLevel.DEBUG | Removing position after DLC channel closure filled_order: Order { id: 8d91af60-5f28-41be-a017-4f79338031f6, leverage: 1.0, quantity: 10.0, contract_symbol: BtcUsd, direction: Long, order_type: Market, state: Filled { execution_price: 27273.0 }, creation_timestamp: 2023-08-29 14:55:33.0 +00:00:00, order_expiry_timestamp: 2023-08-29 14:56:33.0 +00:00:00 } 

You can see that between sending SubChannelCloseFinalize there was a 1m10s period where nothing was happening. It appears there was a ping timeout at the coordinator but it's not clear what the phone was doing in this time window.

run 2:

| 2023-08-29 17:20:54.405 | LogLevel.ERROR | Failed to execute trade for order Some(3d914523-30d2-4724-aca0-6662bf1e6f1a): TimedOut: Order was not matched within Duration { seconds: 300, nanoseconds: 0 }  
| 2023-08-29 17:20:54.407 | LogLevel.INFO | Updated order state new_state: Failed,order_id: 3d914523-30d2-4724-aca0-6662bf1e6f1a 
klochowicz commented 1 year ago

@holzeis how did you run the app? was it the 10101-test or self-deployed with flutter run? something that's completely unrelated, but jumped at me in your logs was the line: | 2023-08-29 20:04:24.719 | LogLevel.INFO | Oracle endpoint: http://127.0.0.1:8081 ┤

this is likely not correct!

klochowicz commented 1 year ago

A bunch of findings:

I have created a branch where I added tokio metrics to see whether tokio is congested. I could see from measuruments before and after the force-close that the longest running task was 2s and then 6s. are these lengths enough to trigger such misbehaviour? 🤔

I added my file from my mobile, the relevant part is around 2023-08-29 17:24:58.800:

2023-08-29_182644.log

Of course we cannot rule out a deadlock on dedicated thread, I haven't got that far.

edit: this comment should rather be on #1194

holzeis commented 1 year ago

Closing for now, reopen if the issue reappears.

holzeis commented 1 year ago

Reopening, as it just happened to me on the latest build 1.2.3 (1965) that the app got stuck (see logs). This time the attempt to paying the JIT channel opening fees have been triggered after the channel has been updated.

https://nextcloud.holzeis.me/s/aRtN9LBDREBfHwL

What have I done?

I constantly opened a channel and closed it again. After the 8 or 9th attempt I got the app in a state where it wasn't responding anymore (resp. hanging).

bonomat commented 1 year ago

I constantly opened a channel and closed it again

Do you really mean opening and closing a channel? Or do you mean opening closing a sub-channel / position

holzeis commented 1 year ago

I constantly opened a channel and closed it again

Do you really mean opening and closing a channel? Or do you mean opening closing a sub-channel / position

I really mean opening and closing a lightning channel. rust-dlc is not involved at all.

holzeis commented 1 year ago

I am removing this ticket from the current iteration as the described scenario is not as critical anymore. Also this may get fixed with upgrading ldk to 0.1.116. Let's revisit this bug afterwards again.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.