ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.84k stars 901 forks source link

Backup plugin broken by lightning update #4348

Closed laanwj closed 3 years ago

laanwj commented 3 years ago

Somewhere between commit 990623676938adeff66488b43fbd03c1888624ad and 015ac37d924e31bb87b8597da9f863e82270657b , the backup plugin started causing a hang in lightningd at startup.

I tested this with the vanilla plugin, without my socket pipe changes.

I load it with

./lightningd/lightningd \
…
    --important-plugin $HOME/src/plugins/backup/backup.py \

Lightningd gets as far as connecting to a few peers but hangs. before it starts processing RPC commands (e.g. lightning-cli listpeers just hangs). Without the plugin it continues without problems.

I tried to figure out where it hangs by adding logging statements to the plugin, but it doesn't seem to hang in the plugin itself, at least not during the db_write hook nor during plugin initialization.

I will try to bisect this. Unfortunately, a database schema upgrade happened in between (4dfbee47f7c9bdf3ef1aa5fe84255514e924af2b) so I cannot go back to before that. It looks like it is already broken at that commit.

System: FreeBSD 12.2 Compiler: FreeBSD clang version 8.0.1

Edit: it's possibly unrelated to any specific version update here, but a chicken/egg issue with an early database update before RPC responds.

laanwj commented 3 years ago

Not sure this is useful, but I found out with gdb where it hangs:

(gdb) bt
#0  0x00000008007fc12a in _poll () from /lib/libc.so.7
#1  0x0000000800aa7246 in ?? () from /lib/libthr.so.3
#2  0x000000000030391e in io_loop (timers=0x0, expired=0x0) at ccan/ccan/io/poll.c:412
#3  0x00000000002aa612 in plugins_exclusive_loop (plugins=0x804bcf2b8) at lightningd/plugin.c:1857
#4  0x00000000002ad358 in plugin_hook_db_sync (db=0x800ae4e28) at lightningd/plugin_hook.c:397
#5  0x00000000002b2353 in db_report_changes (db=0x800ae4e28, final=<optimized out>, min=0) at wallet/db.c:857
#6  0x00000000002b2256 in db_commit_transaction (db=0x800ae4e28) at wallet/db.c:927
#7  0x0000000000292456 in main (argc=16, argv=<optimized out>) at lightningd/lightningd.c:972

After ctrl-C

2021-01-28T21:53:57.138Z UNUSUAL plugin-backup.py: Traceback (most recent call last):
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:   File \"/home/user/src/plugins/backup/backup.py\", line 485, in <module>
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:     plugin.run()
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 687, in run
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:     partial = self._multi_dispatch(msgs)
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 672, in _multi_dispatch
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:     self._dispatch_request(request)
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 564, in _dispatch_request
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:     result = self._exec_func(method.func, request)
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 549, in _exec_func
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py:     return func(*ba.args, **ba.kwargs)
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 796, in _init
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py:     return self._exec_func(self.child_init, request)
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 549, in _exec_func
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py:     return func(*ba.args, **ba.kwargs)
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py:   File \"/home/user/src/plugins/backup/backup.py\", line 446, in on_init
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py:     configs = plugin.rpc.listconfigs()
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/lightning.py\", line 837, in listconfigs
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py:     return self.call(\"listconfigs\", payload)
2021-01-28T21:53:57.141Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/lightning.py\", line 329, in call
2021-01-28T21:53:57.141Z UNUSUAL plugin-backup.py:     resp, buf = self._readobj(sock, buf)
2021-01-28T21:53:57.141Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/lightning.py\", line 282, in _readobj
2021-01-28T21:53:57.141Z UNUSUAL plugin-backup.py:     b = sock.recv(max(1024, len(buff)))
2021-01-28T21:53:57.142Z UNUSUAL plugin-backup.py:   File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/lightning.py\", line 256, in recv
2021-01-28T21:53:57.142Z UNUSUAL plugin-backup.py:     return self.sock.recv(length)
2021-01-28T21:53:57.142Z UNUSUAL plugin-backup.py: KeyboardInterrupt
2021-01-28T21:53:57.142Z INFO    plugin-backup.py: Killing plugin: Plugin exited before completing handshake.
2021-01-28T21:53:57.142Z **BROKEN** plugin-backup.py: Plugin marked as important, shutting down lightningd!

So it is hanging in the plugin, but in pyln not the plugin code itself (edit: no, it's actually hanging inside the plugin, see below). I'll try upgrading pyln. Edit: no success, unfortunately.

laanwj commented 3 years ago

@cdecker Could it be some race condition / circular dependency? E.g. lightningd invokes the plugin (after a database update), which calls listconfigs on the RPC, which in turn depends on lightningd finishing the call to the plugin?

laanwj commented 3 years ago

These are the last messages in the debug log before the hang (possibly, the failing channel triggers the database update that triggers the hang):

2021-01-28T22:30:06.431Z DEBUG   wallet: Restored 0 outgoing HTLCS
2021-01-28T22:30:06.439Z DEBUG   plugin-autoclean: autocleaning not active
2021-01-28T22:30:06.440Z DEBUG   hsmd: Client: Received message 10 from client
2021-01-28T22:30:06.440Z DEBUG   hsmd: Client: Received message 10 from client
2021-01-28T22:30:06.440Z DEBUG   connectd: REPLY WIRE_CONNECTD_ACTIVATE_REPLY with 0 fds
2021-01-28T22:30:06.441Z INFO    lightningd: Restarting onchaind for channel 57199
2021-01-28T22:30:06.493Z UNUSUAL XXX-chan#57199: Peer permanent failure in ONCHAIN: Funding transaction spent
2021-01-28T22:30:06.496Z DEBUG   lightningd: Broadcasting txid XXX
2021-01-28T22:30:06.496Z DEBUG   lightningd: sendrawtransaction: XXX
2021-01-28T22:30:06.496Z INFO    XXX-chan#57199: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2021-01-28T22:30:06.509Z DEBUG   XXX-onchaind-chan#57199: pid 73849, msgfd 40
laanwj commented 3 years ago

Commenting out the call to listconfigs works around the issue :slightly_smiling_face: :

diff --git a/backup/backup.py b/backup/backup.py
index 0e684d6..56b1a8f 100755
--- a/backup/backup.py
+++ b/backup/backup.py
@@ -443,9 +443,9 @@ def on_init(options, **kwargs):
             level="warn"
         )

-    configs = plugin.rpc.listconfigs()
-    if not configs['wallet'].startswith('sqlite3'):
-        kill("The backup plugin only works with the sqlite3 database.")
+    #configs = plugin.rpc.listconfigs()
+    #if not configs['wallet'].startswith('sqlite3'):
+    #    kill("The backup plugin only works with the sqlite3 database.")
laanwj commented 3 years ago

I don't think this is a core lightningd issue anymore, created an issue on the plugins repository: lightningd/plugins#209

cdecker commented 3 years ago

That is indeed correct, the db_write hook is synchronous, so no RPC commands are processed while in the hook handler.

laanwj commented 3 years ago

For anyone coming here: this deadlock issue was worked around by lightningd/plugins#212.