Closed laanwj closed 3 years ago
Not sure this is useful, but I found out with gdb where it hangs:
(gdb) bt
#0 0x00000008007fc12a in _poll () from /lib/libc.so.7
#1 0x0000000800aa7246 in ?? () from /lib/libthr.so.3
#2 0x000000000030391e in io_loop (timers=0x0, expired=0x0) at ccan/ccan/io/poll.c:412
#3 0x00000000002aa612 in plugins_exclusive_loop (plugins=0x804bcf2b8) at lightningd/plugin.c:1857
#4 0x00000000002ad358 in plugin_hook_db_sync (db=0x800ae4e28) at lightningd/plugin_hook.c:397
#5 0x00000000002b2353 in db_report_changes (db=0x800ae4e28, final=<optimized out>, min=0) at wallet/db.c:857
#6 0x00000000002b2256 in db_commit_transaction (db=0x800ae4e28) at wallet/db.c:927
#7 0x0000000000292456 in main (argc=16, argv=<optimized out>) at lightningd/lightningd.c:972
After ctrl-C
2021-01-28T21:53:57.138Z UNUSUAL plugin-backup.py: Traceback (most recent call last):
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: File \"/home/user/src/plugins/backup/backup.py\", line 485, in <module>
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: plugin.run()
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 687, in run
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: partial = self._multi_dispatch(msgs)
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 672, in _multi_dispatch
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: self._dispatch_request(request)
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 564, in _dispatch_request
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: result = self._exec_func(method.func, request)
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 549, in _exec_func
2021-01-28T21:53:57.139Z UNUSUAL plugin-backup.py: return func(*ba.args, **ba.kwargs)
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 796, in _init
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py: return self._exec_func(self.child_init, request)
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/plugin.py\", line 549, in _exec_func
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py: return func(*ba.args, **ba.kwargs)
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py: File \"/home/user/src/plugins/backup/backup.py\", line 446, in on_init
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py: configs = plugin.rpc.listconfigs()
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/lightning.py\", line 837, in listconfigs
2021-01-28T21:53:57.140Z UNUSUAL plugin-backup.py: return self.call(\"listconfigs\", payload)
2021-01-28T21:53:57.141Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/lightning.py\", line 329, in call
2021-01-28T21:53:57.141Z UNUSUAL plugin-backup.py: resp, buf = self._readobj(sock, buf)
2021-01-28T21:53:57.141Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/lightning.py\", line 282, in _readobj
2021-01-28T21:53:57.141Z UNUSUAL plugin-backup.py: b = sock.recv(max(1024, len(buff)))
2021-01-28T21:53:57.142Z UNUSUAL plugin-backup.py: File \"/home/user/.local/lib/python3.7/site-packages/pyln/client/lightning.py\", line 256, in recv
2021-01-28T21:53:57.142Z UNUSUAL plugin-backup.py: return self.sock.recv(length)
2021-01-28T21:53:57.142Z UNUSUAL plugin-backup.py: KeyboardInterrupt
2021-01-28T21:53:57.142Z INFO plugin-backup.py: Killing plugin: Plugin exited before completing handshake.
2021-01-28T21:53:57.142Z **BROKEN** plugin-backup.py: Plugin marked as important, shutting down lightningd!
So it is hanging in the plugin, but in pyln not the plugin code itself (edit: no, it's actually hanging inside the plugin, see below). I'll try upgrading pyln.
Edit: no success, unfortunately.
@cdecker Could it be some race condition / circular dependency? E.g. lightningd
invokes the plugin (after a database update), which calls listconfigs
on the RPC, which in turn depends on lightningd
finishing the call to the plugin?
These are the last messages in the debug log before the hang (possibly, the failing channel triggers the database update that triggers the hang):
2021-01-28T22:30:06.431Z DEBUG wallet: Restored 0 outgoing HTLCS
2021-01-28T22:30:06.439Z DEBUG plugin-autoclean: autocleaning not active
2021-01-28T22:30:06.440Z DEBUG hsmd: Client: Received message 10 from client
2021-01-28T22:30:06.440Z DEBUG hsmd: Client: Received message 10 from client
2021-01-28T22:30:06.440Z DEBUG connectd: REPLY WIRE_CONNECTD_ACTIVATE_REPLY with 0 fds
2021-01-28T22:30:06.441Z INFO lightningd: Restarting onchaind for channel 57199
2021-01-28T22:30:06.493Z UNUSUAL XXX-chan#57199: Peer permanent failure in ONCHAIN: Funding transaction spent
2021-01-28T22:30:06.496Z DEBUG lightningd: Broadcasting txid XXX
2021-01-28T22:30:06.496Z DEBUG lightningd: sendrawtransaction: XXX
2021-01-28T22:30:06.496Z INFO XXX-chan#57199: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2021-01-28T22:30:06.509Z DEBUG XXX-onchaind-chan#57199: pid 73849, msgfd 40
Commenting out the call to listconfigs
works around the issue :slightly_smiling_face: :
diff --git a/backup/backup.py b/backup/backup.py
index 0e684d6..56b1a8f 100755
--- a/backup/backup.py
+++ b/backup/backup.py
@@ -443,9 +443,9 @@ def on_init(options, **kwargs):
level="warn"
)
- configs = plugin.rpc.listconfigs()
- if not configs['wallet'].startswith('sqlite3'):
- kill("The backup plugin only works with the sqlite3 database.")
+ #configs = plugin.rpc.listconfigs()
+ #if not configs['wallet'].startswith('sqlite3'):
+ # kill("The backup plugin only works with the sqlite3 database.")
I don't think this is a core lightningd issue anymore, created an issue on the plugins repository: lightningd/plugins#209
That is indeed correct, the db_write hook is synchronous, so no RPC commands are processed while in the hook handler.
For anyone coming here: this deadlock issue was worked around by lightningd/plugins#212.
Somewhere between commit 990623676938adeff66488b43fbd03c1888624ad and 015ac37d924e31bb87b8597da9f863e82270657b , the backup plugin started causing a hang in lightningd at startup.
I tested this with the vanilla plugin, without my socket pipe changes.
I load it with
Lightningd gets as far as connecting to a few peers but hangs. before it starts processing RPC commands (e.g.
lightning-cli listpeers
just hangs). Without the plugin it continues without problems.I tried to figure out where it hangs by adding logging statements to the plugin,
but it doesn't seem to hang in the plugin itself, at least not during thedb_write
hook nor during plugin initialization.I will try to bisect this. Unfortunately, a database schema upgrade happened in between (4dfbee47f7c9bdf3ef1aa5fe84255514e924af2b) so I cannot go back to before that. It looks like it is already broken at that commit.
System:
FreeBSD 12.2
Compiler:FreeBSD clang version 8.0.1
Edit: it's possibly unrelated to any specific version update here, but a chicken/egg issue with an early database update before RPC responds.