Closed grubles closed 1 year ago
I get this all the time too. I have to try several times to get CLN to start up successfully. I suspect there is some flawed logic where lightningd
sends off an init
message to all the plugins, then does a bunch of startup work itself that can take a long time, then services the plugin init timeout timer before getting around to reading the replies from the plugins that have been sitting in the pipe buffers for a cyber-eternity at this point. Really it shouldn't be timing out any plugins if there are any messages pending to be read from them.
Let me try with a master version, right now!
I get this all the time too. I have to try several times to get CLN to start up successfully. I suspect there is some flawed logic where
lightningd
sends off aninit
message to all the plugins, then does a bunch of startup work itself that can take a long time, then services the plugin init timeout timer before getting around to reading the replies from the plugins that have been sitting in the pipe buffers for a cyber-eternity at this point. Really it shouldn't be timing out any plugins if there are any messages pending to be read from them.
This theory is more believable than autoclean taking more then a fraction of a second to start. Unless some other plugin is monopolizing lightningd on large nodes. My node may not be large enough to see it though.
OK, this changes us to wait for plugins to init, to test @whitslack's theory:
diff --git a/lightningd/plugin.c b/lightningd/plugin.c
index e0dff8188..f3c3affa0 100644
--- a/lightningd/plugin.c
+++ b/lightningd/plugin.c
@@ -180,6 +180,9 @@ static void check_plugins_initted(struct plugins *plugins)
for (size_t i = 0; i < tal_count(plugin_cmds); i++)
plugin_cmd_all_complete(plugins, plugin_cmds[i]);
tal_free(plugin_cmds);
+
+ if (plugins->startup)
+ io_break(plugins);
}
struct command_result *plugin_register_all_complete(struct lightningd *ld,
@@ -1893,6 +1896,11 @@ void plugins_config(struct plugins *plugins)
plugin_config(p);
}
+ /* Wait for them to configure, before continuing: large
+ * nodes can take a while to startup! */
+ if (plugins->startup)
+ io_loop_with_timers(plugins->ld);
+
plugins->startup = false;
}
I had another monkey diff just because I'm curious what is the timing of complete the init
diff --git a/plugins/autoclean.c b/plugins/autoclean.c
index 81e50352a..cd8d34f34 100644
--- a/plugins/autoclean.c
+++ b/plugins/autoclean.c
@@ -563,13 +563,16 @@ static const char *init(struct plugin *p,
cycle_seconds = deprecated_cycle_seconds;
}
+ plugin_log(p, LOG_INFORM, "set up timer for clean up");
cleantimer = plugin_timer(p, time_from_sec(cycle_seconds), do_clean_timer, NULL);
+ plugin_log(p, LOG_INFORM, "restore information from datastore!");
for (enum subsystem i = 0; i < NUM_SUBSYSTEM; i++) {
rpc_scan_datastore_str(plugin, datastore_path(tmpctx, i, "num"),
JSON_SCAN(json_to_u64, &total_cleaned[i]));
}
+ plugin_log(p, LOG_INFORM, "well done, cleanup scheduled. Bye!");
/* Optimization FTW! */
rpc_enable_batching(p);
return NULL;
this is my log output
➜ lightning git:(master) ✗ ./lightningd/lightningd --disable-plugin bcli --plugin=/home/vincent/Github/clightning4j/btcli4j/btcli4j-gen.sh --database-upgrade=true
2022-11-25T11:34:12.483Z INFO plugin-manager: /home/vincent/Github/lightning/plugins/bcli: disabled via disable-plugin
2022-11-25T11:34:18.044Z INFO lightningd: --------------------------------------------------
2022-11-25T11:34:18.044Z INFO lightningd: Server started with public key 031ba17032cedd2fdde9610e21b2b776b9e209d570a0dc0066c247696ab12aa225, alias LATENTWATCH-c2-4-gece7784-modded (color #031ba1) and lightningd v22.11rc2-4-gece7784-modded
2022-11-25T11:34:18.056Z INFO plugin-autoclean: set up timer for clean up
2022-11-25T11:34:18.057Z INFO plugin-autoclean: restore information from datastore!
2022-11-25T11:34:18.057Z INFO plugin-autoclean: well done, cleanup scheduled. Bye!
It may be the case CLBOSS is monopolizing lightningd. My setup is BTRFS on HDDs so it's inherently slow to begin with, so maybe it's a mix of a slow machine + CLBOSS.
My setup is PostgreSQL on F2FS on a pair of SSDs in RAID1, definitely not a slow setup. For what it's worth, I don't think I've seen the init timeout since I solved my issue that was causing very slow startups. Still, the point remains that lightningd
shouldn't be blaming plugins for not replying in time when lightningd
itself is taking longer than the timeout to notice their replies.
Yes, I've been thinking about this: we need a new kind of timeout, one that counts idle time only....
CLN at 02f9c2df249c342a282e47179e643471cdf9cd77
My HDDs are too slow?