Closed whitslack closed 3 years ago
An idea would be for lightningd
to hardlink (or if that fails, copy) the subdaemons and builtin plugins to a subdaemons/
directory in the $LIGHTNINGDIR
. That way we have a sample of the plugins at startup. When starting, we erase the entire directory at once, then construct it before going into the rest of the code.
We could also have --version
of each subdaemon and builtin plugin include a sha256sum of the concatenation of (or maybe a merkle directory tree of) their source code (git log -1
is not enough in a development env since we could have a modified version locally running) just to ensure that the expected versions are the same, in case of a race condition where an update is in a halfway state when the daemon is restarted.
hardlink (or if that fails, copy) the subdaemons and builtin plugins to a
subdaemons/
directory in the$LIGHTNINGDIR
.
That would fail if $LIGHTNINGDIR
is mounted noexec
. Atypical but not unreasonable.
Well, plugins/
could be located in the $LIGHTNINGDIR
, so we have some prior justification to requiring non-noexec
mounts.
I feel this is a reasonably clean solution that does not require us to move to zygote/stemcell type.
On the other hand, with the vague whispers of Android restricting process-forking, it may be needed to put everything in a single process anyway, which would mean something very much like a zygote would be needed as well.
Yeah, I learned this the hard way, and I now shutdown before make install
.
We could have subdaemons send their version string on startup, then restart lightningd (after a few seconds' sleep) if the version is wrong. If we restart and our own version hasn't changed, then simply fail. I think upgrading running instances is a better choice than trying to keep them going?
Self-restarting is an interesting thought. One nice aspect is that the new process would retain the same PID as the old process, so system service managers (like OpenRC) wouldn't notice anything awry at any instant.
I think this is important, is there any update? Should I avoid the PPA and just install manually (so that I can be sure ubuntu does not decide to upgrade automatically) @rustyrussell
@MrManPew: I'm not that well-versed in Debian-like distros, but I would think you can pin/freeze an installed package so that it will not be upgraded unless unpinned/thawed.
We could also just add a protocol version (sha256 hash of the wire file?) to the init messages, so subdaemons can decide whether it matches their version, and throw a tantrum if it doesn't. That'd not be as seamless as the zygote or file descriptor option, but it'd be safe and noisy pointing the user to towards how to fix the issue (restart).
Users would still need to freeze their installed version of C-Lightning so that the OS does not upgrade it automatically in the background. I can tell you, one of the greatest annoyances as a node operator is coming back to my machine only to notice that my node shit the bed half a day ago. Yes, I could use some sort of service babysitter that automatically restarts crashed services, but I'm reluctant to do that, as it would paper over problems.
While you all are considering a fundamental redesign of C-Lightning's daemon organization, I would request that you please consider what happens when a C-Lightning installation is upgraded on disk while an instance of that installation is running.
It is customary in the Linux world for the system package manager to upgrade installed packages simply by replacing executable files on disk with newer versions, irrespective of whether those executables may currently be running. This is fine under POSIX semantics, as the mapped VMAs of any running processes will maintain references to the inodes of the replaced versions, preventing them from being deallocated on disk, even after their link counts drop to zero, until no more VMAs reference them. Thus, running daemons will continue running the old code until they are subsequently restarted. While this works just fine for most daemons, it fails spectacularly for C-Lightning, which spawns subdaemons at arbitrary times by
execve
ing their executable images on disk without regard for whether those images are of the same (or compatible) version as their parent daemons.One potential solution would be to use a "zygote" process in the manner of Chromium/Chrome. A zygote process starts once upon startup of the top-level process and has the sole purpose of forking child processes, which in turn subsequently differentiate into whatever subdaemon specialization is needed, notably without
execve
ing anything on disk. However, this pattern may not be well-suited to C-Lightning, given its capability for dynamically loading and unloading plugins.Another possibility would be for the top-level
lightningd
process toopen
every critical subdaemon executable file at startup, and then when spawning a subdaemon, the appropriate/proc/self/fd/<n>
symlink should beexecve
d rather than/usr/libexec/c-lightning/<whatever>
. This would have the effect of executing whatever version of the subdaemon was installed at the time whenlightningd
was launched, irrespective of whatever version of the subdaemon might be installed at the time of spawning. The downsides are: this only works on Linux, and the process name of the subdaemon will then be some opaque number, though I think there may be some tricks to rewrite that for the benefit ofps
,top
, and friends. Spawning non-critical plugins could still be done byexecve
ing whatever happens to be on disk at the time, with the understanding that the version of the plugin on disk may speak a different protocol than the parent process.One last potential solution (really more of a punt) would be for subdaemons simply to handshake on a protocol version upon spawning. This wouldn't allow continued operation in the event of a version mismatch, but it would at least protect against accidental catastrophe due to internal protocol changes.
Impetus for this request? I have personally experienced my C-Lightning node dying upon attempting to spawn a new subdaemon after I had upgraded the installed package on disk. Note that this does not happen immediately upon upgrading the package on disk but will happen at some unpredictable time afterward. To mitigate this problem, I have hacked up the Gentoo ebuild for C-Lightning so that it cowardly refuses to upgrade an installation of C-Lightning if it notices that an instance may be running, but this is gross and runs contrary to the expectations of users, who expect that they can upgrade packages on disk and then restart their services afterward to switch to the new versions.