bschwand closed 5 months ago
Hmm. I tried to start a new node today and it didn't work either. It seems like service start does nothing, while executing the rc scripts manually works.
The only difference is that it was a 13.3 jail, while the host system is 13.1.
I'll debug and get back to you. Thank you for letting me know!
ok, great that you can reproduce the issue. BTW I am running on 14.0, host and jail
It seems the daemon utility is broken from 13.3 onwards. No issues on 13.2 and before.
If you pass any argument to daemon that requires it to spawn a child process -- e.g. -r or -P -- it silently fails to start the process, while still returning 0.
Demo:
This works on 13.2 and 13.3 (daemon runs in the foreground):
daemon /bin/sh -c 'while true; do echo Test; sleep 10; done'
This works on 13.2 but is broken on 13.3 (daemon shall spawn the child and remain running to monitor it):
daemon -r /bin/sh -c 'while true; do echo Test; sleep 10; done'
This is also broken on 13.3 (daemon shall spawn a child here too, and write its own pid to the file):
daemon -P /var/run/test.pid /bin/sh -c 'while true; do echo Test; sleep 10; done'
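Since daemon is reported above to return 0 even when the child never starts, the exit status alone cannot separate the working case from the broken one. A minimal sketch of an independent check follows. The helper name wait_for_pidfile and the 5-second default are illustrative assumptions, not part of daemon or of the installer scripts.

```sh
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds (default 5) for pid file $1
# to appear and to name a live process. Returns 0 on success, 1 on timeout.
wait_for_pidfile() {
    pidfile="$1"
    timeout="${2:-5}"
    waited=0
    while :; do
        if [ -s "$pidfile" ]; then
            pid="$(cat "$pidfile")"
            # kill -0 sends no signal; it only checks that the pid exists
            # and that the caller is allowed to signal it.
            if kill -0 "$pid" 2>/dev/null; then
                echo "running as pid $pid"
                return 0
            fi
        fi
        [ "$waited" -ge "$timeout" ] && break
        sleep 1
        waited=$((waited + 1))
    done
    echo "no live process found via $pidfile after ${timeout}s"
    return 1
}

# Example (FreeBSD, as root), following the -P demo above:
#   daemon -P /var/run/test.pid /bin/sh -c 'while true; do echo Test; sleep 10; done'
#   wait_for_pidfile /var/run/test.pid 5 || echo "daemon returned 0 but nothing is running"
```

Note that kill -0 also fails when the process belongs to another user, so the check is only meaningful when run as root or as the owner of the process.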
I'm not sure how to fix it -- it looks very much like a bug, but if it is some new behavior I will need to figure out how to launch a daemon properly in this new reality. sigh.
As a workaround, can you try doing it in a 13.2 jail? If you mount the same folders, the script will just set up the daemons; it is not going to create a new identity. Or you can just downgrade the existing jail with iocage upgrade <jailname> -r 13.2-RELEASE -- that's what I did -- and bam! it suddenly works.
I've tried this in FreeBSD 14.0 and 13.3 VMs and the test above works. It seems to fail only when the jail and host OS versions are mismatched. Can you confirm that both your host and jail are the same version?
I'm going to try to test the whole node install on a 14.0 VM. It might take some time, generating the identity etc.
I can confirm I use 14.0 in both host and jail.
jail: FreeBSD storj 14.0-RELEASE-p5 FreeBSD 14.0-RELEASE-p5 #0: Tue Feb 13 23:37:36 UTC 2024
host: FreeBSD proliant21.bschwand.net 14.0-RELEASE-p5 FreeBSD 14.0-RELEASE-p5 #0: Tue Feb 13 23:37:36 UTC 2024
Thank you. I'll install these versions specifically in the VM tonight and test.
> As a workaround, can you try doing it in 13.2 jail? If you mount the same folders the script shall just setup daemons, it's not going to create new identity. Or you can just downgrade the existing jail iocage upgrade <jailname> -r 13.2-RELEASE -- that's what I did -- and bam! it suddenly works.
OK, I'll look into that. I am using appjail, not iocage, so I need to figure out how I can do this without messing up my existing setup :-) But still, that is not a real bug fix ;-)
It seems to be more of an issue with the jails and FreeBSD somehow, not the scripts ?
Seems to be working on 14.0-RELEASE-p6
root@:~ # git clone https://github.com/arrogantrabbit/freebsd_storj_installer.git
...
root@:~ # cp -rv freebsd_storj_installer/overlay/ /
...
root@:~ # service storagenode_updater onestart
Starting storagenode_updater.
root@:~ # service storagenode_updater status
storagenode_updater is running as pid 1188.
root@:~ # freebsd-version
14.0-RELEASE-p6
Maybe you installed a FreeBSD update and did not reboot, so your kernel and userland are out of sync?
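One way to test that hypothesis is to compare what freebsd-version reports for the running kernel (-r) and for the userland (-u). The sketch below is an assumption-laden illustration: the function names are made up, and patch levels are deliberately ignored, because the kernel patch level can legitimately lag behind the userland when an update does not touch the kernel.

```sh
#!/bin/sh
# Strip a trailing patch level, e.g. "14.0-RELEASE-p6" -> "14.0-RELEASE".
release_base() {
    printf '%s\n' "${1%-p[0-9]*}"
}

# Succeed when both version strings name the same release.
# Patch levels are ignored on purpose (see the note above the block).
same_release() {
    [ "$(release_base "$1")" = "$(release_base "$2")" ]
}

# Only query the system where freebsd-version exists.
if command -v freebsd-version >/dev/null 2>&1; then
    running_kernel="$(freebsd-version -r)"
    userland="$(freebsd-version -u)"
    if same_release "$running_kernel" "$userland"; then
        echo "running kernel $running_kernel matches userland $userland"
    else
        echo "mismatch: running kernel $running_kernel, userland $userland"
    fi
fi
```

Run inside a jail, the userland value describes the jail while the kernel is the host's, so the same comparison also exposes the host/jail version mismatch discussed earlier in the thread.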
I rebooted many times. Let me do an update then; maybe something changed from p5 to p6.
Can you try this in the command line?
daemon -P /var/run/test.pid /bin/sh -c 'echo Stared; sleep 5; echo Finished'
Does this work?
root@storj:~/freebsd_storj_installer # daemon -P /var/run/test.pid /bin/sh -c 'echo Stared; sleep 5; echo Finished'
root@storj:~/freebsd_storj_installer # Stared
Finished
root@storj:~/freebsd_storj_installer #
Great. So this is not a daemon issue that I was seeing. Let's go back to square one.
Do you have anything in the daemon logs? /var/log/storagenode.log and /var/log/storagenode_updater.log?
Yes, logs are growing, the daemons are running and filling the storage location. Like I said, it has been running since the end of the install.sh script, but I cannot stop it or get its status using the service command.
ok so I updated host and rebooted and I am at 14.0 p6
root@storj:~ # service storagenode_updater onestart
Starting storagenode_updater.
root@storj:~ # service storagenode_updater status
storagenode_updater is not running.
root@storj:~ # ls /var/run/
/var/run/bhyve/ /var/run/ld-elf.so.hints /var/run/ppp/ /var/run/utx.active
/var/run/dhclient/ /var/run/ld-elf32.so.hints /var/run/storagenode_updater.pid /var/run/wpa_supplicant/
root@storj:~ # ls /var/run/storagenode_updater.pid
/var/run/storagenode_updater.pid
root@storj:~ # cat /var/run/storagenode_updater.pid
29639root@storj:~ #
I get the same results whether I run it on the host or within the jail. At the beginning of this, storagenode_updater is running:
[bruno@proliant21 ~]$ sudo service -j storj storagenode_updater status
storagenode_updater is not running.
[bruno@proliant21 ~]$ sudo service -j storj storagenode_updater stop
storagenode_updater not running? (check /var/run/storagenode_updater.pid).
[bruno@proliant21 ~]$ sudo service -j storj storagenode_updater start
Starting storagenode_updater.
daemon: process already running, pid: 29639
/usr/local/etc/rc.d/storagenode_updater: WARNING: failed to start storagenode_updater
[bruno@proliant21 ~]$ sudo kill -9 29639
[bruno@proliant21 ~]$ sudo service -j storj storagenode_updater start
Starting storagenode_updater.
[bruno@proliant21 ~]$ sudo service -j storj storagenode_updater status
storagenode_updater is not running.
[bruno@proliant21 ~]$ sudo service -j storj storagenode_updater stop
storagenode_updater not running? (check /var/run/storagenode_updater.pid).
Somehow stop and status fail to recognize that it is running.
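The session above shows two tools disagreeing about the same pid file: daemon refuses to start because pid 29639 is alive, while service status reports nothing running. One way to see which side is right, without going through the rc machinery, is to probe the pid directly. This is only a sketch; pidfile_state is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: classify a pid file as missing, stale, or running.
pidfile_state() {
    pidfile="$1"
    if [ ! -s "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    pid="$(cat "$pidfile")"
    # kill -0 delivers no signal; it only tests whether the pid can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"
    fi
}

# Example, run inside the jail as root:
#   pidfile_state /var/run/storagenode_updater.pid
```

If this prints "running" while service status says "not running", the process is fine and the problem lies in how the status check looks it up.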
Can you try from inside the jail? The "service -j .. " never worked for me
You can try enabling debugging with sysrc rc_debug=YES, and then try stopping the service.
Can you stop other services the same way? Is it just the node that does not get found and/or stopped?
inside or outside the jail it gives me the same results. always thinks nothing is running, can't status or stop it
argh.... solved. was all a jail config issue. Appjail does not enable this by default
exec.start: "/bin/sh /etc/rc"
exec.stop: "/bin/sh /etc/rc.shutdown jail"
mount.devfs
So ps did not work, and services did not start at boot either. With those settings it's all working: status, stop, and start at jail start.
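The two symptoms named here (no devfs in the jail, and ps not working) can be checked directly from a shell inside the jail. The following is a rough diagnostic sketch, not something taken from the installer; the function name is hypothetical, and the device path is a parameter only so that other paths can be probed.

```sh
#!/bin/sh
# Hypothetical diagnostic: check two things the thread found missing in the jail.
check_status_prereqs() {
    dev_null_path="${1:-/dev/null}"
    rc=0
    if [ -c "$dev_null_path" ]; then
        echo "ok: $dev_null_path is a character device"
    else
        echo "FAIL: $dev_null_path is missing (is devfs mounted in the jail?)"
        rc=1
    fi
    # ps must be able to look up a known-live pid: this shell's own.
    if ps -p "$$" >/dev/null 2>&1; then
        echo "ok: ps can see pid $$"
    else
        echo "FAIL: ps cannot look up pid $$"
        rc=1
    fi
    return "$rc"
}

# Example, run inside the jail:
#   check_status_prereqs || echo "service status and stop are unlikely to work here"
```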
Seems like there is some other issue left, as you mentioned at https://github.com/arrogantrabbit/freebsd_storj_installer/issues/3#issuecomment-2057481878, but I think it actually has nothing to do with your scripts.
Thanks a lot for the help and back and forth !
Awesome!
Can you actually get the storagenode_updater to stop? It seems I can't -- the shell that is running my replacement updater script is ignoring SIGTERM (by design, actually), so I would need to add a _stop() function in the rc script and send SIGINT instead... But I guess it's irrelevant, as you can stop the whole jail instead, and everyone dies :)
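The idea of stopping with a different signal can be sketched in plain sh as below. This is not the code from the rc script or from any later commit; stop_by_pidfile, its defaults, and the polling loop are illustrative assumptions, and the rc.subr wiring (for example a custom stop command) is left out.

```sh
#!/bin/sh
# Hypothetical helper: send signal $2 (default INT) to the pid in file $1,
# then wait up to $3 seconds (default 30) for the process to exit.
stop_by_pidfile() {
    pidfile="$1"
    sig="${2:-INT}"
    timeout="${3:-30}"
    [ -s "$pidfile" ] || { echo "no pid file at $pidfile"; return 1; }
    pid="$(cat "$pidfile")"
    kill -s "$sig" "$pid" 2>/dev/null || { echo "pid $pid is not running"; return 1; }
    waited=0
    # Poll until the pid disappears or the timeout expires.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "pid $pid still alive after ${timeout}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "pid $pid stopped"
    return 0
}

# Example, run inside the jail as root:
#   stop_by_pidfile /var/run/storagenode_updater.pid INT 30
```

rc.subr(8) also documents a sig_stop variable for choosing the signal used by the default stop method, which may be a simpler route; whether it suits this particular script is not verified here.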
you are correct, updater gets hung
[bruno@proliant21 ~]$ sudo service -j storj storagenode_updater stop
Stopping storagenode_updater.
Waiting for PIDS: 74546
Well, it would be cleaner, also in case the jail runs more stuff than just that service. Why use bash in that script? Would sh behave the same way? But... I see storagenode-updater uses /bin/sh??
All shells behave that way, it's by design.
The updater uses sh because it is a hack I put together, since the official Storj-provided one did not (still does not?) restart the service correctly after updating the binary.
I have the fix, will check it in later today
yeah I just researched that a bit, I did not know, interesting.
Commits 7b8748a and 984c423 contain the fix for stopping the updater on FreeBSD 13.3+ and 14+.
Updated and tested all combinations of start, stop, status, and restart: all good!
Hi,
I just configured a jail running storj with your scripts, they are great. Storj doc should really mention it, and at least mention they already build native FreeBSD binaries... So the issue I have is that at the end of the script, the daemons are started and seem to run fine, I can see the processes on the host. However,
in the jail, I get the same result. Any idea what is going on?