Open arogozhnikov opened 1 year ago
The root filesystem (which includes /usr, /lib, /bin, and /etc, which is where default systemd units are stored) in the OT-2 is mounted read-only to prevent modification. This is because that root filesystem is completely overwritten during update - so if you put a systemd service in /etc, whenever you update your robot it will be gone.
/home, /var, and /data are on a separate partition that does not get overwritten during update. That means that the user unit search path /home/.config/systemd/user/ is a good place to put custom systemd units - try there?
@sfoster1
Hi Seth,
That means that the user unit search path /home/.config/systemd/user/ is a good place to put custom systemd units - try there?
That's what I get:
mkdir: can't create directory '/home/.config/': Read-only file system
so I assume that's not going to work.
Now, I've tried setting up the way you described in #106
- A systemd service unit
- A directory /var/home/.config/systemd/user/opentrons.target.wants that includes a symlink to your service
and couldn't make systemctl see the service:
~ # cat /root/tunnel.service
[Unit]
Description=Static tunnel to ssh into machine
After=basic.target
[Service]
Type=exec
ExecStart=/root/cloudflared tunnel --config /root/tunnel_conf.yaml --protocol=quic run
[Install]
WantedBy=opentrons.target
~ # ls /var/home/.config/systemd/user/opentrons.target.wants -lah
total 2
drwxr-xr-x 2 root root 1.0K Dec 1 23:19 .
drwxr-xr-x 3 root root 1.0K Dec 1 23:18 ..
lrwxrwxrwx 1 root root 20 Dec 1 23:19 tunnel.service -> /root/tunnel.service
systemctl daemon-reload
# this command returns nothing, and I see nothing relevant when listing either
systemctl list-units --type=service --all | grep tunnel
remark: open to using any option (i.e. not necessarily systemd) that can launch a process on boot as a daemon
Oh! In that case, we support boot scripts run with run-parts. Drop an executable shell script named NN-some-ascii-text into /var/data/boot.d/, where NN is a number (this is mostly a convention - the only rules are that the name has to be ascii letters, numbers, or - and _) and it'll get run at boot:
# cat /var/data/boot.d/00-demo
echo "my service ran"
touch /var/data/my-service-ran
# ls -l /var/data/boot.d/00-demo
-rwxr-xr-x 1 root root 53 Dec 2 14:13 /var/data/boot.d/00-demo
# reboot
# ls -l /var/data/
total 310
(other results removed for clarity)
-rw-r--r-- 1 root root 0 Dec 2 14:14 my-service-ran
# journalctl -u opentrons-run-boot-scripts --no-pager
-- Logs begin at Fri 2018-06-22 11:11:49 UTC, end at Fri 2022-12-02 14:16:05 UTC. --
-- Reboot --
Dec 02 14:14:21 opentrons run-parts[162]: my service ran
Dec 02 14:14:21 opentrons systemd[1]: Starting Opentrons: Run user-supplied boot scripts...
Dec 02 14:14:21 opentrons systemd[1]: Started Opentrons: Run user-supplied boot scripts.
One thing to keep in mind is that run-parts scripts are unfortunately a lot less configurable than systemd services. You might know how to do this stuff better than me, but those scripts all want to execute like a systemd oneshot service - the script runs once and then exits. That means you really need cloudflared to daemonize (fork and abandon its parent) when called on the commandline - there might be a -d, --daemonize command line flag, or maybe just the absence of a --foreground flag or something. I'm not familiar with cloudflared and can't find a good reference for its command line params.
@sfoster1 nice, does run-parts just assume these files are shell scripts?
asking because there is no shebang in your example
Ah, yes it does. It runs them through the shell.
@sfoster1 likely I'm doing something wrong, but the service isn't started during reboot:
Location:
~ # ls /var/data/boot.d/00-cftunnel -lah
-rw-r--r-- 1 root root 375 Dec 2 17:27 /var/data/boot.d/00-cftunnel
In log, nothing shows it was found or called:
Dec 15 18:31:03 opentrons ot-commit-machine-id[164]: machine-id "05a2d52f19ca460a9f87f944c6532461" already committed. Exiting without doing anything.
Dec 15 18:31:02 opentrons systemd[1]: Starting Jupyter notebook server...
Dec 15 18:31:02 opentrons systemd[1]: Starting Opentrons: Run user-supplied boot scripts...
Dec 15 18:31:02 opentrons systemd[1]: Starting Network Connectivity...
Dec 15 18:31:02 opentrons systemd[1]: Starting Opentrons: Ensure system wired connections...
Dec 15 18:31:02 opentrons systemd[1]: Starting Rerun udev for block devices...
Dec 15 18:31:02 opentrons systemd[1]: Started D-Bus System Message Bus.
Contents of file:
# cat /var/data/boot.d/00-cftunnel
echo "starting cloudflared tunnel"
echo -n $(date -u) >> /data/tunnel.log
echo "starting cloudflared tunnel" >> /root/tunnel.log
tmux kill-session -t ot-tunnel-session || (echo 'no tmux session to stop' >> /root/tunnel.log)
<actual cloudflared command goes here>
Update: Command that you suggested:
-- Reboot --
Dec 15 18:31:02 opentrons systemd[1]: Starting Opentrons: Run user-supplied boot scripts...
Dec 15 18:31:02 opentrons systemd[1]: Started Opentrons: Run user-supplied boot scripts.
@arogozhnikov Mark it executable: chmod u+x /var/data/boot.d/00-cftunnel
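For anyone hitting the same thing, here is a minimal sketch of a pre-reboot check, assuming the two conditions mentioned in this thread (the executable bit, and a name made only of ascii letters, numbers, - and _) are what decide whether a boot script gets picked up. The function name check_boot_scripts is made up for illustration and is not part of the OT-2 system.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the OT-2): list the files in a
# boot.d-style directory and report which ones would likely be skipped,
# based on the two rules mentioned in this thread.
check_boot_scripts() {
    dir="${1:-/var/data/boot.d}"   # default is the directory used in this thread
    for f in "$dir"/*; do
        [ -f "$f" ] || continue    # ignore directories and unmatched globs
        name=$(basename "$f")
        case "$name" in
            # any character outside letters, digits, _ and - breaks the naming rule
            *[!A-Za-z0-9_-]*)
                echo "SKIP (bad name): $name"
                ;;
            *)
                if [ -x "$f" ]; then
                    echo "OK: $name"
                else
                    echo "SKIP (not executable): $name"
                fi
                ;;
        esac
    done
}
```

Calling check_boot_scripts with no argument inspects /var/data/boot.d; a file like the 00-cftunnel shown above with mode -rw-r--r-- would be reported as not executable.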
@sfoster1 I think I've tried everything and cloudflared just can't run at this point in the boot process. I am not 100% sure, but here is what I have:
- /var/data/boot.d/00-cftunnel runs at startup
- When I source /var/data/boot.d/00-cftunnel manually, the tunnel is started normally.
- I do not see any logs or errors from cloudflared. Adding sleep 60 before running cloudflared did not help either
Any other ideas?
Well huh. A lot of my ideas are broken by cloudflared working fine if you source /var/data/boot.d/00-cftunnel. I assume you're doing something like their setup docs with a config file somewhere on the OT-2 filesystem that you're passing the path to in /var/data/boot.d/00-cftunnel, right?
Where is that config file on the OT-2 filesystem? I wonder if there's some problem like that part of the filesystem not being mounted at the time you run 00-cftunnel. And putting sleep 60 in there wouldn't necessarily fix it, because run-parts and the systemd unit it's in would just see that as the script taking a long time and delay starting whatever depends on it.
Where on the OT-2 filesystem did you put the cftunnel binary+supporting solibs and config file?
I place everything (binary, config, logs) right under /root
/root/cloudflared tunnel --config /root/tunnel_conf.yaml --protocol=quic --logfile /root/tunnel.log run > /root/tunnel_last_start.log 2>&1
And then there's nothing in /root/tunnel.log or /root/tunnel_last_start.log when you ssh in after boot, right?
I'm really not sure what in the world is going wrong but one thing we could try is your idea to wait some time before starting the service, but do it in a fork'd child of the run-parts script. What you'd want to do is the following:
- Create /var/data/boot.d/cftunnel-worker, with chmod +x and a bash shebang, and put basically everything that's currently in 00-cftunnel in there, including an initial 60-second sleep
- Have 00-cftunnel only do the following:
nohup /var/data/boot.d/cftunnel-worker 0<&- &>/dev/null &
This should do something similar to daemonize(1), which is not available on the OT-2. Ignore this next part if you already know what it means, but it basically creates a child process and then severs the child process's connection to the parent process so the child can run forever in the background. So if the problem we're facing is (1) system resources aren't ready enough at the time run-parts runs, so cloudflared can't start, and (2) doing a sleep 60 in the run-parts script just means that systemd delays bringing up those parts of the system until the script is done, this should solve it by avoiding (2).
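The same idea can also be sketched as a single helper, in case a separate worker file is not wanted. This is only an illustration under assumptions: start_detached is a made-up name, and it uses plain POSIX redirections (</dev/null >/dev/null 2>&1) in place of the bash-style 0<&- &>/dev/null shown above, so stdin is redirected from /dev/null instead of being closed.

```shell
#!/bin/sh
# start_detached DELAY CMD [ARGS...]
# Hypothetical helper: returns immediately, then runs CMD after DELAY
# seconds in a background child that is detached from the caller.
start_detached() {
    delay="$1"
    shift
    # nohup makes the child ignore SIGHUP so it can outlive the boot script;
    # the inner sh sleeps first, then replaces itself with the real command.
    # stdin comes from /dev/null and all output is discarded.
    nohup sh -c 'sleep "$1"; shift; exec "$@"' sh "$delay" "$@" \
        </dev/null >/dev/null 2>&1 &
}
```

With this, 00-cftunnel could consist of one call such as start_detached 60 /root/cloudflared tunnel --config /root/tunnel_conf.yaml --protocol=quic --logfile /root/tunnel.log run. Because the helper discards stdout and stderr, only cloudflared's own --logfile output would be kept.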
The problem is not that /root isn't mounted; I assume it is something with the network. There is also probably some issue with tmux and cloudflared being used together.
Your solution (nohup + delay) seems to work. I need more tests to be sure about that, but at least it restarted successfully twice.
Delay is critical, otherwise I get this in logs:
{"level":"warn","error":"Group ID 0 is not between ping group 1 to 0","time":"2023-06-14T21:12:47Z","message":"The user running cloudflared process has a GID (group ID) that is not within ping_group_range. You might need to add that user to a group within that range, or instead update the range to encompass a group the user is already in by modifying /proc/sys/net/ipv4/ping_group_range. Otherwise cloudflared will not be able to ping this network"}
{"level":"warn","error":"cannot create ICMPv4 proxy: Group ID 0 is not between ping group 1 to 0 nor ICMPv6 proxy: socket: permission denied","time":"2023-06-14T21:12:47Z","message":"ICMP proxy feature is disabled"}
{"level":"error","event":0,"error":"lookup _v2-origintunneld._tcp.argotunnel.com on [2001:4860:4860::8888]:53: dial udp [2001:4860:4860::8888]:53: connect: cannot assign requested address","time":"2023-06-14T21:12:47Z","message":"edge discovery: error looking up Cloudflare edge IPs: the DNS query failed"}
{"level":"error","event":0,"time":"2023-06-14T21:12:47Z","message":"Please try the following things to diagnose this issue:"}
{"level":"error","event":0,"time":"2023-06-14T21:12:47Z","message":" 1. ensure that argotunnel.com is returning \"origintunneld\" service records."}
{"level":"error","event":0,"time":"2023-06-14T21:12:47Z","message":" Run your system's equivalent of: dig srv _origintunneld._tcp.argotunnel.com"}
{"level":"error","event":0,"time":"2023-06-14T21:12:47Z","message":" 2. ensure that your DNS resolver is not returning compressed SRV records."}
{"level":"error","event":0,"time":"2023-06-14T21:12:47Z","message":" See GitHub issue https://github.com/golang/go/issues/27546"}
{"level":"error","event":0,"time":"2023-06-14T21:12:47Z","message":" For example, you could use Cloudflare's 1.1.1.1 as your resolver:"}
{"level":"error","event":0,"time":"2023-06-14T21:12:47Z","message":" https://developers.cloudflare.com/1.1.1.1/setting-up-1.1.1.1/"}
{"level":"info","time":"2023-06-14T21:12:47Z","message":"ICMP proxy will use 0.0.0.0 as source for IPv4"}
{"level":"info","time":"2023-06-14T21:12:47Z","message":"ICMP proxy will use :: as source for IPv6"}
Ah, I guess it's not designed to handle "I'm not currently network-connected" or something. Well, I'm glad the nohup plus delay works! Let me know if something fails in those further tests - I'll leave this open for another couple days.
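Since the log above shows the DNS lookup failing when cloudflared starts too early, one possible refinement of the fixed delay is to poll until some readiness check succeeds. The following is a rough sketch, not something tested on an OT-2: wait_until is a made-up helper name, and which check command is available on the robot (nslookup, ping, or something else) is an assumption.

```shell
#!/bin/sh
# wait_until LIMIT INTERVAL CMD [ARGS...]
# Hypothetical helper: re-run CMD every INTERVAL seconds until it succeeds.
# Returns 0 as soon as CMD succeeds, or 1 once LIMIT seconds have been waited.
wait_until() {
    limit="$1"
    interval="$2"
    shift 2
    waited=0
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$limit" ]; then
            return 1                    # gave up: the check never succeeded
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

Inside the worker script, the initial sleep could then be replaced with something like wait_until 120 5 nslookup argotunnel.com before starting cloudflared, assuming nslookup exists on the system. The worker still has to be launched detached from the run-parts script, so that boot is not held up while it waits.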
cloudflared is a communication utility that connects from/to an internal network of the company. It provides zero-trust connection from the internet. It supports pretty much any OS of any distribution.
However, buildroot locks systemd for editing, and when cloudflared tries to install service, I run into this problem:
Are there any tools to override this? I saw the discussion in #106 about allowing customers to use services.