balena-os / meta-balena

A collection of Yocto layers used to build balenaOS images
https://www.balena.io/os
970 stars 116 forks source link

Device at v2.85.2 still online and accessible even if glider disconnected #2352

Open acostach opened 3 years ago

acostach commented 3 years ago

Steps to reproduce: provision a device at v2.85.2 Run the testlodge http-connect or socks5 test.

Note that when glider is closed on the host PC, the device under test stays online and the webterminal is still accessible. When glider is closed, the device should go offline.

Leviathan tests for current open meta-balena PRs are marked as passed.

acostach commented 3 years ago

Tested on an older release, on Pi4 2.73.1+rev1 which was known to be working correctly. Same as with the v2.85.2 release, device comes back up online and the webterminal is also accessible although glider was closed on the host PC.

klutchell commented 3 years ago

Does glider show logs of the connections?

klutchell commented 3 years ago

I tried to reproduce with the steps from TC141 - socks5 proxy and I cleaned them up a bit in the process.

Everything works as expected for me. Can you try the test again @acostach and share any differences in your results?

If the steps below work we can update Testlodge accordingly, but at the core the test is the same.

TC141 - socks5 proxy

  1. Start glider mixed proxy on your workstation via docker. Keep this terminal open and do not kill the container yet. Make sure to allow port 8123 through your workstation firewall!
docker run --rm --network host --user 995 nadoo/glider:v0.15.0 -verbose -listen :8123
  1. Set these 2 variables in a new workstation terminal to make the next steps easier.
SERVER_IP=<your_workstation_ip_running_glider>
DEVICE_UUID=<your_device_uuid>
  1. Paste this command on your workstation to put a redsocks config on the device.
cat << EOTTY | balena ssh "${DEVICE_UUID}" --tty
mkdir -p /mnt/boot/system-proxy/
cat <<'EOF'> /mnt/boot/system-proxy/redsocks.conf
base {
log_debug = off;
log_info = on;
log = stderr;
daemon = off;
redirector = iptables;
}
redsocks {
type = socks5;
ip = ${SERVER_IP};
port = 8123;
local_ip = 127.0.0.1;
local_port = 12345;
}
EOF
reboot
EOTTY
  1. Verify the device connects to the cloud after it reboots.
balena device $DEVICE_UUID
  1. Verify the glider logs in your other terminal show connections from the device.
2021/11/11 12:51:46 group.go:186: [group] only 1 forwarder found, disable health checking
2021/11/11 12:51:46 server.go:107: [socks5] listening UDP on :8123
2021/11/11 12:51:46 mixed.go:68: [mixed] listening TCP on :8123
2021/11/11 13:01:15 server.go:87: [socks5] 192.168.1.209:57918 <-> 34.192.70.219:443 via DIRECT
2021/11/11 13:01:18 server.go:87: [socks5] 192.168.1.209:57926 <-> 35.169.76.143:443 via DIRECT
2021/11/11 13:01:28 server.go:87: [socks5] 192.168.1.209:57932 <-> 54.164.146.95:443 via DIRECT
2021/11/11 13:01:28 server.go:87: [socks5] 192.168.1.209:57936 <-> 54.160.66.231:443 via DIRECT
2021/11/11 13:01:29 server.go:87: [socks5] 192.168.1.209:57940 <-> 52.203.153.157:443 via DIRECT
2021/11/11 13:01:29 server.go:87: [socks5] 192.168.1.209:57944 <-> 34.192.70.219:443 via DIRECT
2021/11/11 13:01:29 server.go:87: [socks5] 192.168.1.209:57948 <-> 54.147.227.172:443 via DIRECT
  1. Kill the glider container with CTRL-C.
  2. Verify the device is no longer connected to the cloud.
balena device $DEVICE_UUID
  1. Restart the glider proxy with the same command from step one.
docker run --rm --network host --user 995 nadoo/glider:v0.15.0 -verbose -listen :8123
  1. Verify the device is connected to the cloud again.
balena device $DEVICE_UUID
  1. Remove the redsocks config from the device.
cat << EOTTY | balena ssh "${DEVICE_UUID}" --tty
rm -rf /mnt/boot/system-proxy/redsocks.conf && reboot
EOTTY
  1. While the device is rebooting you can kill the glider proxy.
  2. Verify the device reconnects to the cloud without the proxy.
balena device $DEVICE_UUID
acostach commented 3 years ago

@klutchell Sure I'll test these steps. By any chance, did you try leave glider in container closed for ~10 minutes and check if the board shows back up even if glider is stopped?

klutchell commented 3 years ago

By any chance, did you try leave glider in container closed for ~10 minutes and check if the board shows back up even if glider is stopped?

No, I only waited long enough to see it disconnect when I closed glider. But I can try that now.

UPDATE: I waited an hour and my device remained offline when glider was not running.

acostach commented 3 years ago

Hi @klutchell , I've tried now with 2.85.2 and can still reproduce the issue:

docker run --rm --network host --user 995 nadoo/glider:v0.15.0 -verbose -listen :8123
Unable to find image 'nadoo/glider:v0.15.0' locally
v0.15.0: Pulling from nadoo/glider
29291e31a76a: Pull complete 
7ae280a0a2e7: Pull complete 
8a71bc7c5d51: Pull complete 
Digest: sha256:5f4da128a7064dd49af7b2c033c850a4d13636aa954fdd00421a08adc7d6dd95
Status: Downloaded newer image for nadoo/glider:v0.15.0
2021/11/12 11:20:15 group.go:186: [group] only 1 forwarder found, disable health checking
2021/11/12 11:20:15 server.go:107: [socks5] listening UDP on :8123
2021/11/12 11:20:15 mixed.go:68: [mixed] listening TCP on :8123
2021/11/12 11:21:53 server.go:87: [socks5] 192.168.0.198:49676 <-> 34.195.99.244:443 via DIRECT
^C
cat /mnt/boot/system-proxy/redsocks.conf 
base {
log_debug = off;
log_info = on;
log = stderr;
daemon = off;
redirector = iptables;
}
redsocks {
type = socks5;
ip = 192.168.0.110;
port = 8123;
local_ip = 127.0.0.1;
local_port = 12345;
}

I also booted the board with this config and the glider container stopped, and it showed up online.

Tested with meta-balena v2.85.2.

floion commented 3 years ago

I just tested with OS 2021.10.2 on a raspberrypi3 device type and the bug is reproducible. As Alex also observed, when closing the glider container the device didn't even go offline.

klutchell commented 3 years ago

I just tested balenaOS 2.86.0+rev1 on raspberrypi3 and could not reproduce. The device would not come online unless glider was running. Stopping glider disconnected the device immediately.

I will try some other things to reproduce...

klutchell commented 3 years ago

@floion @acostach is the redsocks service still running when glider is disconnected? Can you collect redsocks journal logs? My logs are full of Connection refused during the windows where glider was not running.

acostach commented 3 years ago

@klutchell I took these debug logs with the last production image of Pi4, in production environment, they most likely match what @floion saw since it's the same behavior:

root@6962f75:~# journalctl | grep -i redsocks
Nov 16 11:58:35 6962f75 sh[1373]: iptables v1.8.4 (legacy): Couldn't load target `REDSOCKS':No such file or directory
Nov 16 11:58:35 6962f75 sh[1374]: iptables v1.8.4 (legacy): Couldn't load target `REDSOCKS':No such file or directory
Nov 16 11:58:35 6962f75 sh[1376]: iptables v1.8.4 (legacy): Couldn't load target `REDSOCKS':No such file or directory
Nov 16 11:58:36 6962f75 avahi-daemon[1352]: Joining mDNS multicast group on interface resin-redsocks.IPv4 with address 10.114.103.1.
Nov 16 11:58:36 6962f75 avahi-daemon[1352]: New relevant interface resin-redsocks.IPv4 for mDNS.
Nov 16 11:58:36 6962f75 avahi-daemon[1352]: Registering new address record for 10.114.103.1 on resin-redsocks.IPv4.
Nov 16 11:58:36 6962f75 NetworkManager[1324]: <info>  [1637063916.2960] manager: (resin-redsocks): new Bridge device (/org/freedesktop/NetworkManager/Devices/5)
Nov 16 11:58:36 6962f75 redsocks[1424]: 1637063916.469490 warning redsocks.c:177 redsocks_onexit(...) Overwriting `redsocks:local_ip` entry: it needs to match OS' iptables configuration.
Nov 16 11:58:36 6962f75 redsocks[1424]: 1637063916.469789 notice main.c:167 main(...) redsocks started, conn_max=128
Nov 16 11:58:45 6962f75 redsocks[1424]: 1637063925.540162 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:61446->35.169.89.252:443]: accepted
Nov 16 11:58:45 6962f75 redsocks[1424]: 1637063925.554331 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:56570->18.210.69.101:443]: accepted
Nov 16 11:58:45 6962f75 redsocks[1424]: 1637063925.629375 debug redsocks.c:352 redsocks_start_relay(...) [192.168.0.130:61446->35.169.89.252:443]: data relaying started
Nov 16 11:58:45 6962f75 redsocks[1424]: 1637063925.629733 debug redsocks.c:352 redsocks_start_relay(...) [192.168.0.130:56570->18.210.69.101:443]: data relaying started
Nov 16 11:59:09 6962f75 redsocks[1424]: 1637063949.496040 info redsocks.c:682 redsocks_drop_client(...) [192.168.0.130:56570->18.210.69.101:443]: connection closed
root@6962f75:~# cat /etc/os-release 
ID="balena-os"
NAME="balenaOS"
VERSION="2.85.2+rev3"
VERSION_ID="2.85.2+rev3"
PRETTY_NAME="balenaOS 2.85.2+rev3"
MACHINE="raspberrypi4-64"
META_BALENA_VERSION="2.85.2"
BALENA_BOARD_REV="ab7bc92"
META_BALENA_REV="34a41e07"
SLUG="raspberrypi4-64"

Kill glider container, leaving it closed:

docker run --rm --network host --user 995 nadoo/glider:v0.15.0 -verbose -listen :8123
2021/11/16 11:56:51 group.go:186: [group] only 1 forwarder found, disable health checking
2021/11/16 11:56:51 mixed.go:68: [mixed] listening TCP on :8123
2021/11/16 11:56:51 server.go:107: [socks5] listening UDP on :8123
2021/11/16 11:59:03 server.go:87: [socks5] 192.168.0.130:57110 <-> 18.210.69.101:443 via DIRECT
2021/11/16 11:59:03 server.go:87: [socks5] 192.168.0.130:57106 <-> 35.169.89.252:443 via DIRECT
^C

Device goes offline briefly, then comes back online:

root@6962f75:~# journalctl | grep -i redsocks
Nov 16 11:58:35 6962f75 sh[1373]: iptables v1.8.4 (legacy): Couldn't load target `REDSOCKS':No such file or directory
Nov 16 11:58:35 6962f75 sh[1374]: iptables v1.8.4 (legacy): Couldn't load target `REDSOCKS':No such file or directory
Nov 16 11:58:35 6962f75 sh[1376]: iptables v1.8.4 (legacy): Couldn't load target `REDSOCKS':No such file or directory
Nov 16 11:58:36 6962f75 avahi-daemon[1352]: Joining mDNS multicast group on interface resin-redsocks.IPv4 with address 10.114.103.1.
Nov 16 11:58:36 6962f75 avahi-daemon[1352]: New relevant interface resin-redsocks.IPv4 for mDNS.
Nov 16 11:58:36 6962f75 avahi-daemon[1352]: Registering new address record for 10.114.103.1 on resin-redsocks.IPv4.
Nov 16 11:58:36 6962f75 NetworkManager[1324]: <info>  [1637063916.2960] manager: (resin-redsocks): new Bridge device (/org/freedesktop/NetworkManager/Devices/5)
Nov 16 11:58:36 6962f75 redsocks[1424]: 1637063916.469490 warning redsocks.c:177 redsocks_onexit(...) Overwriting `redsocks:local_ip` entry: it needs to match OS' iptables configuration.
Nov 16 11:58:36 6962f75 redsocks[1424]: 1637063916.469789 notice main.c:167 main(...) redsocks started, conn_max=128
Nov 16 11:58:45 6962f75 redsocks[1424]: 1637063925.540162 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:61446->35.169.89.252:443]: accepted
Nov 16 11:58:45 6962f75 redsocks[1424]: 1637063925.554331 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:56570->18.210.69.101:443]: accepted
Nov 16 11:58:45 6962f75 redsocks[1424]: 1637063925.629375 debug redsocks.c:352 redsocks_start_relay(...) [192.168.0.130:61446->35.169.89.252:443]: data relaying started
Nov 16 11:58:45 6962f75 redsocks[1424]: 1637063925.629733 debug redsocks.c:352 redsocks_start_relay(...) [192.168.0.130:56570->18.210.69.101:443]: data relaying started
Nov 16 11:59:09 6962f75 redsocks[1424]: 1637063949.496040 info redsocks.c:682 redsocks_drop_client(...) [192.168.0.130:56570->18.210.69.101:443]: connection closed
Nov 16 11:59:56 6962f75 redsocks[1424]: 1637063996.111891 info redsocks.c:682 redsocks_drop_client(...) [192.168.0.130:61446->35.169.89.252:443]: connection closed
Nov 16 12:00:01 6962f75 redsocks[1424]: 1637064001.131752 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:61492->35.169.89.252:443]: accepted
Nov 16 12:00:01 6962f75 redsocks[1424]: 1637064001.211803 notice utils.c:269 red_is_socket_connected_ok(...) connect: Connection refused
Nov 16 12:00:01 6962f75 redsocks[1424]: 1637064001.211905 notice redsocks.c:969 redsocks_relay_connected(...) [192.168.0.130:61492->35.169.89.252:443]: red_is_socket_connected_ok: Connection refused
Nov 16 12:00:01 6962f75 redsocks[1424]: 1637064001.211962 info redsocks.c:697 redsocks_drop_client(...) [192.168.0.130:61492->35.169.89.252:443]: dropping client (R/W), relay (R/W), idle 0.000182s
Nov 16 12:00:06 6962f75 redsocks[1424]: 1637064006.232164 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:54812->35.169.76.143:443]: accepted
Nov 16 12:00:06 6962f75 redsocks[1424]: 1637064006.237038 notice utils.c:269 red_is_socket_connected_ok(...) connect: Connection refused
Nov 16 12:00:06 6962f75 redsocks[1424]: 1637064006.237117 notice redsocks.c:969 redsocks_relay_connected(...) [192.168.0.130:54812->35.169.76.143:443]: red_is_socket_connected_ok: Connection refused
Nov 16 12:00:06 6962f75 redsocks[1424]: 1637064006.237168 info redsocks.c:697 redsocks_drop_client(...) [192.168.0.130:54812->35.169.76.143:443]: dropping client (R/W), relay (R/W), idle 0.000144s
Nov 16 12:00:11 6962f75 redsocks[1424]: 1637064011.257491 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:59492->3.227.28.93:443]: accepted
Nov 16 12:00:11 6962f75 redsocks[1424]: 1637064011.261982 notice utils.c:269 red_is_socket_connected_ok(...) connect: Connection refused
Nov 16 12:00:11 6962f75 redsocks[1424]: 1637064011.262061 notice redsocks.c:969 redsocks_relay_connected(...) [192.168.0.130:59492->3.227.28.93:443]: red_is_socket_connected_ok: Connection refused
Nov 16 12:00:11 6962f75 redsocks[1424]: 1637064011.262112 info redsocks.c:697 redsocks_drop_client(...) [192.168.0.130:59492->3.227.28.93:443]: dropping client (R/W), relay (R/W), idle 0.000146s

root@6962f75:~# systemctl status redsocks 
● redsocks.service - redsocks transparent proxy redirector
     Loaded: loaded (/lib/systemd/system/redsocks.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2021-11-16 11:58:36 UTC; 4min 18s ago
   Main PID: 1424 (redsocks)
      Tasks: 1 (limit: 822)
     Memory: 552.0K
     CGroup: /system.slice/redsocks.service
             └─1424 /usr/bin/redsocks -c /mnt/boot/system-proxy/redsocks.conf

Nov 16 12:00:01 6962f75 redsocks[1424]: 1637064001.211905 notice redsocks.c:969 redsocks_relay_connected(...) [192.168.0.130:61492->35.169.89.252:443]: red_is_socket_connected_ok: Connection refused
Nov 16 12:00:01 6962f75 redsocks[1424]: 1637064001.211962 info redsocks.c:697 redsocks_drop_client(...) [192.168.0.130:61492->35.169.89.252:443]: dropping client (R/W), relay (R/W), idle 0.000182s
Nov 16 12:00:06 6962f75 redsocks[1424]: 1637064006.232164 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:54812->35.169.76.143:443]: accepted
Nov 16 12:00:06 6962f75 redsocks[1424]: 1637064006.237038 notice utils.c:269 red_is_socket_connected_ok(...) connect: Connection refused
Nov 16 12:00:06 6962f75 redsocks[1424]: 1637064006.237117 notice redsocks.c:969 redsocks_relay_connected(...) [192.168.0.130:54812->35.169.76.143:443]: red_is_socket_connected_ok: Connection refused
Nov 16 12:00:06 6962f75 redsocks[1424]: 1637064006.237168 info redsocks.c:697 redsocks_drop_client(...) [192.168.0.130:54812->35.169.76.143:443]: dropping client (R/W), relay (R/W), idle 0.000144s
Nov 16 12:00:11 6962f75 redsocks[1424]: 1637064011.257491 info redsocks.c:1254 redsocks_accept_client(...) [192.168.0.130:59492->3.227.28.93:443]: accepted
Nov 16 12:00:11 6962f75 redsocks[1424]: 1637064011.261982 notice utils.c:269 red_is_socket_connected_ok(...) connect: Connection refused
Nov 16 12:00:11 6962f75 redsocks[1424]: 1637064011.262061 notice redsocks.c:969 redsocks_relay_connected(...) [192.168.0.130:59492->3.227.28.93:443]: red_is_socket_connected_ok: Connection refused
Nov 16 12:00:11 6962f75 redsocks[1424]: 1637064011.262112 info redsocks.c:697 redsocks_drop_client(...) [192.168.0.130:59492->3.227.28.93:443]: dropping client (R/W), relay (R/W), idle 0.000146s
klutchell commented 2 years ago

@acostach @floion I tried again today, and while glider was closed my device was offline. My logs look the same as yours, redsocks is still running but the connection is refused over and over.

How are you determining that the device is "back online" after a few minutes? I'm using the balena CLI and the balena dashboard to verify that it is no longer able to reach the API when glider is offline. However, existing SSH sessions seem to stay alive. Note that ping tests always ignore the proxy as well.

acostach commented 2 years ago

@klutchell we're using the dashboard and are accessing the device trough the webterminal. Commands can be run trough the web termina, status is set to online..

klutchell commented 2 years ago

@acostach, @floion your device is accessing the internet over IPv6 and I'm pretty sure redsocks is not configured to proxy IPv6 traffic. I tested IPv4 traffic and it seems to be failing. I'll open a separate issue for supporting the proxy of IPv6 traffic.

For now we need to adjust the manual proxy tests to curl an IPv4 only endpoint from a shell on the device.

See https://github.com/balena-io-modules/device-diagnostics/pull/294/files

curl -qs "https://ipv4.google.com" >/dev/null || echo "failed"