balena-os / balena-supervisor

Balena Supervisor: balena's agent on devices.
https://balena.io

Device state not reported; ip_address too long #1906

Closed kb2ma closed 2 years ago

kb2ma commented 2 years ago

Running Supervisor version 12.11.42 on balenaOS 2.89.15 on an Intel NUC. The following messages show in the supervisor logs:

Mar 20 11:16:09 04166f8 balena-supervisor[1821]: [info]    Reported current state to the cloud
Mar 20 11:16:28 04166f8 balena-supervisor[1821]: [error]   Device state report failure! Status code: 400 - message: "\"ip_address\" longer than 255 characters (267)"
Mar 20 11:16:28 04166f8 balena-supervisor[1821]: [info]    Retrying current state report in 15 seconds
Mar 20 11:16:44 04166f8 balena-supervisor[1821]: [error]   Device state report failure! Status code: 400 - message: "\"ip_address\" longer than 255 characters (267)"

The messages continue on, with the time interval doubling to 480 seconds, and then 900 seconds.
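The retry intervals mentioned above (15 seconds, later 480, then 900) are consistent with a doubling backoff that is capped at 900 seconds. The following is only a sketch of that pattern; the function name `retryDelays` and the assumption of a hard 900-second cap are illustrative and not taken from the Supervisor source.

```javascript
// Sketch of a capped exponential backoff, assuming the delay doubles
// after every failed report and never exceeds a maximum value.
// retryDelays is a hypothetical helper, not a Supervisor function.
function retryDelays(initialSeconds, maxSeconds, attempts) {
  const delays = [];
  let delay = initialSeconds;
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(delay, maxSeconds));
    delay *= 2;
  }
  return delays;
}

console.log(retryDelays(15, 900, 8));
// prints [ 15, 30, 60, 120, 240, 480, 900, 900 ]
```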

The balena device has two Internet interfaces, ethernet and WiFi. ifconfig for these interfaces looks like this:

eno1      Link encap:Ethernet  HWaddr 1C:69:7A:6E:6E:40  
          inet addr:192.168.1.103  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::bcc7:cd23:a208:5ff1/64 Scope:Link
          inet6 addr: xxxx:xxxx:xxxx:xxxx::109/128 Scope:Global
          inet6 addr: xxxx:xxxx:xxxx:xxxx:e85a:8046:6d9d:ef41/64 Scope:Global
          inet6 addr: fd25:36da:e8ec::109/128 Scope:Global
          inet6 addr: fd25:36da:e8ec:0:3e6f:7653:ae69:9c36/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4611 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4745 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:571584 (558.1 KiB)  TX bytes:930190 (908.3 KiB)
          Interrupt:16 Memory:a8b00000-a8b20000 

wlp0s20f3 Link encap:Ethernet  HWaddr 54:8D:5A:65:1B:52  
          inet addr:192.168.1.217  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fd25:36da:e8ec:0:6c57:ceac:3fae:4b27/64 Scope:Global
          inet6 addr: xxxx:xxxx:xxxx:xxxx::4f4/128 Scope:Global
          inet6 addr: xxxx:xxxx:xxxx:xxxx:6829:7689:63e6:b0e3/64 Scope:Global
          inet6 addr: fd25:36da:e8ec::4f4/128 Scope:Global
          inet6 addr: fe80::c229:cf2c:6cd8:cb0d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:189 errors:0 dropped:0 overruns:0 frame:0
          TX packets:294 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:57744 (56.3 KiB)  TX bytes:57074 (55.7 KiB)

I was able to work around the problem by taking down the WiFi interface. Notice that now balenaCloud shows the following information.

[Screenshot: error-ip-address-device]

In both the graphic and the interface listing, I have obfuscated the network portion of the globally routable addresses. The fd25... addresses are system generated unique local addresses. Perhaps the root cause is that the Supervisor is trying to send all of these addresses in the device state report. This guess is consistent with working around the problem by taking down the WiFi interface.

pipex commented 2 years ago

Is this on your device @kb2ma? Could you run the following command and paste the results?

balena exec -ti balena_supervisor node -e "os = require('os'); console.log(os.networkInterfaces())"

The supervisor reports every address that belongs to an interface outside the special "balena interfaces", is not marked as internal, and has a scopeid of 0 (global). From the address list you shared it looks like the addresses are reported as Global, so I would like to see how Node reports them.
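As a rough illustration of the filter described here, the sketch below walks the output of `os.networkInterfaces()` and keeps addresses that are on a non-excluded interface, not internal, and (for IPv6) have a `scopeid` of 0. The `EXCLUDED_INTERFACES` patterns and the function name `reportableAddresses` are assumptions made for this example; the actual list of "balena interfaces" in the Supervisor may differ.

```javascript
const os = require('os');

// Hypothetical interface-name patterns to skip; the real Supervisor list may differ.
const EXCLUDED_INTERFACES = [/^lo$/, /^resin-vpn$/, /^balena\d*$/, /^docker\d*$/];

// Collect addresses that would pass the filter described above.
function reportableAddresses(interfaces) {
  const result = [];
  for (const [name, addrs] of Object.entries(interfaces)) {
    if (EXCLUDED_INTERFACES.some((re) => re.test(name))) continue;
    for (const a of addrs || []) {
      if (a.internal) continue;
      // IPv4 entries carry no scopeid; for IPv6 keep only scopeid 0.
      const isV6 = a.family === 'IPv6' || a.family === 6;
      if (isV6 && a.scopeid !== 0) continue;
      result.push(a.address);
    }
  }
  return result;
}

// The result depends on the machine this runs on.
console.log(reportableAddresses(os.networkInterfaces()).join(' '));
```

Note that a unique local address such as `fd25:36da:e8ec::3d3` has `scopeid: 0` in Node's output, so a filter of this shape keeps it.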

If there is a better filter we can add I think we should include it.

Still, I would also address this on the backend, as 255 chars might not be enough for all cases, which means that even with the right filters we could run into this type of issue from time to time.

kb2ma commented 2 years ago

@pipex, yes this is my device. Below is the command result. Once again I have obfuscated the network portion of the globally routable IPv6 addresses.

root@04166f8:~# balena exec -ti balena_supervisor node -e "os = require('os'); console.log(os.networkInterfaces())"
{
  lo: [
    {
      address: '127.0.0.1',
      netmask: '255.0.0.0',
      family: 'IPv4',
      mac: '00:00:00:00:00:00',
      internal: true,
      cidr: '127.0.0.1/8'
    },
    {
      address: '::1',
      netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff',
      family: 'IPv6',
      mac: '00:00:00:00:00:00',
      internal: true,
      cidr: '::1/128',
      scopeid: 0
    }
  ],
  eno1: [
    {
      address: '192.168.1.103',
      netmask: '255.255.255.0',
      family: 'IPv4',
      mac: '1c:69:7a:6e:6e:40',
      internal: false,
      cidr: '192.168.1.103/24'
    },
    {
      address: 'xxxx:xxxx:xxxx:xxxx::3d3',
      netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff',
      family: 'IPv6',
      mac: '1c:69:7a:6e:6e:40',
      internal: false,
      cidr: 'xxxx:xxxx:xxxx:xxxx::3d3/128',
      scopeid: 0
    },
    {
      address: 'fd25:36da:e8ec::3d3',
      netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff',
      family: 'IPv6',
      mac: '1c:69:7a:6e:6e:40',
      internal: false,
      cidr: 'fd25:36da:e8ec::3d3/128',
      scopeid: 0
    },
    {
      address: 'fd25:36da:e8ec:0:3e6f:7653:ae69:9c36',
      netmask: 'ffff:ffff:ffff:ffff::',
      family: 'IPv6',
      mac: '1c:69:7a:6e:6e:40',
      internal: false,
      cidr: 'fd25:36da:e8ec:0:3e6f:7653:ae69:9c36/64',
      scopeid: 0
    },
    {
      address: 'xxxx:xxxx:xxxx:xxxx:e85a:8046:6d9d:ef41',
      netmask: 'ffff:ffff:ffff:ffff::',
      family: 'IPv6',
      mac: '1c:69:7a:6e:6e:40',
      internal: false,
      cidr: 'xxxx:xxxx:xxxx:xxxx:e85a:8046:6d9d:ef41/64',
      scopeid: 0
    },
    {
      address: 'fe80::bcc7:cd23:a208:5ff1',
      netmask: 'ffff:ffff:ffff:ffff::',
      family: 'IPv6',
      mac: '1c:69:7a:6e:6e:40',
      internal: false,
      cidr: 'fe80::bcc7:cd23:a208:5ff1/64',
      scopeid: 2
    }
  ],
  wlp0s20f3: [
    {
      address: '192.168.1.217',
      netmask: '255.255.255.0',
      family: 'IPv4',
      mac: '54:8d:5a:65:1b:52',
      internal: false,
      cidr: '192.168.1.217/24'
    },
    {
      address: 'xxxx:xxxx:xxxx:xxxx::5f5',
      netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff',
      family: 'IPv6',
      mac: '54:8d:5a:65:1b:52',
      internal: false,
      cidr: 'xxxx:xxxx:xxxx:xxxx::5f5/128',
      scopeid: 0
    },
    {
      address: 'fd25:36da:e8ec::5f5',
      netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff',
      family: 'IPv6',
      mac: '54:8d:5a:65:1b:52',
      internal: false,
      cidr: 'fd25:36da:e8ec::5f5/128',
      scopeid: 0
    },
    {
      address: 'fd25:36da:e8ec:0:6c57:ceac:3fae:4b27',
      netmask: 'ffff:ffff:ffff:ffff::',
      family: 'IPv6',
      mac: '54:8d:5a:65:1b:52',
      internal: false,
      cidr: 'fd25:36da:e8ec:0:6c57:ceac:3fae:4b27/64',
      scopeid: 0
    },
    {
      address: 'xxxx:xxxx:xxxx:xxxx:6829:7689:63e6:b0e3',
      netmask: 'ffff:ffff:ffff:ffff::',
      family: 'IPv6',
      mac: '54:8d:5a:65:1b:52',
      internal: false,
      cidr: 'xxxx:xxxx:xxxx:xxxx:6829:7689:63e6:b0e3/64',
      scopeid: 0
    },
    {
      address: 'fe80::c229:cf2c:6cd8:cb0d',
      netmask: 'ffff:ffff:ffff:ffff::',
      family: 'IPv6',
      mac: '54:8d:5a:65:1b:52',
      internal: false,
      cidr: 'fe80::c229:cf2c:6cd8:cb0d/64',
      scopeid: 4
    }
  ],
  'resin-vpn': [
    {
      address: '10.240.0.4',
      netmask: '255.255.255.255',
      family: 'IPv4',
      mac: '00:00:00:00:00:00',
      internal: false,
      cidr: '10.240.0.4/32'
    },
    {
      address: 'fe80::e354:cb61:6108:aa18',
      netmask: 'ffff:ffff:ffff:ffff::',
      family: 'IPv6',
      mac: '00:00:00:00:00:00',
      internal: false,
      cidr: 'fe80::e354:cb61:6108:aa18/64',
      scopeid: 8
    }
  ]
}

pipex commented 2 years ago

@kb2ma as I suspected, Node reads the addresses you report as global addresses:

    {
      address: 'fd25:36da:e8ec::3d3',
      netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff',
      family: 'IPv6',
      mac: '1c:69:7a:6e:6e:40',
      internal: false,
      cidr: 'fd25:36da:e8ec::3d3/128',
      scopeid: 0
    }

Which system is adding those addresses? Could this be a bug in that system? I don't really see another way to filter those addresses otherwise.

kb2ma commented 2 years ago

@pipex I don't think the list of addresses indicates a bug, but I can't precisely express the reason for each address. Here's what I think is fair to say.

The two fd25: addresses are unique local addresses (ULAs). They may be tagged as global, but generically fc00::/7 addresses are not globally routable. They are intended for the local network.

The 2601:...:/64 address is a globally routable address. It's behind a firewall, so is not directly accessible though.

The 2601:...:/128 and fd25...:/128 are really meant to be point to point links on the local network. I'm kind of fuzzy on this point though. It makes sense to me that they wouldn't be globally useful because the prefix is so long. My suspicion is that they are involved in DHCP, to remove the routing aspect. My router GUI shows the /128 addresses for hosts on the network in the DHCP section.

Fundamentally with IPv6 the address space is so large that addresses are used in ways that were not practical with IPv4. OTOH some IoT devices might only have a single IPv6 address. We need to better distinguish the value of different types of addresses.

Maybe there should be a couple of addresses displayed by default in the UI, and the rest of them accessible with more effort.

pipex commented 2 years ago

> The two fd25: addresses are unique local addresses (ULAs). They may be tagged as global, but generically fc00::/7 addresses are not globally routable. They are intended for the local network.

Can you ping or reach your device through one of the fd25 addresses from another peer on the network @kb2ma ?

> Fundamentally with IPv6 the address space is so large that addresses are used in ways that were not practical with IPv4. OTOH some IoT devices might only have a single IPv6 address. We need to better distinguish the value of different types of addresses.

Yes, I agree with this point; however, I don't know what criteria we can use for filtering. If we add something that seems arbitrary, it might lead users to report missing IPv6 addresses as a bug. If there is a standard we can use, I think that would be much better.

As it is defined right now, the only criterion is that the addresses are locally accessible. From that point of view, what you are reporting doesn't seem like a bug, or am I missing something?

kb2ma commented 2 years ago

My balena device and workstation are on the same LAN. I can ping the balena device using both fd25 addresses and both of the obfuscated addresses.

I agree that it's difficult to distinguish which addresses are most important. It would be very helpful to speak with users who administer IPv6 networks. @ab77, any opinions?

The bug is that when both the Ethernet and WiFi interfaces are up, there is a failure when sending the device state report to balenaCloud: "\"ip_address\" longer than 255 characters (267)". I suspect that is related to the length of the text representation of the 4 IPv6 addresses + 1 IPv4 address per interface. I will try debugging with the Supervisor to verify that.

kb2ma commented 2 years ago

> I will try debugging with the Supervisor to verify that.

Added a little debug logging to the latest Supervisor code. The space-separated list of those 4 IPv6 + 1 IPv4 addresses for both the Ethernet and WiFi interfaces (i.e. 10 addresses in all) is indeed 267 chars long.
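The length check can be reproduced roughly from the `os.networkInterfaces()` output posted earlier in this thread. The sketch below joins the ten addresses with spaces and compares the result with the 255-character limit from the error message. The variable names are illustrative, and because the globally routable prefixes are obfuscated as `xxxx` placeholders, the count it prints is not expected to match the 267 reported by the API exactly.

```javascript
// Addresses copied from the os.networkInterfaces() output above
// (eno1 first, then wlp0s20f3). Obfuscated prefixes are kept as posted.
const reported = [
  '192.168.1.103',
  'xxxx:xxxx:xxxx:xxxx::3d3',
  'fd25:36da:e8ec::3d3',
  'fd25:36da:e8ec:0:3e6f:7653:ae69:9c36',
  'xxxx:xxxx:xxxx:xxxx:e85a:8046:6d9d:ef41',
  '192.168.1.217',
  'xxxx:xxxx:xxxx:xxxx::5f5',
  'fd25:36da:e8ec::5f5',
  'fd25:36da:e8ec:0:6c57:ceac:3fae:4b27',
  'xxxx:xxxx:xxxx:xxxx:6829:7689:63e6:b0e3',
];

// Limit taken from the API error message.
const MAX_FIELD_LENGTH = 255;

// Assumed wire format: a single space-separated string.
const ipAddressField = reported.join(' ');
const tooLong = ipAddressField.length > MAX_FIELD_LENGTH;

console.log(`length=${ipAddressField.length} tooLong=${tooLong}`);
```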

ab77 commented 2 years ago

@kb2ma @pipex IPv6 ULAs (Unique Local Addresses, RFC 4193) are to IPv6 what RFC 1918 addresses are to IPv4. So they should be handled the same way by the supervisor (as private?).
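To make the comparison concrete, here is a minimal sketch of how the two private ranges could be recognized: RFC 1918 covers 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16, and RFC 4193 ULAs fall under fc00::/7. The helper names `isPrivateIPv4` and `isUniqueLocalIPv6` are hypothetical and are not part of the Supervisor; the IPv6 check only inspects the first group of a textual address and does not validate the rest.

```javascript
// RFC 1918 private IPv4 ranges: 10/8, 172.16/12, 192.168/16.
function isPrivateIPv4(addr) {
  const parts = addr.split('.');
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;
  const [a, b] = parts.map(Number);
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// RFC 4193 unique local addresses: fc00::/7, i.e. the top 7 bits are 1111110.
function isUniqueLocalIPv6(addr) {
  if (!addr.includes(':')) return false;
  const first = parseInt(addr.split(':')[0], 16);
  return !Number.isNaN(first) && (first & 0xfe00) === 0xfc00;
}

console.log(isPrivateIPv4('192.168.1.103'));                  // true
console.log(isUniqueLocalIPv6('fd25:36da:e8ec::3d3'));        // true
console.log(isUniqueLocalIPv6('fe80::bcc7:cd23:a208:5ff1'));  // false (link-local)
```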

PranavPeshwe commented 2 years ago

A similar limitation applies to the mac_address field as well. On devices with a large number of virtual interfaces, all the MAC addresses together add up to more than 255 chars. This results in an error identical to the one Ken reported earlier. (access restricted) FD thread here: https://www.flowdock.com/app/rulemotion/r-supervisor/threads/PJY3Ia5yBHYQbeaoVIRwO37qMb_ (access restricted) JF ticket here: https://jel.ly.fish/support-thread-1-0-0-front-cnv-cavkm31

jellyfish-bot commented 2 years ago

[cywang117] This has attached https://jel.ly.fish/83958423-d3b7-4926-bf2a-2f1e970d1c7a

cywang117 commented 2 years ago

Closing in favor of https://github.com/balena-io/open-balena-api/issues/1059

cywang117 commented 2 years ago

To be fixed by https://github.com/balena-io/open-balena-api/pull/1066